* SCN colour device driver for Spectrum Next and compatibles
* Implements block operations and text output from QDOS to 
* Lion's 64K MODE 256, 128K MODE 32, and limited support for
* MODE 16 pixel-pairs. 65536 colours are only supported via
* MODE64, the OPEN C suffix, and INK64K 16-bit ink provision.
*
* This driver does not include provision for high-colour or
* MODE 12 (16 colours in the MODE 8 space) output, but can
* set up KS2 hardware for PLOT/DRAW or PHOTON output to those
* and CLS, SCROLL or PAN such windows in a subset of colours.
*
* Copyright 2025 Simon N Goodwin - all rights reserved.
* $VER: SCN 1.48, updated 29th April 2025 for old cores.
*
version	equ "1.48"	Needs to match the $VER line above.
* 
* Merged on Friday 28th March from 1:23 of 20th March and
* temporary change print files for character output and new
* keywords added on 26th March, as annotated while testing.
*
* Overview of features and compatibility:
*
* The SCN driver supports a large subset of the SuperBASIC
* commands and QDOS output TRAPs offered by the SCR device
* written for Sinclair by GST, then incorporated into QDOS.
* Programs that PRINT text and update blocks of pixels can
* be redirected to new, more colourful Next QL screen modes
* by opening a SCN channel rather than a SCR one, from BASIC
* or any other language, including 68K machine-code. 
*
* Supports all WINDOW, CLS, SCROLL, PAN and BORDER options.
* Supports 256 INK and PAPER colours (in lieu of stipples).
* Supports two founts consistently with CON and SCR drivers.
* Offers UNDER 0 or 1, OVER 0, 1 and -1 (EOR) output options.
* Like MODE 8, CSIZEs 0,0 and 1,0 are treated as 2,0 and 3,0.
* AT and CURSOR keywords can position text at character or 
* pixel positions, but not floating-point graphics coordinates. 
* Double-height characters (CSIZE x,1) are fully supported.
*
* The fastest routines output 6x10 pixel characters, giving a
* maximum of 1050 (42 * 25) full characters visible at once, in
* OVER 0 with or without underlining or OVER -1 UNDER 0 (only).
* OVER 1 is never optimised, always using nested X and Y loops.
* Even so output is quick as pixel data is never spread between
* bytes as it must be for output to Sinclair's display modes.
* Scrolling is relatively slow even compared with original GST
* routines as the extra colours demand twice as much video RAM.
*
*
* Support for shorter, narrow or extra-wide character spacing:
*
* Truncated or spaced-out characters can be output, with less
* efficiency, if CHAR_INC or similar Toolkit commands are used
* to adjust SD.XINC and SD.YINC to non-standard values. With
* custom founts up to 85x51 barely-legible characters may be
* squeezed into a 255x255 pixel window, ZX Tasword 2 style,
* with CHAR_INC 3,5, but a pixel gap between characters is
* recommended, limiting the ransom note to 64x42 characters. 
*
* X increments for SCN channels are always measured in PIXELS
* whereas QDOS SCR and CON count pixels in MODE 4 and half-
* pixels in MODE 8. So when converting programs from MODE 8 or
* MODE 12 to MODE 256, custom XINC values need to be halved! 
* If SD.XINC is reduced to a value between 2 and 5 only that
* number of pixels from the high-order bits of the fount are
* output. Zero and single-pixel increments are not supported.
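*
* For instance, assuming Toolkit 2's CHAR_INC and hypothetical
* channel numbers, a MODE 8 SCR spacing of 16 half-pixels maps
* to 8 whole pixels on an SCN channel:
*
*   CHAR_INC #2,16,10  : REMark SCR or CON channel in MODE 8
*   CHAR_INC #3,8,10   : REMark same spacing on a SCN channel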
*
* If there is no room in the window for a character of the
* size implied by SD.XINC or SD.YINC, no error is reported
* and nothing is printed. This is consistent with Minerva.
*   
* If SD.XINC is 7 or 8 (equivalent to CSIZE 3,y) additional
* low-order bits from the fount are rendered, consistent with
* Speedscreen, Minerva and most Sinclair ROMs. Since QL fount
* rows are only a byte wide, additional STRIP-colour pixels
* are output to the right of the glyph if SD.XINC exceeds 8.
*
* SD.YINC may be 4..9, the default 10 pixels, or more. If <10
* the blank first line is skipped first, then later lines of 
* the fount may be ignored to make room, e.g. only the first
* six fount lines (preceded by a one pixel-line gap) are drawn
* when SD.YINC is 6, allowing 42x42 small characters (1764 
* in total). This means that 8x8 character glyphs from other 
* systems e.g. ZX Spectrum can be output without blank lines,
* or standard QL fount 6x9 or 8x9 glyphs can be used to build
* up larger images. YINC values lower than 4 are unsupported.
*
* If YINC exceeds 10, blank lines are left between characters,
* rather than additional rows filled with the STRIP colour, as
* occurs for the first row of full-height or taller characters. 
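*
* As a sketch of the arithmetic behind the capacities quoted
* above, a window holds INT(width/XINC) columns by
* INT(height/YINC) rows. Cells is a hypothetical helper name,
* not a keyword provided by this driver:
*
*   100 DEFine FuNction Cells(w, h, xi, yi)
*   110   RETurn INT(w / xi) * INT(h / yi)
*   120 END DEFine
*   130 PRINT Cells(256,256,6,6)  : REMark 42 * 42 = 1764
*   140 PRINT Cells(256,256,4,6)  : REMark 64 * 42 = 2688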
*
*
* Limited support for 16 colour 512x256 pixel HiRes MODE 16:
*
* This driver can also output double-width characters (giving
* 256x256 pixel resolution) to the QL-specific HiRes MODE16, as
* long as colour numbers 0 to 15 are multiplied by 17 before use
* in INK, PAPER, STRIP and BORDER commands. This ensures that
* all columns of the character are in the same colour. Type 2
* (vertically-striped) stipples can also be output in MODE 16
* by passing a colour byte with data for the even pixel column
* multiplied by 16, added to the colour for the odd-X pixels;
* for instance INK 16 would use blue for the even pixels and
* black for the odd columns in MODE 16. QDOS keywords do not
* support enough bits for horizontal stripe or chequerboard
* stipple patterns in MODE 16 to be indicated. Colours should
* only be specified using a single number 0..255 in new MODEs.
* Two- or three-parameter colour values give peculiar results.
* SuperBASIC helper PROCs INK16, PAPER16, STRIP16 and BORDER16
* select solid colours for SCN output in MODE16.
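*
* A minimal sketch of the colour arithmetic above, assuming an
* SCN channel is already open as #3 (the channel number and the
* colour numbers chosen are illustrative only):
*
*   100 c = 5              : REMark any MODE 16 colour, 0..15
*   110 INK #3,17 * c      : REMark solid: both nybbles hold c
*   120 e = 1 : o = 0      : REMark even and odd column colours
*   130 INK #3,16 * e + o  : REMark type 2 stipple, same as INK 16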
*
* SCN device BLOCK, WINDOW, PAN, SCROLL and CLS operations can 
* also be applied to even-aligned *pairs* of pixels in MODE 16.
*  
* The SuperBASIC helper PROC RECOL16 replaces solid colours at
* full MODE16 resolution, generating a RECOL256 table in BASIC
* from a table of 16 replacement colour bytes, 0..15. The table
* manipulation takes longer than the recolouring operation, so
* consider caching the generated 256-byte table for quick reuse 
* if you wish to repeat the same colour remapping later.   
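*
* The nybble expansion can be sketched as follows. The array
* names and the example remapping are hypothetical, and the way
* the finished table is passed to RECOL256 is not shown in this
* part of the source, so only the table is built here:
*
*   100 DIM map16%(15), tab256%(255)
*   110 FOR c = 0 TO 15: map16%(c) = c
*   120 map16%(1) = 4 : REMark example only: colour 1 becomes 4
*   130 FOR b = 0 TO 255
*   140   tab256%(b) = 16 * map16%(b DIV 16) + map16%(b MOD 16)
*   150 END FOR b
*
* Each byte holds two MODE 16 pixels, so both nybbles are mapped
* independently and recombined. Keep tab256% for later reuse.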
*
* Support for KS2/Artix FPGA graphics mode and options:
*
* These require extra FPGA space so Spartan FPGA systems will
* return "not implemented" if these suffixes are used on them.
*
* Append the D suffix to the OPEN string to use double VRAM
* to print in 256 colours at HiRes 512x256 resolution. This
* requires 128K of video RAM and scrolls correspondingly slowly.
* e.g. MODE32:OPEN #3,scn_512x200a0x10d:LIST #3
*
* Append the C suffix to configure hardware for 16-bit MODE64
* colour. Use INK64K to select a colour, which will be stored
* within the SCN channel details for external code (e.g. PLOT
* and DRAW keywords derived from my DIY Toolkit series) to use.
* 
* Normally, and when suffix A is appended, output to 64K MODE16
* and MODE256 uses the "first screen" area at $4C0000. To send
* output to the second 64K screen at $4E0000, use the B suffix. 
* Toggle the visible display between first and second screens 
* with the SCN_USE command. This also expects a KS2 Artix FPGA. 
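*
* A hedged double-buffering sketch for KS2 hardware. The window
* geometry and channel number are assumptions; the B suffix and
* the SCN_USE arguments 1 and 2 are as handled by this source:
*
*   100 MODE256
*   110 OPEN #4,scn_512x256a0x0b : REMark output to second screen
*   120 CLS #4 : PRINT #4,"Prepared off-screen"
*   130 SCN_USE 2                : REMark display the second screen
*   140 SCN_USE 1                : REMark return to the first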
*
*
* Deliberate and intentional design limitations:
*
* No support for keyboard input, hence no line-editing. This is
* because the large and ROM-dependent EDIT subroutine cannot be
* called by programs except as part of the CON device driver.
*
* 256 (or limited 16) colour RECOL requires the new keyword
* RECOL256, as Sinclair's RECOL is limited to eight colours.
*
* No support for pixel graphics, POINT, LINE, CIRCLE, FILL etc.
* Extended versions of the DIY Toolkit PLOT and DRAW commands
* can be used to plot pixels and lines in new MODEs, but without
* floating-point offsets, scaling or TV-type aspect correction.
* This does make them quite a lot quicker, especially for short
* lines and single points. BLOCK is fully supported and always
* faster for single-pixel-width vertical or horizontal lines. 
*
* CTRL-F5 is tested only at the start of string output, not for
* every character. You will see the whole string, or none of it.
* If this becomes unwieldy, print smaller groups of characters.
*
*
* Known bugs, edge-cases and peculiarities:
* 
* Windows must be wide & tall enough for at least one character.
* XINC values are set in pixels not half-pixels as in QL MODE 8.
* Double-height characters are not expanded for a custom YINC
* of 10, which would otherwise use only the top 5 rows from the
* fount, if the XINC is 6, as they get sifted out into the fast
* path for small characters in OVER 0 or OVER -1. To work round
* this use a YINC of 11, XINC <> 6, or the unoptimised OVER 1. 
*
****************************************************************
*
* Configuration conditionals - define TESTBED for test commands
*
* TESTBED
*
* Define PENDY to postpone newlines
PENDY
*
* Screen address for new modes is in flux; 128K is allocated so
* the second screen in new modes may be at $4D0000, and there's
* nothing yet to clear any of this space, and no documented way
* to switch between screens. But MODE 12 still uses the Sinclair
* screen addresses and word-interleaved pixel data.
*
ql_scrbase  equ 131072		2024 cores, no double-buffering
nextScr	    equ $4C0000         2025 cores, new mode default
nextScr2    equ	$4E0000         Second 64K screen, Artix only
scrbase	    equ nextScr         March "N2" core 64K screen base
*
* Sinclair master chip MC.STAT output-only bits
*
bScr2       equ 7               Set when using second 32K screen
bNTSC       equ 6	        Samsung QL NTSC (192 row) select
bLoRes      equ 3               Bit set if in QL low resolution
bMode12     equ 2               Set for CST MODE 12 on Thor XVI
bBlank      equ 1               Set to blank QL display output
*
* 128K of video RAM is needed for 256 colour hiRes or 65536 
* colour loRes displays on QL Next. Spartan KS1 FPGAs cannot
* support this. Artix KS2 FPGAs can, and sprite hardware too.
* Selection of a 128K mode is ignored by a KS1 Next, NGo etc.  
*
* System variables (relative offsets on A6)
*
sv.scrst    equ      51		Set by CTRL-F5 screen output pause
sv.mcsta    equ      52         Master chip status - for MODE etc. 
sv.chbas    equ      120        Pointer to QDOS channel 0 address
sv.chtop    equ      124        Pointer to end of QDOS channel table
*
* SuperBASIC magic numbers
*
bv.chbas    equ      48         A6 offset to start of channel table
bv.chp	    equ      52         Upper limit of channel table offset
bv.rip	    equ      88         BASIC maths stack pointer A6 offset
bv.ribas    equ      92         SuperBASIC maths stack base (top)
bv.chrix    equ      $11A       Vector to allocate maths stack space 
intsize	    equ      2          SuperBASIC % datatype size in bytes
*
* QDOS TRAP keys
*
io.open	    equ      1          TRAP #2 open device
io.close    equ      2          TRAP #2 close device
*
io.sstrg    equ      7          TRAP #3 Print string
sd.extop    equ      9          TRAP #3 Extended operation
sd.scrol    equ      24         Scroll entire window
sd.scrtp    equ      25		Scroll window above cursor line
sd.scrbt    equ      26         Scroll window below cursor line
sd.pan	    equ      27		Pan window; 28 and 29 are undefined
sd.panln    equ      30		Pan current cursor line
sd.panrt    equ      31		Pan only end of current cursor line
sd.clrrt    equ      36         Clear line to right - last opt.
sd.fount    equ      37         Set new fount addresses
sd.recol    equ      38         Recolour window
sd.fill	    equ      46         SuperBASIC BLOCK command
*
forever	    equ      -1         Infinite transput timeout
*
mt.inf      equ      0          Find QDOS information
mt.respr    equ      14         Allocate resident procedure space
mt.alchp    equ      24         Allocate heap space
mt.dmode    equ      16         Set/read display mode
mt.liod     equ      32         Link device driver
mt.riod     equ      33         Release device driver
*
* QDOS error codes
*
not_yet	      equ    -19	Not (ever) implemented
overflow      equ    -18	Value too big
bad_parameter equ    -15
in_use        equ    -9         Screen address clash!
not_open      equ    -6
range_error   equ    -4
not_complete  equ    -1
no_error      equ     0
*
* Channel table offsets
*
ch.len      equ      0          Length of definition block
ch.driver   equ      4          Link address in channel definition
ch.owner    equ      8          Task owning channel
ch.rflag    equ      12         Address of CHANTAB entry for channel
ch.tag      equ      16         Channel tag word
ch.stat     equ      18         STAT and ACTION flags
ch.lench    equ      40         SuperBASIC channel table length
*
* Window coordinate information - not yet adjustable
*
sd.xmin     equ      $18        X pixel coordinate of left edge
sd.ymin     equ      $1A        Y pixel coordinate of top edge
sd.xsize    equ      $1C        Window width in pixels
sd.ysize    equ      $1E        Window height in pixels
sd.borwd    equ      $20	Border width in MODE 8 pixels
sd.xpos     equ      $22        X window coordinate of cursor left
sd.ypos     equ      $24        Y window coordinate of cursor top
*
* Window attribute information
*
sd.xinc     equ      $26        Character width in pixels (6)
sd.yinc     equ      $28        Character height in pixels (10)

sd.fount0   equ      $2A        Base address offset of fount 0
sd.fount1   equ      $2E        Offset to address of base of fount 1
sd.scrb     equ      $32        Long video RAM address (scrbase)
*
* These masks used by SCR and CON are replaced by word masks for SCN
*
sd.pmask    equ      $36        Long paper mask - MODE 4/8 only
sd.smask    equ      $3A        Long strip mask - MODE 4/8 only
sd.imask    equ      $3E        Long ink mask   - MODE 4/8 only
*
* Custom SCN replacement channel variables
*
* Colour pattern words - selected to remap fount bit pairs
* to adjacent word-aligned pixel colour byte pairs - when
* INK, PAPER and STRIP are set, these need to correspond.
* These replace GST's SCR and CON driver long-word masks.
*
strip0	equ	sd.pmask	
strip1  equ 	sd.pmask+1      Word for %00 fount pattern
strip2	equ	sd.pmask+2
ink0	equ 	sd.pmask+3      Word for %01 fount pattern
ink1	equ	sd.pmask+4
strip3  equ 	sd.pmask+5      Word for %10 fount pattern
ink2	equ	sd.pmask+6
ink3	equ 	sd.pmask+7      Word for %11 fount pattern
dummy   equ     sd.pmask+8      Spare room, used by io.sbyte
spare	equ	sd.pmask+9	Not currently used by SCN
ink64k  equ     sd.pmask+10     For 65536 colour PLOT/DRAW
*     
sd.cattr    equ      $42        Character attribute bits
*
sa.under    equ      0          Set in SD.CATTR for underline
sa.flash    equ      1          Never set, legacy from MODE 8
sa.trans    equ      2          Bit set for OVER 1, skip paper
sa.xor      equ      3          Bit set for OVER -1
sa.tall     equ      4          Double-height if implemented
sa.fat      equ      5          Set for 8 pixel wide characters 
sa.wide     equ      6          Never set, legacy from MODE 4
sa.offset   equ      7          Pixel not char grid placement
*
sd.curf     equ      $43        Cursor status flag (CON legacy)
sd.pcolr    equ      $44        Paper colour byte
sd.scolr    equ      $45        Strip colour byte
sd.icolr    equ      $46        Ink colour byte
sd.bcolr    equ      $47	Border colour
sd.nlsta    equ      $48        Pending newline flag
sd.fmod     equ      $49	Fill mode, not implemented
sd.yorg	    equ      $4A        SCALE data, not supported
*
* The next 26 SCR/CON bytes are unused, but may be needed if 
* float graphics with LINE, POINT, SCALE etc are implemented 
* later. We definitely need to keep this JS extra:
*
sd.linel    equ      $64        Line length in bytes (256)
sd.lines    equ	     $66        Max screen lines NOT in SCR/CON
sd.end      equ      $68        Same as later Sinclair ROMs
*
lineFeed    equ      10         One exceptional character code
*
* Notes on interpretation of coordinates
*
*       XPOS is in pixel (not MODE 4) units RELATIVE to XMIN
*       XSIZE is also in LINEL pixel units RELATIVE to XMIN
*	WINDOW converts from MODE 4 pixels to X byte counts
*	and checks that XMIN + XSIZE does not exceed LINEL.
*       XPOS waits past the end of the line till an attempt
*       to print there (pending newline) when it goes to 0 as
*	the next line is selected or scrolling refreshes it.
*	YPOS is relative to YMIN and never moves outside the 
*       window or within YINC of that point, so there always
*	are at least YINC character lines available below it.
*
****************************************************************
*
* Initialisation for testbed commands (not part of the driver)
*
sHeight equ	256		Screen height limit in pixels
cHeight	equ	10		Character height (max inc. gap)
sWidth	equ	256		Width in LoRes Pixels (AKA bytes)
cWidth	equ	6		Default character width (ditto)
cWider	equ	8		CSIZE 3,x width in pixels
dInk	equ	-1		Default ink 255, default paper 0
MAKEVEN	equ	$FFFFFFFE	Even mask for MOVEQ and AND
*
* QDOS utility vectors
*
mm.alchp    equ	$C0		Allocate supervisor space
mm.rechp    equ	$C2		Release supervisor space
bp.init	    equ	$110		Add extensions
ca.gtint    equ $112            Get integer word parameters
ca.gtfp     equ $114            Get floating point parameters
ca.gtstr    equ	$116		Fetch string parameters
ca.gtlin    equ $118            Get long integer parameters
io.name	    equ	$122		Decode a device name
*
* Device driver linkage data offsets; first 24 bytes unused
* but reserved for consistency with directory device drivers.
*
sv.lxint    equ	0		External interrupt link
sv.axint    equ	4		External interrupt server
sv.lpoll    equ	8		Polling list link
sv.apoll    equ	12		Polling list server
sv.lschd    equ	16		Scheduler list link
sv.aschd    equ 20		Scheduler list server
*
* The essential four pointers required by all devices
*
sv.lio	    equ 24		Link to the next driver
sv.aio	    equ	28		SCN output routine address
sv.aopen    equ	32		SCN device OPEN handler
sv.aclos    equ	36		SCN device CLOSE handler
*
* The next five are defaults for SCN device initialisation;
* these can potentially be updated for custom founts, or to
* divert SCN output to another screen address and geometry.
* These correspond to SCN_DEFL parameters 0, 1 and 2 (long
* words) and 3 (two packed words, pixel width * 65536 + height).
*
sc.fount0   equ	40		Start of first fount in ROM
sc.fount1   equ	44		Start of second ROM fount
sc.base	    equ 48		Base address of screen RAM
sc.linel    equ 52		WORD: Bytes per screen line
sc.lines    equ 54		WORD: Maximum number of lines 
*
* Workspace for window dimensions returned at OPEN by IO.NAME;
* some coordinates are handled as a pair of words so they have
* long-aligned labels to make it easier to move them together.
* These values are only temporarily used during OPEN and copied
* into the channel definition before the first output call.  
*
sc.sizes    equ	56		LONG offset to XY pair
sc.xsize    equ 56		WORD alias, width in bytes
sc.ysize    equ 58		Height in pixel lines
sc.origin   equ	60		LONG, XY of top left corner
sc.xmin     equ	60		WORD alias, offset in bytes
sc.ymin	    equ	62		WORD alias, vertical offset
sc.addr     equ 64              Word modifier for 64/128K slots
sc.limit    equ 66		Size of device structure    
*
* The following routine and corresponding procedure definitions
* are only included for testing BLOCK and character output in
* user mode and should not be used in preference to the driver.
* When building code for general use, hide TESTBED beforehand.
* 
	ifd	TESTBED
chInit	lea.l	chanTab,a0
	moveq	#0,d0           Denotes Top left and PAPER 0
	move.l	d0,sd.xmin(a0)	Clear xmin and ymin
	move.l	#sWidth<<16+sHeight,sd.xsize(a0) Two at once
	move.w	d0,sd.borwd(a0)	No border
	move.l	d0,sd.xpos(a0)	Home X and Y to top left
        move.l  #cWidth<<16+cHeight,sd.xinc(a0)	 Two at once
        move.l  d0,sd.cattr(a0) No special attributes
*
* Above MOVE.L also disables cursor and clears Strip and Paper 
*
	move.w	d0,sd.nlsta(a0) No pending newline or FILL
*
	move.l	d0,strip0(a0)   Wipe strip masks and ink0
	move.b	d0,strip3(a0)   Catch the offset one
	moveq	#dInk,d1
	move.b	d1,ink0(a0)	Patch first long word set
	move.b	d1,ink1(a0)
	move.b	d1,ink2(a0)
	move.b	d1,ink3(a0)
	move.w	d1,sd.icolr(a0) Set matching ink and border  	
*
* Clone system default fount addresses from channel slot 0
*
	lea.l	sd.fount0(a0),a2 Target for three long moves
	trap	#1		Call MT.INF as D0 is still 0
	move.l	sv.chbas(a0),a0	Find channel 0 pointer
	move.l	(a0),a0		Assume channel 0 exists
*
	move.l	sd.fount0(a0),(a2)+   Default sc.fount0
	move.l	sd.fount1(a0),(a2)+   Default sc.fount1
	move.l	#scrbase,(a2)	Default base address
*
* Add the QPRINT and QBLOCK test keywords to SuperBASIC
*
	move.w	bp.init\w,a2
	lea.l	procDef,a1
	jsr	(a2)
	endc
*
* Link in the SCN device-driver
*
	moveq	#sc.limit,d1	Size of device definition
	moveq	#0,d2		Owned by resident task 0
	moveq	#mt.alchp,d0	Allocate memory
	trap	#1		A0 -> device data space
	tst.l	d0		Check A0 is valid
	bne.s	whoops
*
	lea.l	sv.aio(a0),a2	Point at the TRAP IO vectors
	lea.l	output,a1
	move.l	a1,(a2)+     	Link in SCN output service 
	lea.l	opener,a1
	move.l	a1,(a2)+        Link SCN SV.AOPEN routine
	lea.l	closer,a1 
	move.l	a1,(a2)		Link in SCN SV.ACLOS routine
*
* Last five words are workspace for OPEN sc.origin and sc.sizes 
*
* Clone system default fount addresses from channel slot 0
* into the device driver linkage block (not the test channel)
* - they may be changed later by EXTOP or similar mechanisms.
*
	move.l	a0,a2		Save device base address
	moveq	#mt.inf,d0
	trap	#1	
	move.l	sv.chbas(a0),a0	Find channel 0 pointer
	move.l	(a0),a0		Assume channel 0 exists
	lea.l	sc.fount0(a2),a1  Address for storage
	move.l	sd.fount0(a0),(a1)+		
	move.l	sd.fount1(a0),(a1)+
*
* If this is an unversioned core, use the old QL screen address
*
        move.w  8388638,d0	FPGA_V ($80001E), 0 if unversioned
        bne.s   modern
*
	move.l  #ql_scrbase,(a1)+
	move.l	#sWidth<<16+sHeight,(a1) Set LINEL and LINES
*
* Check SYSVARS are not in the way

	moveq	#mt.inf,d0
	trap	#1
	cmpa.l	#163840,a0
	beq.s   badAdd	
*
        bra.s	ancient
*
modern 	move.l  #scrbase,(a1)+	Set sc.base, default sd.scrb
	move.l	#sWidth<<16+sHeight,(a1) Set LINEL and LINES
*
ancient	lea.l	sv.lio(a2),a0
	moveq	#mt.liod,d0
	trap	#1		No error report possible
*
* Add the MODEn and RECOL256 keywords to SuperBASIC
*
	move.w	bp.init\w,a2
	lea.l	modeDef,a1
	jsr	(a2)
*	
whoops	rts			Return error code in D0
*
* Return IN USE if system variables overlap the screen
* There is no way to fix this without rebooting, so the
* small memory leak is not worth tidying, especially as
* only 2024 cores force SCN to use QL video addresses.
*
badAdd	moveq	#in_use,d0
	rts
*
* Sinclair and Minerva MODE commands know nothing of Lion's
* extra colour modes, or CST's MODE 12. To allow any QDOS
* ROM to be used, handling MODE 4 and MODE 8 as usual, the
* following parameterless commands select or disable the
* new hardware as well as the Sinclair legacy. This driver
* only directly supports 256-colour output to a display set
* up with MODE256, but it can also be used to write to the
* MODE16 (16 colour hiRes) display with these restrictions:
*
* (1) Output is always double-width (CSIZE 2,0 or CSIZE 3,0)
*     i.e. character sizes are the same as in MODE256, since
*     the same code is being used to output them, plotting
*     two HiRes pixels in the space of one LowRes pixel.
* (2) 16 solid colours can be selected by passing 17 times
*     the colour number 0..15 to INK, PAPER, STRIP or BORDER.
*     Stipples of type 2 (only) can be simulated by combining
*     the colour (0..15) for even-pixel columns plus 16 * the
*     colour (0 to 240 STEP 16) for alternate (odd) columns,
*     e.g. INK 17 gives solid blue, INK 15 or 240 gives black
*     and white or white and black stripes respectively, for 
*     supporters of Newcastle United (etc).    
* (3) RECOL256 then works on pixel-pairs, using a 256-byte 
*     table like that for MODE256, which can be created from
*     a 16-entry table by nybble expansion - messy but fast!
*
* There is no support for the Thor XVI MODE 12 in SCN or the
* Sinclair CON and SCR drivers, but colours 0..7 (and usual
* stipples of those) can be printed, plotted and even FILLed
* in MODE 12 by Sinclair's MODE 8 commands. Don't use FLASH!
* DIY Toolkit PLOT and DRAW commands can be used in MODE 12
* to render points and lines in 16 solid colours, since they
* support CST's equivalent MODE 12. But for stipples you'll
* need to run CST's ARGOS rather than any QDOS ROM, as that
* allows fractional colour numbers (0.25, 0.5 and 0.75) to
* add the necessary two extra bits to Sinclair's 0..7 gamut.
*
* The NDRAW (Next) update of the DIY Toolkit pixel graphics
* commands also supports MODE 256, and Next QL MODE 16 at
* the same 256x256 resolution using only even horizontal
* coordinates, 0, 2, 4 .. 510, and INKs as described above.
* There is no support yet for scaled, offset and clipped 
* floating-point graphics coordinates in MODE16 or MODE256.
* This driver could theoretically be extended to support the
* POINT, LINE, CIRCLE, ELLIPSE and turtle-graphics commands
* by extending the provided source and integrating NDRAW...
*
*     Simon N Goodwin, simon <at> mooli.org.uk, April 2025
*
* Additional screen modes on Next QLs are enabled by two bits
* in the following byte register. As of April 2025 those bits
* are not consistently mapped for reading, screen resolution
* is readable from there but set at MC.STAT and an additional
* write-only register selects 16 colours in two circumstances. 
*
nsVideo	equ	8388622		Weighted NB bit values follow
nb64k	equ	1		Value when using 64K screens
nb128k  equ     2               Value when using 128K screen
nb64k2  equ     4               Add to select second 64K screen
nbLoRes equ     32		NEW Set=256, 0=512 pixel lines
*
* Corresponding bit numbers for write operations
*
nw64k	equ	0		Bit set when using 64K screens
nw128k  equ     1               Bit set when using 128K screen
nw64k2  equ     2               Bit set to select second 64K screen
nwLoRes equ     5		NEW Set=256, 0=512 pixel lines
*
* READ bits at nsVideo are confusingly inconsistent with writes
* and in this NR case also represented as bit numbers for BTST:
*
nr64k	equ	0		True for 64K modes 256 and 16?
nrLoCol	equ	1		True for MODE 12, and MODE 16?
nr128k	equ	2		True for 128K modes 32 and 64?
*                               MC.STAT bit 3 DID set X res
nrScn2  equ     3               NEW Second screen notification
nrMode  equ	4		Reads MODE 0/8 from MC.STAT?
nrLoRes equ     5               NEW way to read X resolution 
*
lion16	equ	8388626		Set BIT 0 for 16 colours
b16col  equ     0               Bit number in lion16 register
*
* The KS1 and KS2 FPGA builds are marked N1 or N2 respectively
* at the following address. Original 2024 Next QLs had no such
* marker, and interim versions (before the second release) had
* NQ (20049) or zero there. Another way to check for KS2 Next
* QL capabilities is to write 2 to nsVideo and then check if
* it's changed. If the bit is ignored, it must be KS1 hardware
* - but on KS2 this will blink the screen and might lose HDMI
* synchronisation. If N3 or other values are implemented, any
* code that reads FPGA_V will have to be revised accordingly.    
*
FPGA_V  equ	$80001e         Next QL Hardware type word 
KS1_V   equ     20017           "N1"  
KS2_V   equ     20018           "N2"
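*
* From SuperBASIC the same marker can be inspected with PEEK_W.
* This is only a sketch, assuming the address and values above:
*
*   100 v = PEEK_W(8388638)  : REMark FPGA_V, $80001E
*   110 IF v = 20018 THEN PRINT "KS2 (N2) core"
*   120 IF v = 20017 THEN PRINT "KS1 (N1) core"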
*
modeDef	dc.w	12		Number of procedures defined
	dc.w	mode4-*
	dc.b	5,"MODE4"
	dc.w	mode8-*
	dc.b	5,"MODE8"
	dc.w	mode12-*
	dc.b	6,"MODE12"
	dc.w	mode16-*
	dc.b	6,"MODE16"
	dc.w	mode32-*
	dc.b	6,"MODE32"
	dc.w	mode64-*
	dc.b	6,"MODE64"
	dc.w	mode256-*
	dc.b	7,"MODE256"
        dc.w    newMode-*
	dc.b    8,"NEW_MODE"
	dc.w	recol256-*
	dc.b	8,"RECOL256"
	dc.w	scnDef-*
	dc.b	8,"SCN_DEFL"
        dc.w    ink64-*		Artix FPGA only
        dc.b    6,"INK64K"
	dc.w	scnUse-*	Artix FPGA only
	dc.b	7,"SCN_USE"
	dc.w	0		End of procedure table
	dc.w	4		(24 chars + 3 names + 7) / 8
	dc.w	nextMode-*
	dc.b	9,"NEXT_MODE"
        dc.w    scnVer-*
        dc.b    8,"SCN_VER$"
        dc.w    scnNum-*
        dc.b    7,"SCN_NUM"
	dc.w	0		End of function table
*
* SCN_USE command for MODE16 and MODE256, Artix FPGA only
*
* TO DO (maybe) support Minerva second screen switching,
* ONLY when MT.INF shows system variables are above 160K
* and nsVideo confirms that Next QL modes are not active.  
*
scnUse	cmp.w	#KS2_V,FPGA_V
	bne	awol
*
        move.w	ca.gtint\w,a2	Read parameter
	jsr	(a2)
	bne.s	noGood		Return error code in D0
*
	subq.w	#1,d3		Only one expected
	bne	badParm
*
	move.w	0(a1,a6.l),d0
	subq	#1,d0
	beq.s	scn1
*
	subq	#1,d0
	bne.s	badParm
*
  	bset	#nw64k2,nsVideo
	bra.s	scnDone
*
badParm	moveq	#bad_parameter,d0
noGood	rts
*
scn1	bclr	#nw64k2,nsVideo
scnDone	moveq	#no_error,d0
	rts
* 
* SCN_VER$ function, configured by the version equate earlier.
*
scnVer	moveq	#6,d1
	move.w	bv.chrix\w,a2
	jsr	(a2)
*
	movea.l	bv.rip(a6),a1	Find top of current maths stack 
	subq.l	#6,a1		Four characters + length word
	move.w  #4,0(a1,a6.l)	Stack the string length
	move.l  #version,2(a1,a6.l)
	move.l	a1,bv.rip(a6)	Update maths stack pointer
	moveq	#1,d4		Signal a string result
	moveq	#no_error,d0
	rts
*
* Identify which QL or Next screen is currently displayed.
* Return 1 for first, 2 for second, 0 if not applicable.
*
scnNum  move.b  nsVideo,d0
	btst    #nr64k,d0	Are 64K Lion modes active?
        bne.s   next64
*
        btst    #nr128k,d0
	bne.s   screen0
*
        moveq   #mt.inf,d0
	trap	#1
        btst    #bScr2,sv.mcsta(a0)
	beq.s	screen1
*
screen2 moveq   #2,d4
	bra	gotD4
*        
screen0 moveq   #0,d4
	bra	gotD4
*
next64  btst    #nrScn2,d0
	bne.s	screen2
*	
screen1 moveq   #1,d4
	bra     gotD4
*	
rangErr	moveq	#range_error,d0
	rts
*
* Set a default long word value in the SCN device definition:
*
* The first parameter is the index, 0, 1, 2 or 3 of long words
* that are user-configurable for later OPEN "SCN" operations.
* The second parameter is the arbitrary 32-bit value to set.
*
*  SCN_DEFL 0,f0   Sets the default SCN fount0 address to F0.
*  SCN_DEFL 1,f1   Sets the default SCN fount1 address to F1.
*  SCN_DEFL 2,addr Sets the start address of SCN screen RAM.
*  SCN_DEFL 3,xy   Sets the screen width and height in pixels
*                  where XY = pixels per row * 65536 + rows.
*
* Default ADDR is $20000 for the 2024 Next QL (in the Sinclair
* and Minerva screen address space) and $4C0000 on later cores
* (the 64K area defined by Aurora), or $4E0000 for a second 64K
* screen. Default MODE 256 or 16 XY is $01000100 (256 bytes by
* 256 rows).
* These values are configurable in order to support increased
* screen dimensions or double-buffered output in new hardware.
*
* Until these are changed, screens are assumed to be 256 x 256
* single-byte pixels in a 64K area starting at scrbase (set in
* this source file) using the same founts as channel #0 was
* using when this driver was loaded. Once a channel is open its
* founts can be changed with CHAR_USE (TK2), SET_FONT (TTK) or
* POKE_L to offsets 42 or 46 on the address returned by the DIY
* Toolkit function CHBASE(#) for that channel. The base address
* for pixel output to a given channel N can be set to ADDR with 
* POKE_L CHBASE(#N)+50,ADDR or read with PEEK_L(CHBASE(#N)+50), 
* consistently with Sinclair's SCR and CON channels.
*
* Sinclair and GST made no provision for screens larger than
* 32K. This driver defaults to a 64K area giving extra colours
* at the original QL resolutions, implemented by Lion for the
* Next QL FPGA. To customise the dimensions this byte-per-pixel
* driver uses on a given channel, POKE_W CHBASE(#n)+100,X sets
* a width of X bytes/pixels, and use POKE_W CHBASE(#n)+102,Y to 
* indicate that the screen RAM consists of Y contiguous rows
* each of X pixels, with the last (lower right) pixel stored
* at ADDR+(X * Y)-1. Higher resolutions will require more RAM!
*
* The current defaults, or custom settings, can be PEEKed from
* an open SCN channel at offsets 42, 46, 50 and 100 on CHBASE.
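*
* For example, assuming the DIY Toolkit CHBASE function is
* loaded and that channel #3 and its window geometry are free
* choices, new defaults can be set and then read back from a
* channel opened afterwards:
*
*   100 SCN_DEFL 2,5111808          : REMark $4E0000 base
*   110 SCN_DEFL 3,256*65536+256    : REMark $01000100, 256x256
*   120 OPEN #3,scn_512x200a0x10
*   130 PRINT PEEK_L(CHBASE(#3)+50) : REMark screen base address
*   140 PRINT PEEK_W(CHBASE(#3)+100): REMark bytes per line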
*
scnDef  move.w	ca.gtlin\w,a2	Read two long integers
	jsr	(a2)
	bne.s	upset		Return error code in D0
*
	subq.w	#2,d3
	bne	parErr
*
	move.l	0(a1,a6.l),d4	Offset parameter 0..3
	cmp.w	#3,d4		
	bhi.s	rangErr
*		
	lsl.l	#2,d4		Convert to long word index
	add.l	#sc.fount0,d4	Index into device description
*
	move.l	4(a1,a6.l),d5	Long word value to be stored
*
* Locate the device definition by opening a temporary channel		
*
	lea.l	devDesc,a0	SCN string
	moveq	#1,d3		Old shared device
	moveq	#-1,d1		Owned by this task
	moveq	#io.open,d0	IO.OPEN
	trap	#2		
	tst.l	d0
	bne.s	upset
*
	move.l	d4,d1		Address offset parameter
	move.l	d5,d2		Replacement long word parameter
	lea.l	setter,a2
	moveq	#forever,d3
	moveq	#sd.extop,d0	Channel ID is in A0
	trap	#3
	moveq	#io.close,d0
	trap	#2
	bra.s	settled
*
setter	adda.l	d1,a3		Offset A3 into device
	move.l	d2,(a3)		Store new value
*
settled	moveq	#no_error,d0
upset	rts
*
* MODE keywords, encompassing Next QL, CST and Sinclair screen
* modes. Beware that Sinclair's MODE command accepts MODE 512
* and MODE 0 (among other values) as equivalent to MODE 4, and
* (problematically for Next) MODE 256 as a synonym for MODE 8.
*
newMode	movea.w	ca.gtint\w,a2
	jsr	(a2)
	bne.s	upset
*
	subq.w  #1,d3
	bne.s	unknown
*
	movea.l	a3,a5		Hide parameter
	move.w	0(a1,a6.l),d1
	subq.w	#4,d1		Sift the first 4 with subQs
	beq.s	mode4
*
	subq.w	#4,d1
	beq.s	mode8
*
	subq.w	#4,d1
	beq.s	mode12
*
	subq.w	#4,d1
	beq.s	mode16
*
	cmp.w	#32-16,d1
	beq.s	mode32
*
	cmp.w	#64-16,d1
	beq.s	mode64
*
	cmp.w	#256-16,d1
	beq.s	mode256
*
unknown	moveq	#bad_parameter,d0
	rts
*
* MODEn keywords originally needed to read and set the
* MC.STAT lowres bit (3) for 256 pixel per line modes.
* In late April 2025 Lion altered the FPGA at my request
* to get the lowres bit from nsVideo bit 3 (revealing it
* at nsVideo bit 5). This eliminates the need to switch
* the 32K QL screen between MODE 4 and MODE 8, clearing
* it as an annoying side effect, when selecting another
* horizontal resolution for the required Next MODE. The
* code changes are conditional on the following symbol:
*
NEXTRES 
*
HIRES	equ	0
LORES	equ	8		Magic numbers for D1
*
*
mode256	moveq	#nb64k,d0	Select 64K video RAM
	moveq	#LORES,d1
	bra.s	modeN
*
mode32  moveq   #nb128k,d0
        moveq   #HIRES,d1
	bra.s   modeN
*
* 65536 colour lores could use this, although incompatible
* with STRIP, BORDER, INK and PAPER commands and QDOS TRAPs. 
*
mode64  moveq   #nb128k,d0
        moveq   #LORES,d1
        bra.s   modeN
*
mode16	moveq	#0,d1		High resolution please
	moveq	#1,d3		Set Lion16 register
	moveq	#nb64k,d0	Also set nsVideo
	bra.s	modeX
*
mode12	moveq	#LORES,d1
	moveq	#1,d3		Set Lion16 register
	moveq	#0,d0		Clear nsVideo
	bra.s	modeX

mode8	moveq	#LORES,d1
	bra.s	modeQL
*
mode4	moveq	#HIRES,d1
modeQL	moveq	#0,d0
modeN	moveq	#0,d3
modeX	cmpa.l	a3,a5
	bne.s	parErr		Parameters are not welcome
*	
	ifd	NEXTRES
	or.b	d1,d0		Copy loRes bit into nsVideo
	endc
*
	move.b	d3,lion16	Set or clear Next 16-colour bit
	move.b	d0,nsVideo      Set appropriate next mode bits
*
	ifd	NEXTRES
	andi.b  #nb64k+nb128k,d0
	bne.s	modeOK		Next mode, leave MC.STAT alone
	endc	
*
* If the QL MODE does not match the value in D1 we need to
* change it - unfortunately wiping the QL screen - because
* Lion used the MC.STAT bit shadowed at SV.MCSTA to decide
* the resolution of the logically-unrelated Next screen.
* This is only necessary for legacy QL modes or if NEXTRES
* is undefined. The NEXTRES option preserves the contents
* of the QL screen while Next modes are being used on KS2.
*
	ifnd	NEXTRES
	move.w	d1,-(a7)
	moveq	#-1,d1		Read mode 0 or 8 to D1
	moveq	#-2,d2		Ignore TV/Monitor setting
	moveq	#mt.dmode,d0
	trap	#1
*
	move.b	d1,d2		Save QDOS mode
	move.w	(a7)+,d1
	cmp.b	d1,d2
	beq.s	modeOK		No need to change anything
	endc
*
	moveq	#-1,d2		Ignore TV/Monitor flag again
	moveq	#mt.dmode,d0
	trap	#1		Force QDOS to new mode in D1
*
modeOK	move.b	nsVideo,d0
        btst    #nr128k,d0	Sift out the biggest screens	
	bne.s	wipe128
*
* Clear Next memory only if a Next mode was selected
*
	btst	#nr64k,d0		
	bne.s	blanket
*
	moveq	#no_error,d0
	rts
*
chanErr	moveq	#not_open,d0
	rts
*
parErr	moveq	#bad_parameter,d0
intErr	rts
*
* This zeroed the entire screen, reading its address and 
* dimensions from a dummy SCN channel, like scnDef above.
* These values were hard-wired before this redundant code.
*
	ifne	0	DEAD CODE
*
* EXTOP operation to return default scrbase in A1 and pixel
* count in D1.L
*
getter	movea.l	sc.base(a3),a1	Return A1 = screen address
	move.w	sc.linel(a3),d0	Bytes (and pixels) per line
	move.w	sc.lines(a3),d1	Number of screen lines
	mulu	d0,d1		Return D1 = Pixels per screen
	moveq	#no_error,d0
	rts	
*
* Locate the screen base address and dimensions from the 
* device definition by opening a temporary channel.		
*
blanket	lea.l	devDesc,a0	SCN string
	moveq	#1,d3		Old shared device
	moveq	#-1,d1		Owned by this task
	moveq	#io.open,d0	IO.OPEN
	trap	#2		
	tst.l	d0
	bne.s	errD0
*
	lea.l	getter,a2
	moveq	#forever,d3	Infinite timeout
	moveq	#sd.extop,d0	Channel ID is in A0
	trap	#3		Puts scrb in A1, count in D1
	movea.l a1,a2		Protect address from IO.CLOSE
	moveq	#io.close,d0
	trap	#2
	endc
*
* Clear the relevant 64K, or all 128K for KS2-only modes.
*
blanket	move.w	#8191,d1	8 Bytes per DBRA iteration
*
wipen	btst	#nrScn2,d0
	beq.s	wipe0
*
	movea.l	#nextScr2,a2	Clear 64K at second base	
	bra.s	wipe1
*
wipe0	movea.l	#nextScr,a2
wipe1	moveq	#0,d0		Pattern to write to screen
*
wipey	move.l	d0,(a2)+	Slow, but it works
	move.l	d0,(a2)+	Token unrolling; less slow
	dbra	d1,wipey
errD0	rts
*
wipe128 move.w  #16383,d1	+1 for DBRA * 8 =128K
	bra.s	wipe0
*
* Prematurely-optimised and system-crashing "fast" version
* Clear 8*8 long words (one screen line) per iteration
* from the bottom up as MOVEM to RAM can't post-increment. 
*	
	lea.l	scrBase+64*1024+4,a0
	move.w	#63,d7 
	moveq	#7,d0		Temporary test values
	moveq	#6,d1
	moveq	#5,d2
	moveq	#4,d3
	moveq	#3,d4
	moveq	#2,d5
	moveq	#1,d6
	suba.l	a2,a2		0, as the rest should be
*
wiper	movem.l	d0-d6/a2,-(a0)
	movem.l	d0-d6/a2,-(a0)	Fuel for step testing in Qmon
*	movem.l	d0-d6/a2,-(a0)
*	movem.l	d0-d6/a2,-(a0)
*	movem.l	d0-d6/a2,-(a0)
*	movem.l	d0-d6/a2,-(a0)
*	movem.l	d0-d6/a2,-(a0)
*	movem.l	d0-d6/a2,-(a0)
	dbra	d7,wiper
*
	rts		D0 is still 0, no_error
*
* RECOL256 takes an optional channel number and the address
* of a 256-byte colour-remapping table as parameters:	
*
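* A hedged SuperBASIC sketch, assuming the keyword is linked as
* RECOL256, that ALCHP and RECHP come from Toolkit 2, that #3
* is an open SCN channel and (not confirmed by this routine)
* that entry N of the table is the replacement for colour N:
*
*  100 tab = ALCHP(256)
*  110 FOR i = 0 TO 255 : POKE tab+i,255-i
*  120 RECOL256 3,tab : REMark channel 3, table address
*  130 RECHP tab
*
* With one parameter the table is applied to channel #1.
*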
recol256 move.w	ca.gtlin\w,a2	Read both as long integers
	jsr	(a2)
	bne.s	intErr
*	
	moveq	#ch.lench,d0	Channel table stride (to #1)
	subq.w	#1,d3		Is there one parameter?
	beq.s	gotChan		Get table address
*
	subq.w	#1,d3		Are there two parameters?
	bne.s	parErr2
*
	move.l	0(a1,a6.l),d1	
	swap	d1		MULU.W treats D1.H as zero
	tst.w	d1		Validate that assumption
	bne.s	parErr2
*			
	swap	d1
	addq.l	#4,a1		Take channel # off RI stack
	mulu	d1,d0		Index into BASIC channel table
*
gotChan	add.l	bv.chbas(a6),d0	Offset from channel table base
	cmp.l	bv.chp(a6),d0	Don't overshoot
	bge.s	badChan
*
	move.l	0(a6,d0.l),d0	Get QDOS channel ID from BASIC
	bmi.s	badChan		Make sure it's still open
*
	movea.l	0(a1,a6.l),a1	Table address could be anywhere	
	movea.l	d0,a0		Pass channel ID	to QDOS
	moveq	#forever,d3
	moveq	#sd.recol,d0
	trap	#3
	rts			No error expected, return D0
*
badChan	moveq	#not_open,d0
*
anyErr	rts
*
parErr2 moveq	#bad_parameter,d0
	rts
*
* Ink64K uses long words to avoid signed arithmetic
* TO DO: reduce code duplication versus RECOL256
*
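* For instance, assuming the keyword is linked as INK64K and
* #3 is an open SCN channel, values above 32767 will not fit a
* signed SuperBASIC integer, hence the long-word fetch:
*
*  INK64K 3,65535 : REMark all 16 ink bits set on channel 3
*  INK64K 40000   : REMark one parameter applies to channel #1
*
* Values above 65535 are rejected with 'bad parameter'. The
* one-parameter form is only sensible if #1 is an SCN channel.
*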
ink64   move.w	ca.gtlin\w,a2	Read two long integers
	jsr	(a2)
	bne.s	anyErr
*	
	moveq	#ch.lench,d0	Channel table stride (to #1)
	subq.w	#1,d3		Is there one parameter?
	beq.s	gotCh2		Get ink value
*
	subq.w	#1,d3		Are there two parameters?
	bne.s	parErr2
*
	move.l	0(a1,a6.l),d1	
	swap	d1		High word should be zero
	tst.w	d1
	bne.s	parErr2
*			
	swap	d1
	addq.l	#4,a1		Take channel # off RI stack
	mulu	d1,d0		Index into BASIC channel table
*
gotCh2	add.l	bv.chbas(a6),d0	Offset from channel table base
	cmp.l	bv.chp(a6),d0	Don't overshoot
	bge.s	badChan
*
	move.l	0(a6,d0.l),d0	Get QDOS channel ID from BASIC
	bmi.s	badChan		Make sure it's still open
*
	movea.l	d0,a0		Pass channel ID to QDOS
	move.l	0(a1,a6.l),d1	Fetch the 16-bit ink value
*
	swap	d1		High word should be zero
	tst.w	d1
	bne.s	parErr2
*
	swap	d1
	lea.l	sink,a2
	moveq	#forever,d3
	moveq	#sd.extop,d0	Channel ID is in A0
	trap	#3
	rts
*
sink	move.w	d1,ink64k(a0)	Store over redundant mask
	moveq	#no_error,d0
	rts
*
* Categorise Lion 128K modes disclosed on 22nd April 2025.
* Reads nsVideo rather than SV.MCSTA if NEXTRES is enabled. 
*
m128k   
	ifd	NEXTRES
	btst.b	#nrLoRes,d0
	endc

	ifnd	NEXTRES
	btst.b  #bLoRes,sv.mcsta(a0) Check for width 256
	endc

	bne.s   hiCol
*      
	moveq   #32,d4		512x256 with 256 colours
	bra.s	gotD4
*
hiCol	moveq	#64,d4		256x256 64K (16 bit) colours
	bra.s	gotD4
*
* Return 4, 8, 12, 16, 32, 64 or 256 by testing hardware state
*
* TO DO: refactor to minimise polling once it matches reality
*
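* In outline, the tests below (with m128k above) resolve as
* follows. This is a summary read from the code, not a
* hardware specification; "QL hires" means that the MODE 8
* and MODE 12 bits of MC.STAT are both clear:
*
*  nr128k set          lores bit set -> 64, else 32
*  nr64k set           nrLoCol set -> 16, else 256
*  neither, QL hires   nrLoCol set -> 16, else 4
*  neither, QL lores   bMode12 or nrLoCol set -> 12, else 8
*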
nextMode moveq	#mt.inf,d0
	trap	#1		Point A0 at system variables
* 
	move.b  nsVideo,d0
	btst    #nr128k,d0	Sift out 128K modes
	bne.s	m128k
* 
	btst	#nr64k,d0       Sift out 64K Next modes              
	bne.s	try256
*
        moveq   #12,d1          Isolate MODE 8 and 12 bits
        and.b   sv.mcsta(a0),d1 Check horizontal resolution
	bne.s	lowRes
*
	moveq	#4,d4		Sinclair MODE 4 or Lion 16
	btst	#nrLoCol,d0	QL Next bit to test
	beq.s	gotD4
*
got16	moveq	#16,d4		MODE 16 detected
	bra.s	gotD4
*
* We could be in Sinclair MODE 8, Next or CST MODE 12
*
lowRes	btst    #bMode12,d1
	bne.s	got12           CST MODE 12 flagged in SV.MCSTA
*
        btst    #nrLoCol,d0     Check QL NEXT 4-bit colour flag 
	bne.s	got12		Lion Next QL MODE 12
*
	bra.s	gotMode		Must be Sinclair MODE 8
*
got12	moveq	#12,d4
	bra.s	gotD4
*
* We are in a 64K mode, either with 256 or 16 colours
*
try256	btst    #nrLoCol,d0
	bne.s	got16           512x256, 16 colours
*
	move.w	#256,d1		256 colours and 256x256 pixels
*	
gotMode	move.w	d1,d4		BV.CHRIX clobbers D0..D3
gotD4	moveq	#intSize,d1	Reserve a word on the RI stack
	movea.w	bv.chrix\w,a1
	jsr	(a1)		Cannot return an error here
*
	movea.l	bv.rip(a6),a1
	lea.l	-intSize(a1),a1 Quicker than SUBQ on 68008
	move.l	a1,bv.rip(a6)	Adjust stack pointer
	move.w	d4,0(a1,a6.l)	Stack the result
	moveq	#3,d4		Result type is integer
	moveq	#no_error,d0
*	
fail	rts			Return error code in D0
*
* Open a SCN channel - called by QDOS in supervisor mode
* A0 -> open string, A3 -> device driver linkage and data 
* 
opener	move.w	io.name\w,a1	Name helper utility	
	lea.l	sc.sizes(a3),a3	Point at parameter buffer
	jsr	(a1)
	bra.s	fail		Name not matched
	bra.s	fail		Parameter(s) not accepted
	bra.s	defChan
*
* To facilitate testing on Sinclair ROMs the default screen
* area uses only the top 32K of video RAM always available,
* pixel lines 0 to 127, the top half of the MODE 256 screen.
* Once system variables are moved up from the address used
* by QDOS, additional lines may safely be rendered, though
* many Sinclair-era QL programs may be incompatible with
* the non-standard system base address thus required. Unless
* Minerva or a similar ROM is used, or a custom SCRB display
* is enabled in higher memory, system variables occupy the
* second 32K screen memory area and SCN can't PRINT there!
*
devDesc	dc.w	3		Name prefix string length
	dc.b	"SCN "		Padded to word boundary
	dc.w	5		Parameter count
	dc.w	" _",512	Full width in MODE 4 pixels
	dc.w	" X",128	Half-height in pixels
	dc.w	" A",0		Default no left margin
	dc.w	" X",0		Default top margin (none)
        dc.w    4,"AB","CD"     First or second 64K, or 128K
*
* N.B. The spaces above before the parameter separators _ X A X
* are vital; nulls would indicate that option letters follow!
*
* The final suffix letter is implemented only on Artix FPGAs
* which support 128K of video RAM for extra modes. It returns
* 'not implemented' on KS1 and similar Spartan-based systems.
*
* Option A uses the default 64K video RAM address $4C0000
* Option B uses the alternate 64K video RAM address $4E0000
* Option C (for colours) uses 128K for 65536 colour 256x256
* Option D (double/deep) uses 128K for 512x256 in 256 colours
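*
* Hedged examples, assuming SCR-style device names are parsed
* with the separators and defaults in devDesc above and that
* screen RAM is clear of the system variables:
*
*  OPEN #3,scn_512x256a0x0  : REMark 256x256 window, first 64K
*  OPEN #4,scn_512x256a0x0b : REMark same window, second 64K
*  OPEN #5,scn_512x256a0x0d : REMark 512x256 window, uses 128K
*
* Widths and X origins are in MODE 4 units, halved on opening,
* except with the D suffix where they are taken as pixels.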
*
defChan	moveq	#sd.end,d1	Size of channel definition
	move.w	mm.alchp\w,a1	Supervisor allocate memory
	jsr	(a1)		Obtain address in A0
	bne.s	fail
*
        move.w  8(a3),d3        Get resolution and base hints 
	beq.s	use64k
*
* Prevent attempts to use 128K VRAM on systems with only 64K
* 
	cmp.w	#KS2_V,FPGA_V	Suffixes are for Artix only
	bne	awol            Otherwise, "not implemented"
*
* Copy the X,Y size and origin words from device to channel
*
use64k	lea.l	sd.xsize(a0),a2
	move.w	(a3)+,d1
	cmp.w   #4,d3		D double-deep suffix
        beq.s   hi64k
*
	lsr.w	#1,d1		Convert from MODE 4 to pixels
*
hi64k	cmp.w	#cWidth,d1
	blt	cleanup		Ensure room for one character
*
	move.w	d1,(a2)+	Set sd.xsize, cWidth..sWidth
	move.w	(a3)+,d1
	cmp.w	#cHeight,d1	
	blt	cleanup		Ensure room for one character
*
	move.w	d1,(a2)		Set sd.ysize, cHeight..sHeight
	subq.l	#sd.ysize-sd.xmin,a2	Inconsistently ordered
	move.w	(a3)+,d1	Read minimum X	
        cmp.w   #4,d3           That D option for widths of 512
	beq.s	hiXmin
*
	lsr.w	#1,d1		Convert from MODE 4 to pixels
*
hiXmin	move.w	d1,(a2)+	Set sd.xmin
	move.w	(a3)+,(a2)	Set sd.ymin, checked later
	lea.l	-8-sc.sizes(a3),a3 Restore device link pointer
*
* Configure line width between rows, in bytes, and row count
*
	move.l	sc.linel(a3),sd.linel(a0) Also copies sc.lines
        move.l  #cWidth<<16+cHeight,sd.xinc(a0)
*
* Not needed as allocated memory was cleared by QDOS
*
*       moveq	#0,d0
*	move.w	d0,sd.borwd(a0)	No border
*	move.l	d0,sd.xpos(a0)	Home X and Y to top left
*       move.l  d0,sd.cattr(a0) No special attributes
*
* Same long word says no cursor and clears Strip and Paper 
*
*	move.w	d0,sd.nlsta(a0) No pending newline or FILL
*	move.l	d0,strip0(a0)   Prepare strip masks
*	move.b	d0,strip3(a0)	Default black, colour 0
*
* In order for the graphics version of CURSOR to work without
* implementing POINT, LINE etc, floating-point XORG and YORG
* or equivalently SD.SCAL must be zero. The default suffices. 
*
	moveq	#dInk,d1	Default ink is 255
	move.w	d1,sd.icolr(a0) Same word sets sd.bcolr
	move.b	d1,ink0(a0)	Misaligned - do not merge
	move.b	d1,ink1(a0)
	move.w	d1,ink2(a0)	This word also sets ink3
	move.w	d1,ink64k(a0)   For 65536 colour plotting
*
	move.l	sc.fount0(a3),sd.fount0(a0)
	move.l	sc.fount1(a3),sd.fount1(a0)
	move.l	sc.base(a3),sd.scrb(a0)
*
* A-D suffixes only apply if screen RAM is above the QL area
*
        ifne    scrBase-ql_scrbase
*
* Adjust address and line length if modified by a B-D suffix
*
        subq.w  #2,d3
        bmi.s   ready		No suffix or A, use scrBase
*
	beq.s   screenB         B screen, add 64K
*
        subq.w  #3,d3           Negative for C or D suffix
        bpl.s   ready           Ignore unknown suffixes
*
* Double the line-length for 256 colour HiRes 512x256 
* (or graphic-only 64K-colour 256x256 pixel) 128K screens
*
bigLine move.w  sd.linel(a0),d3
        add.w   d3,d3
        move.w  d3,sd.linel(a0)
        bra.s   ready
*
* Render to the second 64K area if the B screen was selected
*
screenB move.l	#nextScr2,sd.scrb(a0)
*
        endc         		128K video RAM options	
*
* Channel is now ready for use, as long as window is sensible
*
ready	bsr.s	valid8		Make sure window fits screen
*
	bmi.s	cleanup		Error code is in D0 already
*
	moveq	#no_error,d0
	rts
*
* Validate window fits on screen, returns with N set otherwise 
*
valid8	move.l	sd.linel(a0),d0	Process X and Y in parallel
	sub.l	sd.xmin(a0),d0
	sub.l	sd.xsize(a0),d0
	bmi.s	oops		X total too large		
*
	tst.w	d0		Set N flag if Y too long
*
oops	rts
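*
* A worked example of the paired subtraction above, assuming the
* default 256 x 256 screen ($01000100 at sd.linel) and a window
* of 200 x 100 pixels at 50,150:
*
*  $01000100 - $00320096 = $00CE006A   less xmin and ymin
*  $00CE006A - $00C80064 = $00060006   less xsize and ysize
*
* Both words are positive, so 6 spare columns and rows remain.
* With a height of 110 the second result is $0005FFFC: the long
* word is still positive but TST.W sets N for the low (Y) word.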
*
cleanup	bsr.s	closer		Open failed, free channel data
derange	moveq	#range_error,d0	Out of range
	rts
*
* Closing a channel simply involves releasing its memory.
* If FILL is implemented later, release its buffer too! 
*
closer	movea.w	mm.rechp\w,a1	Supervisor heap release vector
	jmp	(a1)		Release channel description
*
	ifd	TESTBED	
*
* Temporary space for a dummy channel table while testing
*
chanTab	ds.b	sd.end
*
procDef	dc.w	2
	dc.w	qprint-*
	dc.b	6,"QPRINT"
	dc.w	qblock-*
	dc.b	6,"QBLOCK"
	dc.w	0,0,0
*
qprint	movea.w	ca.gtstr\w,a2
	jsr	(a2)		Fetch string parameter
	bne.s	exit
*
	subq.w  #1,d3		Only one parameter please
	bne.s	badPar
*
	move.w	0(a1,a6.l),d2	Pick up string length
	lea.l	2(a1,a6.l),a1	A1 -> start of string text
*
* Beware - test only - if BASIC moves, A1 might pick up junk
*
	lea.l   chanTab,a0      Dummy window channel table
	movem.l	d4-d7/a4/a5,-(sp)
	jsr	strOut		N.B. this shouldn't touch A6
	movem.l	(sp)+,d4-d7/a4/a5
	rts			Error code is in D0
*
* QBLOCK width,height,X,Y,COL - where WIDTH and X are even, 
* 0..510, in BLOCK-standard MODE 4 units, Y and HEIGHT count
* lines 0..255, and COL is the colour byte (taken MOD 256).
*
qblock	movea.w	ca.gtint\w,a2
	jsr	(a2)
	bne.s	exit
*
	subq.w  #5,d3
	bne.s	badPar
*
	move.b	9(a1,a6.l),d1	Pick up colour 0..255
	lea.l	chanTab,a0      Dummy window definition 
*
* We need a temporary buffer for the four BLOCK parameter 
* words; for test purposes, these clobber some space for
* the unsupported floating-point graphics window origin. 
*
	lea.l	sd.yorg(a0),a2	Borrow 8 bytes from SCALE
	move.l	0(a1,a6.l),(a2)	Buffer the width and height
	move.l	4(a1,a6.l),4(a2) Buffer top left coordinates
	move.l	a2,a1		A1 now points to parameters
	move.l	d4,a2		We need D0 to D4, A0 and A1
	bsr	bblock		Emulate SD.FILL 
	move.l	a2,d4		Restore SuperBASIC's D4
	rts
*
badPar	moveq	#bad_parameter,d0
exit	rts
	endc
*
****************************************************************
*
* Runtime code starts here
*
* Timeout if CTRL-F5 is active
*
notNow	moveq	#0,d1		No characters sent
notDone moveq   #not_complete,d0  
        rts
*
pending	st	sd.nlsta(a0)	Newline needed next time around
*
* If a newline is pending PRINT TO needs to know we will be
* counting from the left margin, but we can't indicate this
* by clearing XPOS as that will lose the last part of a long
* line from LIST when it calls CLRRT to wipe the end of line.
* So PRINT needs to know about pending newlines: see SD.CHENQ
* for tentative reconciliation of this anomaly.
*
*	clr.w	sd.xpos(a0)	PRINT TO needs to know this
*
strDone	swap	d1              Count all bytes sent (D2 input)
	ext.l	d1		In case someone presumes D1.L!
        moveq   #no_error,d0
        rts
*
* Character output
*
* Character-output is optimised for GROUPS of characters, and 
* expects them to be at (A1). IO.SBYTE calls must be kludged to
* mimic IO.SSTRG, by copying the parameter to a one-byte buffer
* used only inside IO.SBYTE, so re-entrancy is maintained.
*
sbyte   tst.b   sv.scrst(a6)    Screen frozen with CTRL-F5?
        bne.s   notDone		Return 'not complete' at once
*
        lea.l   dummy(a0),a1	Point A1 at a dummy string      
        move.b  d1,(a1)         Buffer the character
*
* Since D2.H is not touched elsewhere in character output,
* load a constant word (not long with MOVEQ) in case the
* top 16 bits will be useful for some later optimisations.
*
        move.w  #1,d2           One character to be done
        bra.s   strOut		QDOS preserves initial D2
*
misFit	moveq   #range_error,d0 Window too small for glyphs
        rts
*
* String output, A1 points to D2.W bytes
*
sstrg   tst.b   sv.scrst(a6)    CTRL-F5?
        bne.s   notNow
*
* TO DO: Refactor down to YBASE to reduce LF checks and 
* simplify XPOS updates, cutting the per-character overhead.
*
* Reject character sizes of less than 2 columns or 4 rows,
* to eliminate the need for checks on each character later.
*
strOut  move.w  sd.yinc(a0),d7
	subq.w	#3,d7		Don't allow YINC of 3 or less
	ble.s	misFit		Protects DBRA in output loops
*
	move.w	sd.xinc(a0),d7
	subq.w	#1,d7		Don't allow XINC less than 2
	ble.s	misFit
*
        move.w  d2,d4		Move string length to D4.H for
	swap	d4		later, leaving room for stride
	move.l	d4,d1		Copy count to D1.H for return
	movem.w	ink1(a0),d2-d3  Preload ink & strip patterns
	move.w	sd.linel(a0),d4
	sub.w	sd.xinc(a0),d4	Slot character stride in D4.W 
	move.w	strip2(a0),d1   Prime strip, preserving D1.H
*
* Is there a newline pending?
*
	ifd	PENDY
	tst.b	sd.nlsta(a0)
	bne.s	do_nl
	endc 
*
* D0 is reloaded from STRIP0 for each character to be output
*
* TO DO: Outside character loop test OVER/UNDER/XINC/YINC to
* preload the address of a character output routine into A2
*
strLoop swap    d4
        subq.w	#1,d4		Count down through input
	bmi.s	strDone
*
        swap    d4	 
	moveq	#0,d0		FOUNT FIX: clear D0 bits 8..15
	move.b  (a1)+,d0	Get the next character code
        cmp.b   #lineFeed,d0	LineFeeds are a special case
        bne.s   tryFit		Render glyph of character in D0
*
* LineFeed handler starts by checking if we may need to scroll:
*
	ifd	PENDY
	swap	d4
	tst.w	d4		Is this LF the only character?
	beq.s	pending		If so, return and flag for later
*
	swap	d4
	endc
*
do_nl	move.w	sd.ysize(a0),d0	D0 is scratch at this point
	sub.w	sd.ypos(a0),d0	Allow for cursor position
	move.w	sd.yinc(a0),d7	Cache YINC as we need it often
	sub.w	d7,d0		Traverse current line
	sub.w	d7,d0		Consider the line below that
	bmi.s	scrolly
*
	add.w	d7,sd.ypos(a0)  Cursor to next line of window
rollOut	clr.w	sd.xpos(a0)	Implied carriage return
	clr.b	sd.nlsta(a0)
	bra.s	strLoop
*
* Scroll up, leaving YPOS unaltered on the bottom window line
*
scrolly	movem.l	d1/d4/a1,-(a7)	Save colours and input position
	moveq	#sd.scrol*2,d0	Simulate despatch table index
	move.w	d7,d1
	neg.w	d1		Reverse direction (to roll up)
	bsr	scrol		Re-use SCROLL #,-,0 routine
	movem.l	(a7)+,d1/d4/a1
	movem.w	ink1(a0),d2-d3  Reload ink & strip patterns
	bra.s	rollOut
*
* Render the character in D0; will it fit on the current line?
*
tryFit	move.w	sd.ypos(a0),d7	Line offset always needed later	
	move.w	sd.xpos(a0),d6
	movea.l	sd.scrb(a0),a3  
	adda.w	d6,a3		Potential pixel column   
	add.w	sd.xinc(a0),d6  Advance XPOS for next character  
	cmp.w	sd.xsize(a0),d6 Got room for current character?
	bls.s	xFits
*
* The current line is full. Move to the next, maybe scrolling.
*
	move.w	sd.ysize(a0),d6	Find usable height in window
	sub.w	sd.ypos(a0),d6	Allow for the cursor position
	move.w	sd.yinc(a0),d7	Cache YINC as we need it often
	sub.w	d7,d6		Traverse current line
	sub.w	d7,d6		Consider the line below that
	bcc.s	wrapX		
*
* Make a new line at YPOS by scrolling window contents upward 
*
	movem.l	d0/d1/d4/a1,-(a7) Save char code & input position
	moveq	#sd.scrol*2,d0	Simulate despatch table index
	move.w	d7,d1		Scrolling distance, in pixels
	neg.w	d1		Reverse direction (to roll up)
	bsr	scrol		Re-use SCROLL #,-,0 routine
	movem.l	(a7)+,d0/d1/d4/a1 Old D5-D7/A3-A4 got clobbered
	movem.w	ink1(a0),d2-d3  Reload ink & strip patterns
	movea.l	sd.scrb(a0),a3	Rethink screen address
	move.w	sd.ypos(a0),d7	D7 will be applied to A3 later
	bra.s	indent
*
wrapX	movea.l	sd.scrb(a0),a3  Cancel tentative XPOS offset	
*
* Advance the text cursor to the start of the line below:
*
	move.w	sd.ypos(a0),d6	Where was the previous line?
	add.w	d6,d7		Current relative line number 
	move.w	d7,sd.ypos(a0)	D7 will be applied to A3 later
indent	move.w	sd.xinc(a0),d6	Cursor ends up indented by XINC
*
xFits	adda.w	sd.xmin(a0),a3	Apply X offset of the window
	move.w	d6,sd.xpos(a0)	Prepare XPOS for the next glyph
*
* We could hard-wire SD.LINEL to avoid the MUL, using an 8-bit
* shift EXT.L D7 LSL.L #8,d7 (26/34Ts on 68000/08) but since we
* have the ideal general instruction, it makes sense to use it,
* especially as the TG68K FPGA MUL takes 3 Ts (and many LUTs).
*
yBase	add.w	sd.ymin(a0),d7	Apply vertical window offset
	mulu	sd.linel(a0),d7 
	adda.l	d7,a3		Next glyph Target, top left 
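*
* In other words, assuming byte-per-pixel rows of SD.LINEL bytes:
*
*  A3 = SD.SCRB + SD.XMIN + XPOS + (SD.YMIN + YPOS) * SD.LINEL
*
* For example, with SD.LINEL 256 and a window at 0,0 a glyph at
* XPOS 12, YPOS 20 starts 12 + 20 * 256 = 5132 bytes into the
* screen.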
*
* Fount support, biased to favour fount0 with fast path      
*
        movea.l sd.fount0(a0),a4
        sub.b   (a4)+,d0        Check against first code
	bcc.s	maybe0          High enough for fount0
*
try1    add.b   -(a4),d0        Restore ASCII code
        movea.l sd.fount1(a0),a4
        sub.b   (a4)+,d0        Could it be in fount1?
	bcc.s	maybe1
*
* Minerva fount1 wraps around from code 255 to 0; detect this
*
	move.b	-1(a4),d7	Add up prefix bytes as words
	ext.w	d7
	move.b	(a4),d6
	ext.w	d6
	add.w	d6,d7
	lsr.w	#8,d7		Byte overflow shows Lau's hack
	bne.s	maybe1	
*
        addq.l  #1,a4		Point at default glyph
        bra.s   gotCha
*
maybe1  cmp.b   (a4)+,d0        Is it in fount1?
	bls.s	index
	
	bra.s	gotCha		Use default glyph at (a4) 	
*
maybe0	cmp.b	(a4),d0         How many patterns are there?
	bhi.s	try1            Too high, try other fount
*
	addq.l	#1,a4		Past count to first pattern
*
index	mulu	#9,d0		Nine bytes per glyph pattern
	adda.l	d0,a4		A4 -> Pattern in fount    	
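*
* That is, pattern address = fount + 2 + 9 * (code - first),
* where FIRST is the fount's first byte. As an illustration
* only, in a fount whose first code is 32 the letter "A" (65)
* lies 9 * 33 = 297 bytes past the first pattern.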
*
* TO DO: move next block from the character loop by presetting
* A2 and A5 for optimised paths with character-line rendering,
* flagged by the sign of D7.H, otherwise point A2 at iterative
* pixel-output code for the remaining character heights and 
* widths and OVER settings, handling UNDER in their epilogues.
*
gotCha	lea.l   csize0x,a2      CSIZE 0,0 OVER 0 despatch table
	move.w	sd.yinc(a0),d7
	move.w	strip0(a0),d0   Complete ink patterns in D0-D3
*
* TO DO: Optimise OVER case selection out of per-character loop
*
	moveq	#1<<sa.trans+1<<sa.xor,d6
	and.b	sd.cattr(a0),d6
	subq.b	#1<<sa.trans,d6	
	bmi.s	over0           Neither bit set, use OVER 0
*
	beq 	over1		Use slow OVER 1 implementations
*
* Specialisation for OVER -1 in all widths
*
	moveq	#cWidth,d6	Unrolling needs default width
	cmp.w	sd.xinc(a0),d6
	bne	xover1		Otherwise X loop is needed
*
	cmp.w 	#cHeight,d7	Unrolling needs default height
	bne	xover1		Otherwise use X and Y loops
*
        lea.l   csize0e,a2      Swap in XOR despatch table
	adda.l	d6,a3		Skip over top blank line
	bra.s	xover6
*
* TO DO: Optimise next tests into the pre-character loop setup
* 
over0	cmp.w	#cWider,sd.xinc(a0)
	beq.s	print1		Special code for XINC 8, YINC 4+
*
	cmp.w	#cHeight,d7
	bne	print0		Use generic loops for other heights
*
	cmp.w	#cWidth,sd.xinc(a0)
	bne	print0		Use generic loops for other widths
*
        move.w  d0,(a3)+
        move.w  d0,(a3)+        Output blank top line of glyph
        move.w  d0,(a3)+
*
xover6	moveq   #-4,d5          Preload Six pixel mask %11111100
	lea.l   floop,a5        Next line continuation address
*
* Character output for OVER 0 and -1 uses per-line despatches for
* CSIZE 0,0 - three words per line, max ten lines, including gap,
* so if the window is known to be clear, OVER -1 outruns OVER 1.

* On entry A5 holds the start of the per-character row loop
*          A4 points to first byte of fount character data
*          A3 points to the top left pixel of the character
*          A2 points at the 64 six-pixel output routines
*          D0.W holds two bytes of strip
*          D1.W has a high byte of strip and a low byte of ink
*          D2.W has a high byte of ink and a low byte of strip
*          D3.W holds two bytes of ink
*          D4.W is the stride from just after one character row
*             to the start of the character row on the next line
*          D5 masks off the two lowest bits of the fount data
*          D7 is a copy of SD.YINC, pixel lines per character
*
* Before, after and while a character is being output
*
*          A0 points to the channel definition
*          A1 is preserved for access to the next character code
*          A6 points to the system variables (for finer CTRL-F5)
*	   D1.H holds the length of the whole character string 
*	   D2.H is preserved for potential flag optimisations
*	   D3.H is also preserved for potential optimisations
*          D4.H is the number of remaining characters to output
*          D6 is scratch or used to select a row-output routine
*          D7 is scratch or counts down rows of the character
*
* Expand D7 (currently always 9) lines from the fount
* TO DO: Free up earlier checks to allow other YINC values
*
floop   subq.w  #1,d7
        ble.s   done10
*
        adda.w  d4,a3           Advance to output second line
        moveq   #0,d6           Clear high word of offset
        move.b  (a4)+,d6        Fetch pixel line from fount
        and.w   d5,d6           Clear lowest two bits
        add.w   d6,d6           Form despatch offset 0..63*8
	jmp	0(a2,d6.w)      Preserve A2 for next row
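*
* For example, fount row %01111000 ($78) survives the mask
* unchanged and doubles to $F0 = 30 * 8, selecting the routine
* for six-pixel pattern %011110 (30), at eight bytes apiece.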
*
done10	btst	#sa.under,sd.cattr(a0)
	beq	strLoop         Leave penultimate line alone
*
doUnder	suba.w	sd.linel(a0),a3	Wind back to add underline	
	move.w	d3,-(a3)  
	move.w	d3,-(a3)	Overdraw from right to left  
	move.w	d3,-(a3)  
	bra 	strLoop                 
*
* Unrolled CSIZE 2,0 OVER 0 optimised loop
*
* Expand D7 (up to 9) lines from the fount
* D5 replaces the CSIZE 0 byte mask with a despatch multiplier
*
clamp1	moveq	#cHeight,d7	Clamp count if YINC > 10
	bra.s	p1blank
*
skip1   suba.w  d4,a3		Compensate for stride advance
	bra.s	p1head
*
print1	lea.l	p1head,a5
	lea.l	csize1e,a2	256x10 byte row-output macros
	moveq	#10,d5		Per-character table stride
	cmp.w	#cHeight,d7	Is YINC smaller than usual?
	bhi.s	clamp1		Stay within fount if YINC>10
*
	blt.s	skip1           No blank top line required
*
p1blank	move.w	d0,(a3)+
	move.w	d0,(a3)+
	move.w	d0,(a3)+
	move.w	d0,(a3)+
*
p1head  subq.w  #1,d7
        ble.s   p1done
*
        adda.w  d4,a3           Advance to output second line
        moveq   #0,d6           Clear high word of MUL offset
        move.b  (a4)+,d6        Fetch pixel line from fount
        mulu	d5,d6           Form despatch offset 0..255*10
	jmp	0(a2,d6.l)      Preserve A2 for next row
*
p1done	btst	#sa.under,sd.cattr(a0)
	beq	strLoop         Leave penultimate line alone
*
	suba.w	sd.linel(a0),a3	Wind back to add underline	
	move.w	d3,-(a3)  
	move.w	d3,-(a3)	Overdraw from right to left  
	move.w	d3,-(a3)
	move.w	d3,-(a3)  
	bra 	strLoop                 
*
*****************************************************
*
* Relatively slow and general OVER 1 routine. Since each
* row may both skip and output three pixels the despatch
* scheme would need at least 14 bytes of code per line,
* and preferably 16 for speed, which would soak up 1K
* just for the least-used case of CSIZE 2,0 (and 4.5K for
* wide characters, using a MUL #18 to index 256 entries)
* so this implementation just works pixel-by-pixel, and
* can therefore cater for wider or narrower widths with
* little additional overhead. Wider characters ignore
* sa.wide in SD.CATTR and deduce the width from SD.XINC.
* Double-height characters are sifted out and processed
* in a modified loop along lines documented for PRINT0.  
*
over1	move.w	sd.xinc(a0),d5
	subq.w	#1,d5		Adjust width for DBRA
	movea.w	d5,a5		Cache X count for each line
	btst	#sa.tall,sd.cattr(a0)
	bne.s	fad1
*
	subq.w	#4,d7		Top, underline, last and DBRA
	cmp.w	#5,d7		Is YINC less than standard (10)
	bgt.s	short1		If not, skip blank and clamp D7
*
	addq.w	#1,d7		No blank line, use more fount
	bra.s	doTop
*
short1	adda.w	sd.linel(a0),a3	Skip empty top line
	moveq	#6,d7		Use full fount, but no more
*
* Process D7+1 lines of D5+1 pixels from the fount	 	
*
doRow	move.w	a5,d5		Copy X count
doTop	move.b	(a4)+,d6	Pick up a line of the fount
	bra.s	over1b
*
over1a	add.b	d6,d6		Shift next pixel bit to N flag
over1b  bpl.s	over1c
*
	move.b	d1,(a3)		Plot one pixel of ink
over1c	addq.l	#1,a3		Advance to next pixel
	dbra	d5,over1a	Process row of D5+1 pixels
*
	adda.w	d4,a3		Stride to next character row
	dbra	d7,doRow  
*
* Penultimate row may be an underline or data from the fount.
*
* N.B. Underlining always uses the penultimate row, even if
* YINC is less than 10, so for the minimum YINC of 4 the top
* two rows from the fount will be output (with no blank row
* above them) then the third row potentially crossed by underlining,
* then the final row. Consistently with Minerva, the top blank
* is suppressed for heights <10, allowing YINC 9 to render UDGs
* without gaps and YINC 8 to render 8-pixel Spectrum founts,
* but in either case the underline will still be applied to
* the penultimate line rather than the (easier) last one.
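*
* Worked example of the row budget (a sketch derived from the
* code above, for normal-height characters only):
*
*   YINC  blank top  loop rows  penultimate  last  total
*    10+      1          7           1         1     10
*     9       0          7           1         1      9
*     8       0          6           1         1      8
*     4       0          2           1         1      4
*
* Larger YINCs still draw ten rows; this routine leaves any
* further rows of the character cell untouched.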
*
	move.w	a5,d5
	moveq	#-1,d6		All bits set for underline
	btst	#sa.under,sd.cattr(a0)
	bne.s	over1l		Underline overrides fount
*
	move.b	(a4)+,d6	Pick up next line from fount
	beq.s	over1k		Optimisation; skip empty line
*
	bpl.s	over1f		Skip first blank pixel
*
	bra.s	over1e		Plot first pixel
*	
over1d	add.b	d6,d6
	bpl.s	over1f
*
over1e	move.b	d1,(a3)		Plot a pixel
over1f	addq.l	#1,a3		Advance whether plotted or not
	dbra	d5,over1d	Consider the whole row
*
	move.w	a5,d5		Restore horizontal counter
*
* Render the final row (for this YINC size) from the fount
*
over1g	move.b	(a4)+,d6	Pick up next line from fount
	beq.s	overOut
*
	adda.w	d4,a3		Stride to last character row
	bra.s	over1i		Consider first pixel
*	
over1h	add.b	d6,d6
over1i	bpl.s	over1j
*
	move.b	d1,(a3)
over1j	addq.l	#1,a3
	dbra	d5,over1h
*
overOut	bra	strLoop
*
over1k	adda.w	sd.xinc(a0),a3	Step over empty penultimate row
	bra.s	over1g
*
over1l	addq.l	#1,a4
	bra.s	over1e
*
* While range_error seems logical, QDOS returns no error but
* draws nothing if there's not enough space for a character.
*
unFit	moveq	#no_error,d0
*	moveq   #range_error,d0	Still useful for testing
        rts
*
* Double height OVER 1 implementation - see FAT0 below for
* details, this leaves background (strip) pixels untouched.
* The code duplication minimises per-character conditional
* tests, testing and development time, repurposing D2 and D3
* from colour optimisations for tall stride adjustments.
*
fad1	move.w	sd.linel(a0),d3	Clobber D3 paper word
	move.w	d4,d2		Copy the default stride
	add.w	d3,d2		Stride over one extra row	
	lsr.w	#1,d7		Two screen rows per fount row
	subq.w	#4,d7		Count top, under, last and DBRA
*
	cmp.w	#5,d7		Suppress empty top line?
	bgt.s	fadTop		If not, skip over and clamp D7
*
	addq.w	#1,d7		No blank line, use more fount
	bra.s	fadFont
*
* Halved YINC is at least 10 so stride over the blank top rows
*
fadTop	adda.w	sd.xinc(a0),a3
	adda.w	d2,a3		Stride on to use fount data 
	moveq	#6,d7		Use full fount, but no more
*
* Process D7+1 lines of D5+1 pixels from the fount	 	
*
fadLine	move.w	a5,d5
fadFont	move.b	(a4)+,d6	Pick up a line of the fount
	bra.s	fad1b
*
fadLoop add.b	d6,d6		Shift next pixel bit to N flag
fad1b 	bpl.s	fadStp
*
	move.b	d1,(a3)		Overprint one pixel of ink
	move.b	d1,(a3,d3.w)	Also blat the row below that
*
fadStp	addq.l	#1,a3
*	
fadNext	dbra	d5,fadLoop	Process row of D5+1 pixels
*
	adda.w	d2,a3		Stride to next character row
	dbra	d7,fadLine  
*
* Penultimate line may be underlined or data from the fount
*
	move.w	a5,d5
	moveq	#-1,d6		All bits set for underline
	btst	#sa.under,sd.cattr(a0)
	bne.s	fad1c		Underline overrides fount
*
	move.b	(a4)+,d6	Pick up row from fount
*
	bpl.s	fad1f		Skip first blank pixel
*
	bra.s	fad1e		Plot first pixel
*
fad1c   addq.l	#1,a4		Skip underlined row
	bra.s	fad1e
*	
fad1d	add.b	d6,d6
	bpl.s	fad1f
*
fad1e	move.b	d1,(a3)		Plot ink
	move.b	d1,0(a3,d3.w)	Ink the row below that
*
fad1f	addq.l	#1,a3		Skip strip
*
fad1g	dbra	d5,fad1d
*
	move.w	a5,d5		Restore row count
	adda.w	d2,a3		Stride to next row
	move.b	(a4)+,d6	Pick up last row from fount
	bra.s	fad1i		Plot first pixel
*	
fad1h	add.b	d6,d6
*
fad1i	bpl.s	fad1j
*
	move.b	d1,(a3)		Plot ink
	move.b	d1,0(a3,d3.w)	Ink the row below that
*
fad1j	addq.l	#1,a3
*
fad1k	dbra	d5,fad1h
*
	bra	strLoop
*
* Generic OVER -1 for widths at least 2 and heights 4 or more.
* Identical to OVER 1 except that it uses EOR to draw pixels.
*
xover1	move.w	sd.xinc(a0),d5
	subq.w	#1,d5		Adjust width for DBRA
	movea.w	d5,a5
	btst	#sa.tall,sd.cattr(a0)
	bne.s	fax1
*
	subq.w	#4,d7		Discount top, under, last, DBRA
	cmp.w	#5,d7		Smaller than Sinclair standard?
	bgt.s	xshort1		If not, skip blank and clamp D7
*
	addq.w	#1,d7		No blank line, use more fount
	bra.s	xoTop
*
xshort1	adda.w	sd.linel(a0),a3	Skip empty top line
	moveq	#6,d7		Use full fount, but no more
*
* Process D7+1 lines of D5+1 pixels from the fount	 	
*
xoRow	move.w	a5,d5		Copy X count for each line
xoTop	move.b	(a4)+,d6	Pick up a line of the fount
	bra.s	xover1b
*
xover1a	add.b	d6,d6		Shift next pixel bit to N flag
xover1b bpl.s	xover1c
*
	eor.b	d1,(a3)		Exclusive OR one pixel
xover1c	addq.l	#1,a3		Advance to next pixel
	dbra	d5,xover1a	Process row of D5+1 pixels
*
	adda.w	d4,a3		Stride to next character row
	dbra	d7,xoRow  
*
* Penultimate line may be underlined, and is often empty
*
	move.w	a5,d5
	moveq	#-1,d6		All bits set for underline
	btst	#sa.under,sd.cattr(a0)
	bne.s	xover1l		Underline overrides fount
*
	move.b	(a4)+,d6	Read penultimate fount row
	beq.s	xover1k		... which is often zero
*
	bpl.s	xover1f		Skip first blank pixel
*
	bra.s	xover1e		Plot first pixel
*	
xover1d	add.b	d6,d6
	bpl.s	xover1f
*
xover1e	eor.b	d1,(a3)
xover1f	addq.l	#1,a3
	dbra	d5,xover1d
*
	move.w	a5,d5
*
* Render the last row
*
xover1g	move.b	(a4)+,d6	Pick up last row from fount
	beq.s	xoverOut	Skip the entire empty line
*
	adda.w	d4,a3		Stride to last character row
	bra.s	xover1i		Consider the leftmost pixel
*	
xover1h	add.b	d6,d6		Shift bits up through sign 
xover1i	bpl.s	xover1j		Nothing to plot
*
	eor.b	d1,(a3)
xover1j	addq.l	#1,a3		
	dbra	d5,xover1h
*
xoverOut bra	strLoop
*
xover1k	adda.w	sd.xinc(a0),a3	Step over the unchanged row
	bra.s	xover1g		D5 count is already preset
*
xover1l	addq.l	#1,a4		Skip underline fount row
	bra.s	xover1e
*
* Double height OVER -1 implementation - see FAT0 below for
* details, this is the same except that it uses EOR not MOVE
* to plot ink and leaves background (strip) pixels untouched.
*
fax1	move.w	sd.linel(a0),d3	Clobber D3 paper word
	move.w	d4,d2		Copy the default stride
	add.w	d3,d2		Stride over one extra row	
	lsr.w	#1,d7		Two screen rows per fount row
	subq.w	#4,d7		Count top, under, last and DBRA
*
	cmp.w	#5,d7		Suppress top blank line?
	bgt.s	faxTop		If not, skip over and clamp D7
*
	addq.w	#1,d7		No blank line, use more fount
	bra.s	faxFont
*
* Halved YINC is at least 10 so stride over the blank top rows
*
faxTop	adda.w	sd.xinc(a0),a3
	adda.w	d2,a3		Stride on to use fount data 
	moveq	#6,d7		Use full fount, but no more
*
* Process D7+1 lines of D5+1 pixels from the fount	 	
*
faxLine	move.w	a5,d5
faxFont	move.b	(a4)+,d6	Pick up a line of the fount
	bra.s	fax1b
*
faxLoop add.b	d6,d6		Shift next pixel bit to N flag
fax1b 	bpl.s	faxStp
*
	eor.b	d1,(a3)		Flip one pixel of ink
	eor.b	d1,(a3,d3.w)	Also EOR the row below that
*
faxStp	addq.l	#1,a3
*	
faxNext	dbra	d5,faxLoop	Process row of D5+1 pixels
*
	adda.w	d2,a3		Stride to next character row
	dbra	d7,faxLine  
*
* Penultimate line may be underlined or data from the fount
*
	move.w	a5,d5
	moveq	#-1,d6		All bits set for underline
	btst	#sa.under,sd.cattr(a0)
	bne.s	fax1c		Underline overrides fount
*
	move.b	(a4)+,d6	Pick up row from fount
*
	bpl.s	fax1f		Skip first blank pixel
*
	bra.s	fax1e		Plot first pixel
*
fax1c   addq.l	#1,a4		Skip underlined row
	bra.s	fax1e
*	
fax1d	add.b	d6,d6
	bpl.s	fax1f
*
fax1e	eor.b	d1,(a3)		Plot ink
	eor.b	d1,0(a3,d3.w)	Ink the row below that
*
fax1f	addq.l	#1,a3		Skip strip
*
fax1g	dbra	d5,fax1d
*
	move.w	a5,d5		Restore row count
	adda.w	d2,a3		Stride to next row
	move.b	(a4)+,d6	Pick up last row from fount
	bra.s	fax1i		Plot first pixel
*	
fax1h	add.b	d6,d6
*
fax1i	bpl.s	fax1j
*
	eor.b	d1,(a3)		Plot ink
	eor.b	d1,0(a3,d3.w)	Ink the row below that
*
fax1j	addq.l	#1,a3
*
fax1k	dbra	d5,fax1h
*
	bra	strLoop
*
*****************************************************
*
* Simple fallback OVER 0 for arbitrary widths 2+; generic
* and unoptimised, suitable for 2x4 or larger characters.
*
print0	move.w	sd.xinc(a0),d5
	subq.w	#1,d5		Adjust width for DBRA
	movea.w	d5,a5		Save fount row DBRA X count
	btst	#sa.tall,sd.cattr(a0)
	bne.s	fat0		Use double-height loops
*
	subq.w	#4,d7		Count top, under, last and DBRA
	cmp.w	#5,d7		Suppress top blank line?
	bgt.s	pTop		If not, wipe it and clamp D7
*
	addq.w	#1,d7		No blank line, use more fount
	bra.s	pFount
*
* YINC is at least 10 pixels so start with a row of STRIP
*
pTop	move.b	d0,(a3)+	Plot a pixel of strip
	dbra	d5,pTop		Blank a row of D5+1 pixels
*
	adda.w	d4,a3		Stride on to use fount data 
	moveq	#6,d7		Use full fount, but no more
*
* Process D7+1 lines of D5+1 pixels from the fount	 	
*
pLine	move.w	a5,d5
pFount	move.b	(a4)+,d6	Pick up a line of the fount
	bra.s	print0b
*
pLoop   add.b	d6,d6		Shift next pixel bit to N flag
print0b bpl.s	pStrip
*
	move.b	d1,(a3)+	Plot one pixel of ink
	bra.s	pNext
*
pStrip	move.b	d0,(a3)+	Plot a pixel of strip
pNext	dbra	d5,pLoop	Process row of D5+1 pixels
*
	adda.w	d4,a3		Stride to next character row
	dbra	d7,pLine  
*
* Penultimate line may be underlined or data from the fount
*
	move.w	a5,d5
	moveq	#-1,d6		All bits set for underline
	btst	#sa.under,sd.cattr(a0)
	bne.s	print0c		Underline overrides fount
*
	move.b	(a4)+,d6	Pick up row from fount
*
	bpl.s	print0f		Skip first blank pixel
*
	bra.s	print0e		Plot first pixel
*
print0c addq.l	#1,a4		Skip underlined row
	bra.s	print0e
*	
print0d	add.b	d6,d6
	bpl.s	print0f
*
print0e	move.b	d1,(a3)+	Plot ink
	bra.s	print0g
*
print0f	move.b	d0,(a3)+	Plot strip
print0g	dbra	d5,print0d
*
	move.w	a5,d5		Restore row count
	adda.w	d4,a3		Stride to next row
	move.b	(a4)+,d6	Pick up last row from fount
	bra.s	print0i		Plot first pixel
*	
print0h	add.b	d6,d6
print0i	bpl.s	print0j
*
	move.b	d1,(a3)+	Plot ink
	bra.s	print0k
*
print0j	move.b	d0,(a3)+	Plot strip
print0k	dbra	d5,print0h
*
	bra	strLoop
*
*****************************************************
*
* Tall character support, sifted out from print0 when
* the tall attribute bit is set. Custom YINCs of 5 rows
* - 10 pixel height - are not trapped, so use YINC 11
* to render 5-row truncated double-height when XINC is
* 6, otherwise the standard CSIZE 2,0 glyph is chosen. 
* Supports double-height characters for all widths and
* either setting of UNDER, using nested X and Y loops
* and doubling-up all the pixel writes to LINEL-1(A3). 
*
* Clobbers paper masks in D2.W and D3.W, pending output 
* routine pre-selection outside the main loop. This is
* not efficient but it should work. D3 caches LINEL and
* D2 is the new stride (one border and one screen line).
* Alternate rows are written by double-indexing which is
* expensive in time but cheap in code. As each character
* fills twice the usual space, it should be fast enough.
*
* This generic and unoptimised version handles OVER 0 for 
* tall characters. It's suitable for 2x4 or larger glyphs.
*
fat0	move.w	sd.linel(a0),d3	Clobber D3 paper word
	move.w	d4,d2		Copy the default stride
	add.w	d3,d2		Stride over one extra row	
	lsr.w	#1,d7		Two screen rows per fount row
	subq.w	#4,d7		Count top, under, last and DBRA
*
	cmp.w	#5,d7		Suppress top blank line?
	bgt.s	fatTop		If not, draw blank and clamp D7
*
	addq.w	#1,d7		No blank line, use more fount
	bra.s	fatFont
*
* Halved YINC is at least 10 so start with two rows of STRIP
*
fatTop	move.b	d0,(a3)+	Plot a pixel of strip
	move.b	d0,-1(a3,d3.w)	Plot the row below that
	dbra	d5,fatTop	Blank a row of D5+1 pixels
*
	adda.w	d2,a3		Stride on to use fount data 
	moveq	#6,d7		Use full fount, but no more
*
* Process D7+1 lines of D5+1 pixels from the fount	 	
*
fatLine	move.w	a5,d5
fatFont	move.b	(a4)+,d6	Pick up a line of the fount
	bra.s	fat0b
*
fatLoop add.b	d6,d6		Shift next pixel bit to N flag
fat0b 	bpl.s	fatStp
*
	move.b	d1,(a3)+	Plot one pixel of ink
	move.b	d1,-1(a3,d3.w)	Also ink the row below that
	bra.s	fatNext
*
fatStp	move.b	d0,(a3)+	Plot a pixel of strip
	move.b	d0,-1(a3,d3.w)	And the row below that
fatNext	dbra	d5,fatLoop	Process row of D5+1 pixels
*
	adda.w	d2,a3		Stride to next character row
	dbra	d7,fatLine  
*
* Penultimate line may be underlined or fount data
*
	move.w	a5,d5
	moveq	#-1,d6		All bits set for underline
	btst	#sa.under,sd.cattr(a0)
	bne.s	fat0c		Underline overrides fount
*
	move.b	(a4)+,d6	Pick up row from fount
*
	bpl.s	fat0f		Skip first blank pixel
*
	bra.s	fat0e		Plot first pixel
*
fat0c   addq.l	#1,a4		Skip underlined row
	bra.s	fat0e
*	
fat0d	add.b	d6,d6
	bpl.s	fat0f
*
fat0e	move.b	d1,(a3)+	Plot ink
	move.b	d1,-1(a3,d3.w)	Ink the row below that

	bra.s	fat0g
*
fat0f	move.b	d0,(a3)+	Plot strip
	move.b	d0,-1(a3,d3.w)	Strip the row below that
fat0g	dbra	d5,fat0d
*
	move.w	a5,d5		Restore row count
	adda.w	d2,a3		Stride to next row
	move.b	(a4)+,d6	Pick up last row from fount
	bra.s	fat0i		Plot first pixel
*	
fat0h	add.b	d6,d6
fat0i	bpl.s	fat0j
*
	move.b	d1,(a3)+	Plot ink
	move.b	d1,-1(a3,d3.w)	Ink the row below that
	bra.s	fat0k
*
fat0j	move.b	d0,(a3)+	Plot strip
	move.b	d0,-1(a3,d3.w)

fat0k	dbra	d5,fat0h
*
	bra	strLoop
*
* To be added here, time permitting and if users show 
* sufficient enthusiasm, in decreasing order of likelihood: 
*
* Optimised unrolled 8,10-only output for OVER 0, UNDER 0;
* This will require 2560 bytes of code for 256 patterns!
*
* Unoptimised 16-colour narrow-character MODE 16 support,
* e.g. for CSIZE 0,0 and 1,0 in 16 colours or vertical 
* stipples.
*
* POINT, LINE and hence QL Turtle Graphics, by extending the
* DIY Toolkit DRAW commands for floating-point coordinates,
* integrating that code here, and extending the trap 3 table.
*
* SuperBASIC graphics FILL (for non-reentrant shapes only).
*
*
* INPUT, EDIT, ARC, CIRCLE, ELLIPSE etc. require substantial
* duplication or rewriting of unvectored Sinclair or Minerva
* routines, so those fall outside the scope of the Creative
* Commons licence of this project. Someone else may do them,
* or any of the others listed above, with Simon's blessing. 
*
*****************************************************
*
* Beginnings of a driver
*
* TRAP #3 support
*
output	cmp.b	#sd.fill+1,d0	Stop before float graphics
	bcc.s	awol		Not implemented (unsigned test)
*
	add.w	d0,d0		Form word index
	lea.l	trapTab,a4
	adda.w	(a4,d0.w),a4	Extract offset
	jmp	(a4)		Despatch
*
awol	moveq	#not_yet,d0	Undefined or unimplemented key
	rts
*
trapTab	dc.w	pend-trapTab    Key 0 
	dc.w	fbyte-trapTab
	dc.w	fline-trapTab	Ignore input keys
	dc.w	fstrg-trapTab
	dc.w	edlin-trapTab
	dc.w	sbyte-trapTab	Key 5 works 
	dc.w	awol-trapTab	Key 6 is undefined
	dc.w	sstrg-trapTab	Key 7 works
	dc.w	awol-trapTab	Key 8 is undefined
	dc.w	extop-trapTab
	dc.w	pxenq-trapTab
	dc.w	chenq-trapTab
	dc.w	border-trapTab
	dc.w	window-trapTab
	dc.w	cursor_on-trapTab
	dc.w	cursor_off-trapTab
	dc.w	position-trapTab
	dc.w	tab-trapTab
	dc.w	nl-trapTab
	dc.w	pcol-trapTab
	dc.w	ncol-trapTab
	dc.w	prow-trapTab
	dc.w	nrow-trapTab
	dc.w	pixPos-trapTab
	dc.w	scrol-trapTab
	dc.w	scrTop-trapTab
	dc.w	scrBot-trapTab
	dc.w	pan-trapTab	Key 27
	dc.w	awol-trapTab	Key 28 is not defined
	dc.w	awol-trapTab	Key 29 is not defined
	dc.w	panLine-trapTab
	dc.w	panEnd-trapTab
	dc.w	clear-trapTab	Key 32
	dc.w	clearTop-trapTab
	dc.w	clearBot-trapTab
	dc.w	clearLine-trapTab
	dc.w	clearEnd-trapTab
	dc.w	fount-trapTab
	dc.w	recol-trapTab	Key 38
	dc.w	paper-trapTab
	dc.w	strip-trapTab
	dc.w	ink-trapTab
	dc.w	flash-trapTab	Key 42 is only for MODE 8
	dc.w	under-trapTab
	dc.w	over-trapTab
	dc.w	csize-trapTab	
	dc.w	bblock-trapTab	Key 46, enough for now
*
*****************************************************
*
* SD.FOUNT key 37, validate maximum of 256 characters
* Use our own configurable defaults if parameter is 0 
*
fount	move.l	a1,d0
	beq.s	def1
*
checkf1
*
* The Minerva ROM second fount wraps from 255 to 0 to
* fill in control codes 0 to 31, so validation has been
* disabled in case anyone else decides to be that clever.
*
*	move.b	(a1),d0
*	add.b	1(a1),d0
*	bcs.s	bad		Reject total over 255
*
fount2	move.l	a2,d0
	beq.s	def2
*
checkf2	
*	move.b	(a2),d0		Allow fount overlaps
*	add.b	1(a2),d0
*	bcs.s	bad		Reject total over 255
*
fountx	move.l	a1,sd.fount0(a0)
	move.l	a2,sd.fount1(a0)
ok	moveq	#no_error,d0
	rts
*
def1	movea.l	sc.fount0(a3),a1 Restore SCN default
	bra.s	fount2		A2 may still need a default
*
def2	movea.l	sc.fount1(a3),a2
	bra.s	fountx
*
* SD.SETSZ - key 45 - support CSIZE 2 or 3,0 or 1
*          treat CSIZE 0 or 1 as CSIZE 2 or 3 as MODE 8 does
*
* Width is validated first, then height. If either is out of
* range the call returns bad_parameter before anything has
* been changed, so both XINC and YINC are left as before.

csize	cmp.w	#4,d1		Allow widths 0..3
	bcc.s	bad
*
	lea.l	sd.cattr(a0),a1	Point at the stored state
	tst.w	d2
	beq.s	short		CSIZE ,0
*
	cmp.w	#1,d2		D2 should be preserved
	bne.s	bad		Reject CSIZE ,2..32767
*
tall	bset	#sa.tall,(a1)	
	move.w	#cHeight*2,sd.yinc(a0)
	bra.s	width
*
bad	moveq	#bad_parameter,d0
	rts
*
short	bclr	#sa.tall,(a1)	CSIZE ,0
	move.w	#cHeight,sd.yinc(a0)
*
width	moveq	#-3,d0		%1111 1111 1111 1101.w
	and.w	d1,d0		Ignore hires-only bit 1
	beq.s	narrow
*
wide	bset	#sa.fat,(a1)
	move.w	#8,sd.xinc(a0)
	bra.s	ok
*
narrow	bclr	#sa.fat,(a1)
	move.w	#6,sd.xinc(a0)
	bra.s	ok
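*
* Worked example of the width mask above (a sketch derived from
* this code): AND with -3 clears bit 1 of the width in D1, so
*
*   CSIZE 0,h -> 0 -> narrow, XINC 6
*   CSIZE 1,h -> 1 -> wide,   XINC 8
*   CSIZE 2,h -> 0 -> narrow, XINC 6
*   CSIZE 3,h -> 1 -> wide,   XINC 8
*
* A caller could select wide, double-height text like this
* (illustrative only; A0 is assumed to hold an open SCN
* channel ID and D3 the usual TRAP #3 timeout):
*
*	moveq	#sd.setsz,d0
*	moveq	#3,d1		Width 3, wide
*	moveq	#1,d2		Height 1, tall
*	moveq	#-1,d3		Wait forever
*	trap	#3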
*
* WINDOW checks coordinates fit within the screen limits,
* and always sets a border so it exits via the BORDER trap 
* routine with D1.B and D2.W from the initial parameters.
* The WINDOW command passes 128 (transparent) and width 0
* for the border, but other callers may set a real border.
*
window	move.w	d2,d0		Copy border width to D2.H
	swap	d2
	move.w	d0,d2
	move.l	sd.linel(a0),d0	Pick up maximum X and Y
	sub.l	d2,d0		Reduce both by border width
	move.w	(a1)+,d5	Window width in MODE 4 pixels
	lsr.w	#1,d5		Convert pixel count to bytes
	swap	d5		Move X byte count to high word
	move.w	(a1)+,d5	Get requested height in pixels 
	sub.l	d5,d0		Reduce maximum by window size
	move.w	(a1)+,d4	Get X origin in MODE 4 pixels
	lsr.w	#1,d4		Convert offset to bytes
	swap	d4		Move X offset in bytes to D4.H
	move.w	(a1),d4		Get requested Y origin 
	sub.l	d4,d0	
	bmi.s	outside		X total too large		
*
	tst.w	d0		Flag N if Y total excessive
	bmi.s	outside
*
* X offset and size in D4.H and D5.H count in bytes not pixels
*
	move.l	d4,sd.xmin(a0)	Update XORG and YORG together
	move.l	d5,sd.xsize(a0) Update XSIZE and YSIZE  	
	move.w	sd.borwd(a0),d6	Old border width, for reference
	bra	border2		Draw any border, homing cursor
*
*****************************************************
*
* Simple cursor-positioning, keys 16 to 23 - this is fiddly
* as we must preserve the old XPOS and YPOS in case the 
* potential new position in D0.L is outside the window.
*
* To minimise instruction overhead and register usage, both
* halves of D0 are used for word values (X high, Y in D0.W)
* and the combined X,Y record is passed into the checker.
* N.B. Two-axis cursor positioning must use and preserve D2.L.
*
* POSITION implements AT, the method that is most-often used.
* PIXPOS does pixel-precise positioning, used by CURSOR x,y.
*
position move.w	sd.xinc(a0),d0
	mulu	d1,d0		D0.W is now pixel XPOS
	move.w	d0,d1		Save XPOS in D1
	move.w	sd.yinc(a0),d0
	mulu	d2,d0		D0 is now pixel YPOS
	bmi.s	outside		This must be positive
*
	swap	d0
	move.w	d1,d0		D0.W is XPOS, D0.H is YPOS
	swap	d0		Put XPOS high and YPOS low
*
* Verify that updated XPOS and YPOS are inside the window,
* with room to draw at least one character at the position,
* then either update the channel or return a range error. 
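*
* Worked example (a sketch with assumed values): for a window
* with XSIZE 252 and YSIZE 200 using XINC 6 and YINC 10, the
* limits in D1 become 246 and 190. Column 41, row 19 gives
* XPOS 246 and YPOS 190, which just fit; column 42 gives
* XPOS 252, above 246, so the call returns range_error and
* leaves the stored cursor position alone.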
*
checkN	bmi.s	outside		Reject if flagged negative 
*
	move.l	sd.xsize(a0),d1	Fetch both limits
	sub.l	sd.xinc(a0),d1	Allow one character margins
	cmp.w	d1,d0
	bhi.s	outside		No room for a character line
*
	swap	d0
	swap	d1
	cmp.w	d0,d1
	bcs.s	outside         No room for a character at X
*
	swap	d0
	move.l	d0,sd.xpos(a0)	Update XPOS and YPOS
exitOK	moveq	#no_error,d0	Validation succeeded
	rts
*
* Set pixel column to match character column number in D1
*
tab	move.w	sd.xinc(a0),d0
	mulu	d1,d0		Scale characters to pixels
	swap	d0		Potential new XPOS in D0.H
	move.w	sd.ypos(a0),d0	Merge in unchanged YPOS
	bra.s	testX
*
outside	moveq	#range_error,d0
	rts
*	
* Tentatively move XPOS back to the previous character column
*
pcol	move.l	sd.xpos(a0),d0
	move.w	d0,d1		Save YPOS which won't change
	swap	d0		Get XPOS into the low word
	sub.w	sd.xinc(a0),d0	Decrement XPOS to move left
	bra.s	newX
*
* Find next column similarly, but add rather than subtracting
*
ncol	move.l	sd.xpos(a0),d0
	move.w	d0,d1		Keep track of YPOS
	swap	d0
	add.w	sd.xinc(a0),d0	Move to next character column
*
newX	swap	d0
	move.w	d1,d0		Restore unchanged YPOS
testX	tst.l	d0
	bra.s	checkN
*
* Move to previous row if that's possible without scrolling
*
prow	move.l	sd.xpos(a0),d0
	sub.w	sd.yinc(a0),d0	Move up, YPOS is in D0.W
	bra.s	checkN
*
* Move to start of next line, if possible; this won't scroll
*
nl	moveq	#0,d0		Clear both X and Y words
	move.w	sd.ypos(a0),d0	Recover current YPOS
*
goDown	add.w	sd.yinc(a0),d0	Move to next character line
	bra.s	checkN		YPOS moved down, XPOS clear
*
* Attempt to move cursor down to the next row
*
nrow	move.l	sd.xpos(a0),d0	Fetch current X and Y
	bra.s	goDown
*
* Try to set cursor pixel coordinate to (D1.W, D2.W) 
*
pixPos	move.w	d1,d0		Fetch possible new pixel X
	bmi.s	outside		Not a good start!
*
	lsr.w	#1,d0	 	MODE 4 pixels in X to bytes
	swap	d0		Move X to high word
	move.w	d2,d0		Merge Y into low word
	bra.s	checkN
*
*****************************************************
*
* Clear screen methods, starting with the full CLS
*
clear	move.w	sd.ysize(a0),d1
	moveq	#0,d0
	move.l	d0,sd.xpos(a0)	Observed, undocumented
*
* Entry for full-width partial window clearing, not CLS 4
*
clearT	move.w	sd.xsize(a0),d2
clearA	add.w	sd.ymin(a0),d0
	movea.l	sd.scrb(a0),a1
*
* Enter with first pixel line number 0..255    in D0.W
* The number of pixel lines to clear           in D1.W
* The number of pixels to be cleared per line  in D2.W
* First pixel address in A1.L (before XMIN screen margin) 
*
doClear	subq.w	#1,d1		DBRA adjustment
	bmi.s	outside
*
	moveq	#0,d4		Form stride over border
	move.w	sd.linel(a0),d4
	adda.w	sd.xmin(a0),a1	Apply window X margin
	mulu	d4,d0
	adda.l	d0,a1		Apply Y offset
	sub.w	d2,d4		Form stride in D4
	subq.w	#1,d2		Adjust width for DBRA
	bmi.s	outside
*
	move.w	d1,d3		Make room for PAPER
	move.b	sd.pcolr(a0),d1
	bra	blok		Return ERROK via BLOK		
*
* CLS variants, 1 (top), 2 (bottom), 3 (line), 4 (end) 
*
clearTop moveq	#0,d0		Top line in window
	move.w	sd.ypos(a0),d1	D1 is line count
	bne.s	clearT
*
nowt	moveq	#no_error,d0
	rts
*
* CLS 2, clear bottom clears only below the cursor line
*
clearBot move.w	sd.ypos(a0),d0	Offset to current line
	add.w	sd.yinc(a0),d0	Offset to following line
	move.w	sd.ysize(a0),d1
	sub.w	d0,d1		D1 is line count
	bne.s	clearT
*
	bra.s	nowt
*
* Entry point for degenerate PANLN or PANRT
*
clearSome cmp.b	#sd.panrt*2,d0
	beq.s	clearEnd
*
* CLS 3 assumes there is always a line to be cleared
*
clearLine move.w sd.ypos(a0),d0	Offset to current line
	move.w	sd.yinc(a0),d1  D1 is line count
	bra.s	clearT
*
* CLS 4, clear only to the end of the current line
*
clearEnd move.w	sd.xpos(a0),d0
	move.w	sd.xsize(a0),d2
	sub.w	d0,d2		Work out column count
	bls.s	nowt		Nothing to do
*
	move.l	sd.scrb(a0),a1
	adda.w	sd.xpos(a0),a1	Adjust for window column
	move.w	sd.ymin(a0),d0
	add.w	sd.ypos(a0),d0	Adjust for window line
	move.w	sd.yinc(a0),d1	One character line
	bra.s	doClear
*
*****************************************************
*
* The three versions of SCROLL differ only in the number of
* lines they move and the address of the first of the lines,
* so the same routine handles all three keys with adjustments
* for the TRAP key value in D0. It seems the cursor position 
* is not changed, in any case. After scrolling we must wipe
* ABS(D1) lines that pixels have been scrolled out of. If the
* scroll count equals or exceeds the window height, just wipe
* the window with no error.
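*
* Worked example (a sketch with assumed values): scrolling a
* ten-line area up by two pixels (D1 = -2) copies lines 2..9
* onto lines 0..7, working downwards, then wipes lines 8 and 9
* with the paper colour.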
*
scrol
scrTop
scrBot	move.w	sd.ysize(a0),d5	Maximum height to scroll
	move.w	sd.ymin(a0),d6	Minimum vertical offset	
	movea.l	sd.scrb(a0),a3	Find left margin of window
	adda.w	sd.xmin(a0),a3
	move.w	d1,d7
	beq.s	nowt		Leave window unchanged
*
	bpl.s	down
*
	neg.w	d7		Make pixel count positive
*
* D7 counts the lines affected, D1 is still the signed scroll 
* distance in pixels, D5 is the full height of the window and
* D6 is the number of lines from screen top to top of window.
* Specialise D5 and D6 to suit the chosen part of the window.
*
down	cmp.b	#sd.scrtp*2,d0	*2 as despatch doubled D0
	bmi.s	rollIt		SD.SCROL < SD.SCRTP, scroll all
*
	beq.s	shorten		Scroll above the cursor line
*
	move.w	sd.ypos(a0),d2	SD.SCRBT, find the bottom part
	add.w	sd.yinc(a0),d2	Start below the current line
	sub.w	d2,d5		Adjust to find remaining height
	bls.s	nowt		There is no bottom part
*
* Now D5 counts at least one pixel line in the bottom part. 
* Adjust the starting pixel line in D6 to suit.
*	
	add.w	d2,d6		SHOULD always be inside window!
	bra.s	rollIt
*
shorten	move.w	sd.ypos(a0),d5	Height is lines above cursor
	bls.s	nowt		Nothing above the cursor line
*
rollIt	move.w	sd.linel(a0),d2
	mulu	d2,d6		Convert top offset to bytes
	adda.l	d6,a3		A3 -> top left of active area
	sub.w	d7,d5		Don't count cleared lines
	bls.s	blanker
*
	subq.w	#1,d5		DBRA count of lines to scroll
	tst.w	d1
	bmi.s	rollUp
*
* Scroll down D1 pixels by copying into the bottom of the
* active area from D1 lines above, from bottom to top.
* Point A3 at the start of the last line of the area
* and A4 D1 lines above that. 
*
	movea.l	a3,a1		Wipe here after scrolling
	moveq	#0,d4
	sub.w	sd.xsize(a0),d4
	sub.w	d2,d4		Stride up is -LineLen-xsize
	move.w	d5,d6		Lines to copy, less one
	add.w	d7,d6		Index of last line in the area
	mulu	d2,d6		
	adda.l	d6,a3		Long add, offset may exceed 32K
	movea.l	a3,a4
	mulu	d1,d2		Source to target byte offset
	suba.l	d2,a4	
	move.w	sd.xsize(a0),d2	Set up width for wiping later
	move.w	d2,d3
	subq.w	#1,d3
*
copyDnY	move.w	d3,d0		Copy X count to volatile D0
copyDnX	move.b	(a4)+,(a3)+
	dbra	d0,copyDnX	Copy one line of pixels down
*
	adda.w	d4,a3
	adda.w	d4,a4
	dbra	d5,copyDnY	Scroll all the lines down	
*
	move.w	d1,d3		Number of lines to wipe
	move.w	sd.linel(a0),d4	A1 points where to wipe
	sub.w	d2,d4		Form stride for BLOK
	bra.s	blokOut
*
* Attempt to scroll more lines than in the window or area;
* Simply clear it to paper colour without moving anything.
*
blanker	add.w	d7,d5		Restore height
	move.w	d5,d3		Line count for BLOK
rolled	move.w	sd.xsize(a0),d2	Column count
	move.w	sd.linel(a0),d4
	sub.w	d2,d4		Stride between lines	
	bra.s	clearUp		A3 -> top left of window
*
rollUp	neg.w	d1		Form positive line-count
	move.w	d2,d4 		Default stride is lineLength
	mulu	d1,d2		Target to source byte offset
	lea.l	0(a3,d2.l),a4	Copy up from (a4) to (a3)
	move.w	sd.xsize(a0),d2	Set up width for wiping later
	sub.w	d2,d4		Allow for auto-incrementation
	move.w	d2,d3		Scroll the same pixel-width
	subq.w	#1,d3		Adjust width count for DBRA
	subq.w	#1,d6		Adjust line count for DBRA
*
copyUpY	move.w	d3,d0		Copy X count to volatile D0
copyUpX	move.b	(a4)+,(a3)+
	dbra	d0,copyUpX	Move a line of pixels up
*
	adda.w	d4,a3
	adda.w	d4,a4
	dbra	d5,copyUpY	Scroll all the lines up	
*
	move.w	d1,d3		Number of lines to wipe
*
clearUp	movea.l	a3,a1		Start of area to be wiped
blokOut	subq.w	#1,d2		DBRA X count for BLOK
	subq.w	#1,d3		Prepare Y count for DBRA
	move.b	sd.pcolr(a0),d1 
	bra	blok		Return with no error in D0
*
* Pan moves the whole window contents within the border D1  
* HiRes pixels left (if negative) or right (positive), 
* filling ABS(D1) columns with paper. If ABS(D1) is equal 
* to or exceeds window width, wipe the window with no error 
* message, leaving the border unchanged. To benefit from
* future optimisations, wiping uses CLS or BLOCK methods.
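*
* Worked example (a sketch with assumed values): panning a
* window 100 bytes wide by D1 = -16 MODE 4 pixels halves to
* a distance of 8 bytes. Each line moves 92 bytes left, from
* offset 8 to offset 0, and the 8 columns from offset 92 are
* then filled with paper.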
*           
panLine
panEnd
pan	move.w	sd.xsize(a0),d2 Preload useful values
	movea.l	sd.scrb(a0),a3  Point A3 into screen
	adda.w	sd.xmin(a0),a3	Skip over left margin
	move.w	sd.ymin(a0),d6	Top margin in lines
	move.w	sd.linel(a0),d4 LINEL will also be useful later
	mulu	d4,d6		Top margin in bytes
	adda.l	d6,a3		A3 -> first byte to clobber
	asr.w	#1,d1		Convert MODE 4 pixels to bytes
	move.w	d1,d7
	beq	nowt		Leave window unchanged
*
* Merge left and right to share more common code, sift later
*
	bpl.s	goPos
*
	neg.w	d7		Make pixel count positive
*
* Sift SD.PANLN from SD.PAN; SD.PANRT is a subset of SD.PANLN
* so preserve the TRAP key in D0 to adjust for that, later.
*
goPos	cmp.b	#sd.pan*2,d0	Is this a whole-window pan?
	bne.s	linel		No, pan just a single text line
*
	cmp.w	d2,d7           Will any pixels get panned?
	bcc	clear		Degenerate case, clear window
*
	move.w	sd.ysize(a0),d5	Pan all lines within the window 
	bra.s	heightl
*
* Pan only the cursor line; perhaps only the right end of it
*
linel	cmp.w	d2,d7		Pan distance >= window width?
	bcc	clearSome	Clear end of or entire line
*
	move.w	sd.yinc(a0),d5	Pick up line height in pixels
	move.w	sd.ypos(a0),d6
	mulu	d4,d6		Byte offset, LINEL * YPOS 
	adda.l	d6,a3		Point at top of line not window
*
* More specialisation and checking for the PANRT case
*	
	cmp.b	#sd.panrt*2,d0
	bne.s	heightl
*
	move.w	sd.xpos(a0),d6
	sub.w	d6,d2		Reduce pixel count in D2
	adda.w	d6,a3		Advance start address right
*
heightl	move.w	d5,d3		Save height for CLEAR later
	subq.w	#1,d5		Make adjustment for DBRA
	sub.w	d7,d2		D2 counts pixels to be moved
	bls.s	noPan
*
* From now on we must specialise left and right cases
*
	tst.w	d1
	bpl.s	goRight
*
* Handle panning leftwards (distance in D1 is negative)
*
* This is like panning right except that the part cleared is
* on the right and the target address is before the source;
* hence incrementing of source and target addresses is safe.
*
*	cmp.b	#sd.panrt*2,d0	*2 matches LSL #1 in despatch
*	bne.s	full		Do the full width
*
* Reduce the width to move by XPOS and advance right,
* unless XPOS is at or past the end of the line in 
* which case there's nothing to move or clear. ????
*	
full	sub.w	d2,d4		Stride is LINEL-count
*
part	lea.l	0(a3,d7.w),a4	Point A4 at source
	lea.l	0(a3,d2.w),a1	We will wipe here later
	subq.w	#1,d2		Form X count for DBRA
*
doLeft	move.w	d2,d6
pxLeft	move.b	(a4)+,(a3)+	Another 68010-ready loop
	dbra	d6,pxLeft
*
	adda.w	d4,a3		Stride to next line
	adda.w	d4,a4
	dbra	d5,doLeft
*
* Specialisation for PANRT in case cursor is far to the right
*
noPan	cmp.b	#sd.panrt*2,d0
	bne.s	wipeOut
*
noPan2	move.w	sd.xsize(a0),d6
	sub.w	sd.xpos(a0),d6	D6 counts line space remaining
	cmp.w	d7,d6		Enough to clear?
	bcc.s	wipeLin
*
	move.w	d6,d7
	bls	nowt		
	bra.s	wipeLin
*
* Clear the space vacated, using the CLS method epilogue;
* D3.W already contains the required block height in pixels
* and A1 points to the top left corner of the block to clear.
*
wipeOut	moveq	#0,d0		
	move.w	d0,sd.xpos(a0)	QDOS pan zeroes X but not Y
* 
wipeLin	move.w	d7,d2		Width to clear, in pixels
	move.w	sd.linel(a0),d4	Block stride probably differs
	sub.w	d2,d4		Form block stride in bytes	
	bra	blokOut		A1 -> top left, Y count in D3
*
* Handle panning rightwards
*
goRight	movea.l	a3,a1		Save start for later wipe
	adda.w	d2,a3		Offset right to copy backwards
	add.w	d2,d4		Hence stride is LINEL + X count
	lea.l	0(a3,d7.w),a4	Point target beyond the source
	subq.w	#1,d2		Form X count for DBRA
*
* To avoid overwriting its input, unlike moving left this must
* copy backwards in memory; hence later start, greater stride.
*
doRight	move.w	d2,d6
pxRight	move.b	-(a3),-(a4)
	dbra	d6,pxRight
*
	adda.w	d4,a3		Stride to next line
	adda.w	d4,a4
	dbra	d5,doRight
*
	bra.s	noPan		Clear vacated space
*
*****************************************************
*
* tested fount
* tested paper
* tested strip
* tested ink
* tested under
* tested over
* tested csize
*
* To be implemented via a 256-byte table passed into 
* the driver via the new SuperBASIC command RECOL256
* as the standard command only passes in eight bytes
* and only supports colours 0 to 7. RECOL256 can be
* used in MODE16 with a 16-entry table, but then it
* recolours pairs of pixels (allowing even stipples
* to be swapped!).  
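*
* For example (an illustrative table, not part of the driver),
* an identity table where entry N holds N leaves every pixel
* unchanged; exchanging just entries 2 and 4 swaps those two
* colours throughout the window and leaves the rest alone.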
*
* Point A2 at the top left pixel of window
*
recol	move.w	sd.ymin(a0),d0
	move.w	sd.linel(a0),d4
	mulu	d4,d0		Find initial pixel line
	movea.l	sd.scrb(a0),a2
	adda.l	d0,a2
	adda.w	sd.xmin(a0),a2	Adjust for initial column
*
* Set up counters for X and Y window dimensions in D0 and D1
*
	move.w	sd.xsize(a0),d0	
	sub.w	d0,d4		D4 is stride to next line 
	subq.w	#1,d0		Form column count for DBRA
	bmi.s	noError         No columns means no effort
*
	move.w	sd.ysize(a0),d1
	subq.w	#1,d1		Adjust line count for DBRA
	bmi.s	noError		No lines? No problem :-)
*
	moveq	#0,d2		Zero-extend byte indices
*
*   For each line of the window
*     For each pixel of the line
*       Replace pixel with corresponding table entry
*
recoly	move.w	d0,d3		Refresh X count for next line
recolx	move.b	(a2),d2		Read a pixel
	move.b	0(a1,d2.w),(a2)+ Remap via table
	dbra	d3,recolx
*
	adda.w	d4,a2		Stride over any window border
	dbra	d1,recoly	Process all the rows
*
noError	moveq	#no_error,d0
	rts
*
* Not supported by the new hardware:
*
flash
*
* Not to be implemented - input-only calls
*
cursor_on
cursor_off
pend	
fbyte
fline
fstrg
edlin	moveq	#not_yet,d0		
	rts
*
* sbyte	and sstrg are implemented above
*
extop	jmp	(a2)
*
*****************************************************
*
* Query methods that populate a table pointed to by A1
*
pxenq	move.l	sd.xsize(a0),d0	X and Y together
	swap	d0		Adjust top word
	add.w	d0,d0		MODE 4 pixels in X
	swap	d0
	move.l	d0,(a1)+
	move.l	sd.xpos(a0),d0
	swap	d0
	add.w	d0,d0		MODE 4 units
	swap	d0
	move.l	d0,(a1)
	subq.l	#4,a1		FIX: A1 is not "undefined"	
	moveq	#no_error,d0
	rts
*
* CHENQ could be made faster on TG68K if the slow divisions
* by 6 and 10 commonly invoked by PRINT separator handling
* were trapped and strength-reduced to fast multiplications
* and implied divisions by 65536 with SWAP, though this will
* limit the maximum supported screen dimensions to 65535
* divided by the divisor to be optimised.  
*
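* A sketch of that idea, not assembled or timed here: 10923 is
* 65536/6 rounded up, and the dividend is assumed to be in D4.W.
* A quick check suggests the quotient is exact for dividends
* below 32768, e.g. 252 * 10923 = 2752596 = 42 * 65536 + 84.
*
*	mulu	#10923,d4	D4.L = X * 10923
*	swap	d4		D4.W = X / 6, remainder lost
*
* The matching constant for division by 10 would be 6554, which
* on the same reasoning is exact for dividends below 16384.
*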
chenq	move.w	sd.xinc(a0),d0	Avoid division by zero
	beq.s	ovaflo
*
	moveq	#0,d4		Zero extend for division

	move.w	sd.xsize(a0),d4
	divu	d0,d4		Xinc and Xsize are congruent
	move.w	d4,(a1)+	Store X size, in characters
	moveq	#0,d5
 	tst.b	sd.nlsta(a0)	Reconcile PRINT TO and LIST?
	bne.s	willBe0	
*
	move.w	sd.xpos(a0),d5
	divu	d0,d5
*
willBe0	move.w	sd.yinc(a0),d6	Validate character height
	beq.s	ovaflo
*
	moveq	#0,d4
	move.w	sd.ysize(a0),d4
	divu	d6,d4
	move.w	d4,(a1)+	Store Y size, in characters
	move.w	d5,(a1)+	Store X position
	moveq	#0,d5
	move.w	sd.ypos(a0),d5
	divu	d6,d5
	move.w	d5,(a1)		Store Y position
	subq.l	#6,a1		FIX: callers expect old A1
	bra.s	noError
*
ovaflo	moveq	#overflow,d0
	rts
*
*****************************************************
*
* Draw a D2.W pixel border around the current window using
* the colour in D1.B. Also used by the SD.WDEF API. BEWARE
* - this may leave the window with nothing but a border!
* This routine could probably be simplified, but it works. 
*
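* Worked example with made-up numbers: a window 100 pixels wide
* overall with an old border of 2 has SD.XSIZE = 96. Requesting
* a border of 5 gives 96 + 2*2 - 2*5 = 90, which becomes the
* new SD.XSIZE provided it is at least SD.XINC. A border of 51
* would give 96 + 4 - 102 = -2, which is rejected as a range
* error.
*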
border	move.w	sd.borwd(a0),d6	Check old border width
	cmp.w	d6,d2
	beq.s	noHome		No change, leave cursor alone
*
border2	clr.l	sd.xpos(a0)	Home cursor (reads redundant)
*
noHome	move.w	sd.xsize(a0),d3	Get width with old border in D3
	add.w	d6,d3
	add.w	d6,d3		Restore full window both sides
	sub.w	d2,d3
	sub.w	d2,d3		Adjust for new size both sides
	bmi.s	oRange
*
	move.w	sd.ysize(a0),d5
	add.w	d6,d5
	add.w	d6,d5		Height without border is in D5
	sub.w	d2,d5
	sub.w	d2,d5		D5 is height within borders
	bpl.s	itFits
*
oRange	moveq	#range_error,d0
	rts
*
* Border fits, does it leave room for at least one character?
*
itFits	cmp.w	sd.yinc(a0),d5
	bcs.s	oRange
*
	cmp.w	sd.xinc(a0),d3
	bcs.s	oRange
*
	move.w	d2,sd.borwd(a0)
	move.w	d2,a5		Cache new border width in A5
	move.w	d3,sd.xsize(a0)
	move.w	d5,sd.ysize(a0)
	move.b	d1,sd.bcolr(a0)	Store the new border colour
	cmp.b	#128,d1		Transparent
	beq.s	enuf
*
	tst.w	d2		Does anything need drawing?
	beq.s	enuf
*
	movea.l	a1,a3		Save A1 for return to QDOS
*
* Draw top and bottom borders, full window width, with BLOCKs
* BLOK needs the X and Y counts adjusted for DBRA in D2 and D3;
* A1 starts pointing at the top left corner, stride is in D4.L.
*
* A1 advances to the line after the last line of the block; D0
* and D3 are -1 after the DBRAs, but D2 and D4 are unchanged.
*
* The remaining BORDER code reads SD.BORWD four times though it 
* starts out in D2 (which BLOK clobbers) so we cache that in A5.
* SD.LINEL is needed repeatedly, including in MULs, so it's cached
* in D7 to reduce the need to refetch it or copy between registers.
*
	move.w	sd.linel(a0),d7	Cache LINEL in D7 for speed
	movea.l	sd.scrb(a0),a1	Find top left of screen
	move.w	sd.xmin(a0),d0
	sub.w	d6,d0		Discount old left border
	adda.w	d0,a1		Point A1 at left border
	movea.l	a1,a4		Save offset for the sides
	move.w	sd.ymin(a0),d0
	sub.w	d6,d0		Discount old top border
	mulu	d7,d0		Scale line count to a byte offset
	adda.l	d0,a1		A1 -> top left of window
	add.w	d2,d2		Width of both borders
	add.w	d3,d2		Include interior width
	move.w	a5,d3		height of block is BORWD
	subq.w	#1,d3		Adjust height for DBRA
	moveq	#0,d4		Form long stride
	move.w	d7,d4		Start from line length
	sub.w	d2,d4		Adjust stride for bar width
	subq.w	#1,d2		Adjust X count for DBRA           
	bsr	blok		Draw top border, full width
*
* Now draw the bottom border; A1 points to the end of the top
* block and must be advanced to the top of the bottom block.
*
	move.w	d7,d0 		Form the offset to the bottom
	mulu	d5,d0		D5 is the internal line count
	adda.l	d0,a1		Skip the window's interior
	move.w  a5,d3		Recover height of new border
	subq.w	#1,d3		Adjust height for DBRA
	bsr	blok		Draw full width bottom border
*
* The left and right borders only need to be drawn below the
* top border and above the bottom block, though QDOS redraws
* all four corners needlessly.
*
* Point A1 at the left edge of the window before its border,
* skipping old YMIN-oldBorder lines, using old X offset in A4
*
	move.w	sd.ymin(a0),d0
	sub.w	d6,d0		D0 is YMIN with no border
	add.w	a5,d0		D0 is YMIN with new border
	mulu	d7,d0		Skip over the top YMIN lines
	adda.l	d0,a4		Form block start address 
	movea.l	a4,a1		A1 -> top left of left block	
*
* Fill in the left block, new YSIZE tall and borderWidth wide  
*
	subq.w	#1,d5		Adjust YSIZE for DBRA
	move.w	d5,d3		Present volatile Y count
	move.w	d7,d4		Compute stride for either side
	move.w  a5,d2	
	sub.w	d2,d4
	subq.w	#1,d2		DBRA adjustment, X count
	bsr	blok		Fill in the left border
*
* Repoint A1 at the right border, otherwise like the left one
* So D2 and D3 (in D5) for the left border can be re-used
*
	movea.l	a4,a1		Retrieve left border address
	adda.w	sd.xsize(a0),a1 Stride over to the right block
	adda.l	a5,a1		Add sign-extended border width
	move.w	d5,d3		Recover vertical pixel count
	bsr	blok		Fill in the right border
*
	movea.l	a3,a1		Restore QDOS's input value
*
* N.B. BORDER changes the window MINs and SIZEs so we need to
* offset the previous border in D6 before adding the new one. 
*
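* For example, going from an old border of 2 to a new one of 5
* leaves D6 = 2 - 5 = -3, and subtracting -3 moves both XMIN
* and YMIN inwards by 3. Going back from 5 to 2 gives D6 = 3,
* which moves them outwards by 3 again.
*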
enuf	sub.w	sd.borwd(a0),d6	Old width-new width
	sub.w	d6,sd.xmin(a0)	Remove old and add new
	sub.w	d6,sd.ymin(a0)
	bra.s	finit
*
*****************************************************
*
* Simple commands: UNDER, OVER, STRIP, INK, PAPER
*
under	tst.b	d1
	bne.s	uset
*
	bclr	#sa.under,sd.cattr(a0)
	bra.s	finit	
*
uset	bset	#sa.under,sd.cattr(a0)
	bra.s	finit
*
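* The OVER handler below maps its parameter in D1 to the two
* attribute bits as follows:
*
*   OVER  0   sa.trans clear, sa.xor clear
*   OVER  1   sa.trans set,   sa.xor clear
*   OVER -1   sa.trans clear, sa.xor set
*   other     bad_parameter, attributes unchanged
*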
over	tst.b	d1
	beq.s	oClear
*
	addq.b	#1,d1
	beq.s	oXor	
*
	subq.b	#2,d1
	bne.s	oBad
*
	bset	#sa.trans,sd.cattr(a0)
	bclr	#sa.xor,sd.cattr(a0)
finit	moveq	#no_error,d0
	rts
*
oXor	bset	#sa.xor,sd.cattr(a0)
	bra.s	oTrans	
*
oBad	moveq	#bad_parameter,d0
	rts
*
oClear	bclr	#sa.xor,sd.cattr(a0)
oTrans	bclr	#sa.trans,sd.cattr(a0)
	bra.s	finit
*
* INK and STRIP values are also copied to word masks
* to speed up character output in optimised sizes.
*
ink	move.b	d1,sd.icolr(a0)
	move.b	d1,ink0(a0)
	move.b	d1,ink1(a0)
	move.b	d1,ink2(a0)
	move.b	d1,ink3(a0)
	bra.s	finit
*
strip	move.b	d1,sd.scolr(a0)
	move.b	d1,strip0(a0)
	move.b	d1,strip1(a0)
	move.b	d1,strip2(a0)
	move.b	d1,strip3(a0)
	bra.s	finit
*
paper	move.b	d1,sd.pcolr(a0)
	bra.s	finit
*
* BLOCK, SCROLL and CLS routines
*
papers	move.b	sd.pcolr(a0),d0	Make long word of paper
	move.b	d0,d1
	lsl.l	#8,d1		D1.W = P 0
	or.b	d0,d1		D1.W = P P
	move.w	d1,d0
	swap	d0		D0.L = P P ? ?
	move.w	d1,d0		D0.L = P P P P
*
* TO BE CONTINUED if CLS and SCROLL get word-optimised
*
*****************************************************
*
* Bytewise BLOCK, SD.FILL implementation, D1.B is ink
*
bblock	moveq	#MAKEVEN,d2
	and.w	(a1)+,d2	Width in HiRes pixels
	beq.s	done
*
	move.w	(a1)+,d3	Height in pixels 0..256
	beq.s	done
*
	move.w	(a1)+,d0	X start, 0..511
*
*  Check Xstart+Width <= Xsize, in MODE 4 pixels
*
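*  Worked example with made-up numbers: SD.XSIZE = 128 counts
*  as 256 MODE 4 units. X start 200 with width 60 leaves
*  256 - 200 - 60 = -4, which is negative and so too big;
*  width 56 leaves exactly 0 and is accepted.
*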
	move.w	sd.xsize(a0),d4
	add.w	d4,d4		Count MODE 4 units
	sub.w	d0,d4		Allow for X start
	sub.w	d2,d4		Allow for width
	bmi.s	tooBig
*
	lsr.w	#1,d0		X offset in bytes
	add.w	sd.xmin(a0),d0	X offset in window
        move.w  d0,d4
	move.w	(a1)+,d0	Y offset in lines
	move.l	sd.scrb(a0),a1
	adda.w	d4,a1           A1 -> X column		
*
*  Check that Ystart+Height <= Ysize
*
	move.w	sd.ysize(a0),d4
	sub.w	d0,d4		Allow for Y offset
	sub.w	d3,d4		Allow for height
	bmi.s	tooBig
*
	moveq	#0,d4		Avoid extending later
	move.w	sd.linel(a0),d4 D4.L is line length
	add.w	sd.ymin(a0),d0	Adjust Y for window
	mulu.w	d4,d0		Y offset in bytes
	adda.l	d0,a1		A1 -> top left pixel	
	lsr.w	#1,d2		Width in LowRes pixels
	sub.w	d2,d4		D4 is stride between rows
	subq.w	#1,d3		Prepare for DBRA Y count
	subq.w	#1,d2		Prepare for DBRA X count
	beq.s	b1pix		Specialise 1-pixel column
*
	btst	#sa.xor,sd.cattr(a0)
	bne.s	bloxor		OVER -1, Exclusive OR
*
blok	move.w	d2,d0		Restore X count
bline	move.b	d1,(a1)+	68010-friendly loop
	dbra	d0,bline
	adda.w	d4,a1	
	dbra	d3,blok
done	moveq	#no_error,d0	Nothing left to draw
	rts
*
bloxor	move.w	d2,d0		Restore X count
blox	eor.b	d1,(a1)+	68010-friendly loop
	dbra	d0,blox
	adda.w	d4,a1	
	dbra	d3,bloxor
	bra.s	done
*
* Optimised one-pixel vertical line cases avoid nested loops
*
b1pix	btst	#sa.xor,sd.cattr(a0)
	beq.s	b1plot		OVER 0
*	
* OVER -1, Exclusive OR, 1 pixel vertical line
*
b1xor	eor.b	d1,(a1)+
	adda.w	d4,a1		Stride to next line
	dbra	d3,b1xor
	bra.s	done
*
* OVER 0, 1-pixel vertical line
*
b1plot	move.b	d1,(a1)+
	adda.w	d4,a1		Stride to next line
	dbra	d3,b1plot
	bra.s	done
*
tooBig	moveq	#range_error,d0
	rts
*
*****************************************************
*
* Each of these eight-byte routines outputs three pixel pairs 
* corresponding to a binary fount pattern %000000 thro' %111111
*
* CSIZE 2,0 OVER 0 implementation
*
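* Judging by the entries below, D0 to D3 each hold one pixel
* pair as a word: D0 = strip,strip  D1 = strip,ink
* D2 = ink,strip and D3 = ink,ink. Each entry stores the pairs
* for bits 5-4, 3-2 and 1-0 of the pattern in turn, so pattern
* %101101 splits into %10 %11 %01 and writes D2, D3 then D1.
* Entry N starts 8*N bytes after CSIZE0X. A caller could reach
* it like this (a sketch only; D4 and A4 are assumed free and
* are not necessarily the registers the driver uses):
*
*	lsl.w	#3,d4		Pattern 0..63 times 8 bytes
*	lea	csize0x(pc),a4
*	jmp	0(a4,d4.w)	Enter the matching routine
*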
csize0x move.w  d0,(a3)+	Pattern %000000, 6 strip bytes
        move.w  d0,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)            0
*	
        move.w  d0,(a3)+        Pattern %000001
        move.w  d0,(a3)+
        move.w  d1,(a3)+        5 strip, 1 ink pixel
        jmp     (a5)		1
*	
        move.w  d0,(a3)+        Pattern %000010
        move.w  d0,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		2
*	
        move.w  d0,(a3)+        Pattern %000011
        move.w  d0,(a3)+
        move.w  d3,(a3)+        4 strip, 2 ink pixels
        jmp     (a5)		3
*
        move.w  d0,(a3)+	Pattern %000100
        move.w  d1,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		4
*	
        move.w  d0,(a3)+        Pattern %000101
        move.w  d1,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		5
*	
        move.w  d0,(a3)+        Pattern %000110
        move.w  d1,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		6
*	
        move.w  d0,(a3)+        Pattern %000111
        move.w  d1,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		7
*	
        move.w  d0,(a3)+        Pattern %001000
        move.w  d2,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		8
*	
        move.w  d0,(a3)+        Pattern %001001
        move.w  d2,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		9
*	
        move.w  d0,(a3)+        Pattern %001010
        move.w  d2,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		10
*	

        move.w  d0,(a3)+        Pattern %001011
        move.w  d2,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		11
*
        move.w  d0,(a3)+	Pattern %001100
        move.w  d3,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)            12
*	
        move.w  d0,(a3)+        Pattern %001101
        move.w  d3,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)            13
*	
        move.w  d0,(a3)+        Pattern %001110
        move.w  d3,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)            14
*	
        move.w  d0,(a3)+        Pattern %001111
        move.w  d3,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)            15
*
        move.w  d1,(a3)+	Pattern %010000
        move.w  d0,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		16
*	
        move.w  d1,(a3)+        Pattern %010001
        move.w  d0,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		17
*	
        move.w  d1,(a3)+        Pattern %010010
        move.w  d0,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		18
*	
        move.w  d1,(a3)+        Pattern %010011
        move.w  d0,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		19
*
        move.w  d1,(a3)+	Pattern %010100
        move.w  d1,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		20
*	
        move.w  d1,(a3)+        Pattern %010101
        move.w  d1,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		21
*	
        move.w  d1,(a3)+        Pattern %010110
        move.w  d1,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		22
*	
        move.w  d1,(a3)+        Pattern %010111
        move.w  d1,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		23
*
        move.w  d1,(a3)+	Pattern %011000
        move.w  d2,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		24
*	
        move.w  d1,(a3)+        Pattern %011001
        move.w  d2,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		25
*	
        move.w  d1,(a3)+        Pattern %011010
        move.w  d2,(a3)+
        move.w  d2,(a3)+	
        jmp     (a5)		26
*	
        move.w  d1,(a3)+        Pattern %011011
        move.w  d2,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		27
*
        move.w  d1,(a3)+	Pattern %011100
        move.w  d3,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		28
*	
        move.w  d1,(a3)+        Pattern %011101
        move.w  d3,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		29
*	
        move.w  d1,(a3)+        Pattern %011110
        move.w  d3,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		30
*	
        move.w  d1,(a3)+        Pattern %011111
        move.w  d3,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		31
*
* Second half, top bit of pattern set, leftmost pixel INK
*
        move.w  d2,(a3)+	Pattern %100000
        move.w  d0,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		32
*	
        move.w  d2,(a3)+	Pattern %100001
        move.w  d0,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		33
*	
        move.w  d2,(a3)+	Pattern %100010
        move.w  d0,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		34
*	
        move.w  d2,(a3)+	Pattern %100011
        move.w  d0,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		35
*
        move.w  d2,(a3)+	Pattern %100100
        move.w  d1,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		36
*	
        move.w  d2,(a3)+	Pattern %100101
        move.w  d1,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		37
*	
        move.w  d2,(a3)+        Pattern %100110
        move.w  d1,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		38
*		
        move.w  d2,(a3)+        Pattern %100111
        move.w  d1,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		39
*	
        move.w  d2,(a3)+        Pattern %101000
        move.w  d2,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		40
*	
        move.w  d2,(a3)+        Pattern %101001
        move.w  d2,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		41
*	
        move.w  d2,(a3)+        Pattern %101010
        move.w  d2,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		42
*	
        move.w  d2,(a3)+        Pattern %101011
        move.w  d2,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		43		
*
        move.w  d2,(a3)+	Pattern %101100
        move.w  d3,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		44
*	
        move.w  d2,(a3)+        Pattern %101101
        move.w  d3,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		45
*	
        move.w  d2,(a3)+        Pattern %101110
        move.w  d3,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		46
*	
        move.w  d2,(a3)+        Pattern %101111
        move.w  d3,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		47
*
        move.w  d3,(a3)+	Pattern %110000
        move.w  d0,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		48
*	
        move.w  d3,(a3)+        Pattern %110001
        move.w  d0,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		49
*	
        move.w  d3,(a3)+        Pattern %110010
        move.w  d0,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		50
*	
        move.w  d3,(a3)+        Pattern %110011
        move.w  d0,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		51
*
        move.w  d3,(a3)+	Pattern %110100
        move.w  d1,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		52
*	
        move.w  d3,(a3)+        Pattern %110101
        move.w  d1,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		53
*	
        move.w  d3,(a3)+        Pattern %110110
        move.w  d1,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		54
*	
        move.w  d3,(a3)+        Pattern %110111
        move.w  d1,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		55
*
        move.w  d3,(a3)+	Pattern %111000
        move.w  d2,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		56
*	
        move.w  d3,(a3)+        Pattern %111001
        move.w  d2,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		57
*	
        move.w  d3,(a3)+        Pattern %111010
        move.w  d2,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		58
*	
        move.w  d3,(a3)+        Pattern %111011
        move.w  d2,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		59
*
        move.w  d3,(a3)+	Pattern %111100
        move.w  d3,(a3)+
        move.w  d0,(a3)+
        jmp     (a5)		60
*	
        move.w  d3,(a3)+        Pattern %111101
        move.w  d3,(a3)+
        move.w  d1,(a3)+
        jmp     (a5)		61
*	
        move.w  d3,(a3)+        Pattern %111110
        move.w  d3,(a3)+
        move.w  d2,(a3)+
        jmp     (a5)		62
*	
        move.w  d3,(a3)+        Pattern %111111
        move.w  d3,(a3)+
        move.w  d3,(a3)+
        jmp     (a5)		63, phew!
*
*****************************************************
*
* CSIZE 2,0 OVER -1 implementation
*
* This skips strip-colour bytes entirely and exclusive-ORs INK
* into the others, making it only slightly slower than OVER 0.
* Padding NOPs are required to maintain 8-byte block alignment
*
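* Judging by the entries below, the masks here carry zero in
* place of strip: D1.W = 0,ink  D2.W = ink,0  D3.W = ink,ink,
* so D1.B alone is a single ink byte. Taking pattern %100010
* as an example, and assuming A3 is even on entry: EOR.B D1
* changes pixel 0 and leaves A3 at offset 1, ADDQ.L #3 skips
* pixels 1 to 3, and EOR.W D2 then changes pixel 4 at the even
* offset 4 while pixel 5 is EORed with zero and so unchanged.
* Every entry advances A3 by six bytes in total.
*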
csize0e addq.l	#6,a3		Pattern %000000, 6 strip bytes
        jmp     (a5)
	nop
	nop	                0
*	
        addq.l  #5,a3           Pattern %000001
        eor.b   d1,(a3)+        5 strip, 1 ink pixel
        jmp     (a5)		
	nop			1
*	
* When only one byte needs to be updated we might use eor.b
* and adjust the addq.l before or after to skip the others.
* But on a 16 or 32-bit bus the entire word must be read
* and written even if this leaves one byte unchanged, so
* there's no performance saving in only addressing a byte.
*
* Since we are limited to six bytes of code (plus the jmp)
* to update each group of six pixels, the preservation of
* post-increment word alignment and code economy of doing
* a byte update with a word skip saves more time than not
* touching strip-colour bytes in many cases, and only
* 8-bit bus systems (68008, Thor 20 or similarly constrained)
* are slowed by the two extra byte transfers. That's why
* these routines use byte operations only when they fit.
*
        addq.l  #4,a3           Pattern %000010
        eor.w   d2,(a3)+        Faster except on 68008
        jmp     (a5)
        nop			2
*	
        addq.l  #4,a3        	Pattern %000011
        eor.w   d3,(a3)+        4 strip, 2 ink pixels
        jmp     (a5)
        nop			3
*
        addq.l  #3,a3      	Pattern %000100
        eor.b   d1,(a3)+
        addq.l  #2,a3      
        jmp     (a5)		4
*	
        addq.l  #3,a3           Pattern %000101
        eor.b   d1,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		5
*	
        addq.l  #3,a3           Pattern %000110
        eor.b   d1,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		6
*	
        addq.l  #3,a3           Pattern %000111
        eor.b   d1,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		7
*	
        addq.l  #2,a3           Pattern %001000
        eor.b   d1,(a3)+
        addq.l  #3,a3      
        jmp     (a5)		8
*	
        addq.l  #2,a3           Pattern %001001
        eor.w   d2,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		9
*	
        addq.l  #2,a3           Pattern %001010
        eor.w   d2,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		10
*
        addq.l  #2,a3           Pattern %001011
        eor.w   d2,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		11
*
        addq.l  #2,a3      	Pattern %001100
        eor.w   d3,(a3)+        Postincrement is free
        addq.l  #2,a3      
        jmp     (a5)            12
*	
        addq.l  #2,a3           Pattern %001101
        eor.w   d3,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)            13
*	
        addq.l  #2,a3           Pattern %001110
        eor.w   d3,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)            14
*	
        addq.l  #2,a3           Pattern %001111
        eor.w   d3,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)            15
*
        eor.w   d1,(a3)+	Pattern %010000
        addq.l  #4,a3            
        jmp     (a5)
        nop			16
*	
        eor.w   d1,(a3)+        Pattern %010001
        addq.l  #3,a3      
        eor.b   d1,(a3)+
        jmp     (a5)		17
*	
        eor.w   d1,(a3)+        Pattern %010010
        addq.l  #2,a3      
        eor.w   d2,(a3)+
        jmp     (a5)		18
*	
        eor.w   d1,(a3)+        Pattern %010011
        addq.l  #2,a3      
        eor.w   d3,(a3)+
        jmp     (a5)		19
*
        eor.w   d1,(a3)+	Pattern %010100
        eor.w   d1,(a3)+
        addq.l  #2,a3      
        jmp     (a5)		20
*	
        eor.w   d1,(a3)+        Pattern %010101
        eor.w   d1,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		21
*	
        eor.w   d1,(a3)+        Pattern %010110
        eor.w   d1,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		22
*	
        eor.w   d1,(a3)+        Pattern %010111
        eor.w   d1,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		23
*
        eor.w   d1,(a3)+	Pattern %011000
        eor.b   d1,(a3)+
        addq.l  #3,a3      
        jmp     (a5)		24
*	
        eor.w   d1,(a3)+        Pattern %011001
        eor.w   d2,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		25
*	
        eor.w   d1,(a3)+        Pattern %011010
        eor.w   d2,(a3)+
        eor.w   d2,(a3)+	
        jmp     (a5)		26
*	
        eor.w   d1,(a3)+        Pattern %011011
        eor.w   d2,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		27
*
        eor.w   d1,(a3)+	Pattern %011100
        eor.w   d3,(a3)+
        addq.l  #2,a3      
        jmp     (a5)		28
*	
        eor.w   d1,(a3)+        Pattern %011101
        eor.w   d3,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		29
*	
        eor.w   d1,(a3)+        Pattern %011110
        eor.w   d3,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		30
*	
        eor.w   d1,(a3)+        Pattern %011111
        eor.w   d3,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		31
*
* Second half, top bit of pattern set, leftmost pixel INK
*
        eor.b   d1,(a3)+	Pattern %100000
        addq.l  #5,a3      
        jmp     (a5)
	nop			32
*	
        eor.b   d1,(a3)+	Pattern %100001
        addq.l  #4,a3      
        eor.b   d1,(a3)+
        jmp     (a5)		33
*	
        eor.b   d1,(a3)+	Pattern %100010
        addq.l  #3,a3      
        eor.w   d2,(a3)+
        jmp     (a5)		34
*	
        eor.b   d1,(a3)+	Pattern %100011
        addq.l  #3,a3      
        eor.w   d3,(a3)+
        jmp     (a5)		35
*
        eor.w   d2,(a3)+	Pattern %100100
        eor.w   d1,(a3)+
        addq.l  #2,a3      
        jmp     (a5)		36
*	
        eor.w   d2,(a3)+	Pattern %100101
        eor.w   d1,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		37
*	
        eor.w   d2,(a3)+        Pattern %100110
        eor.w   d1,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		38
*		
        eor.w   d2,(a3)+        Pattern %100111
        eor.w   d1,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		39
*	
        eor.w   d2,(a3)+        Pattern %101000
        eor.b   d1,(a3)+
        addq.l  #3,a3
        jmp     (a5)		40
*	
        eor.w   d2,(a3)+        Pattern %101001
        eor.w   d2,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		41
*	
        eor.w   d2,(a3)+        Pattern %101010
        eor.w   d2,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		42
*	
        eor.w   d2,(a3)+        Pattern %101011
        eor.w   d2,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		43		
*
        eor.w   d2,(a3)+	Pattern %101100
        eor.w   d3,(a3)+
        addq.l  #2,a3
        jmp     (a5)		44
*	
        eor.w   d2,(a3)+        Pattern %101101
        eor.w   d3,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		45
*	
        eor.w   d2,(a3)+        Pattern %101110
        eor.w   d3,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		46
*	
        eor.w   d2,(a3)+        Pattern %101111
        eor.w   d3,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		47
*
        eor.w   d3,(a3)+	Pattern %110000
        addq.l  #4,a3   
        jmp     (a5)
        nop			48
*	
        eor.w   d3,(a3)+        Pattern %110001
        addq.l  #3,a3   
        eor.b   d1,(a3)+
        jmp     (a5)		49
*	
        eor.w   d3,(a3)+        Pattern %110010
        addq.l  #2,a3   
        eor.w   d2,(a3)+
        jmp     (a5)		50
*	
        eor.w   d3,(a3)+        Pattern %110011
        addq.l  #2,a3   
        eor.w   d3,(a3)+
        jmp     (a5)		51
*
        eor.w   d3,(a3)+	Pattern %110100
        eor.w   d1,(a3)+
        addq.l  #2,a3   
        jmp     (a5)		52
*	
        eor.w   d3,(a3)+        Pattern %110101
        eor.w   d1,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		53
*	
        eor.w   d3,(a3)+        Pattern %110110
        eor.w   d1,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		54
*	
        eor.w   d3,(a3)+        Pattern %110111
        eor.w   d1,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		55
*
        eor.w   d3,(a3)+	Pattern %111000
        eor.b   d1,(a3)+
        addq.l  #3,a3
        jmp     (a5)		56
*	
        eor.w   d3,(a3)+        Pattern %111001
        eor.w   d2,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		57
*	
        eor.w   d3,(a3)+        Pattern %111010
        eor.w   d2,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		58
*	
        eor.w   d3,(a3)+        Pattern %111011
        eor.w   d2,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		59
*
        eor.w   d3,(a3)+	Pattern %111100
        eor.w   d3,(a3)+
        addq.l  #2,a3   
        jmp     (a5)		60
*	
        eor.w   d3,(a3)+        Pattern %111101
        eor.w   d3,(a3)+
        eor.w   d1,(a3)+
        jmp     (a5)		61
*	
        eor.w   d3,(a3)+        Pattern %111110
        eor.w   d3,(a3)+
        eor.w   d2,(a3)+
        jmp     (a5)		62
*	
        eor.w   d3,(a3)+        Pattern %111111
        eor.w   d3,(a3)+
        eor.w   d3,(a3)+
        jmp     (a5)		63, phew!
*
*****************************************************
*
* Unrolled CSIZE 3,0 OVER 0 implementation - not used yet
*
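* Each entry below is four MOVE.Ws and a JMP, ten bytes in all,
* covering eight pixels as four pairs, so entry N starts 10*N
* bytes after CSIZE1E. As this table is not used yet, the
* following index calculation is only a sketch; D4 and D5 are
* assumed to be free, with the 8-bit pattern zero-extended in
* D4.W:
*
*	add.w	d4,d4		D4 = pattern * 2
*	move.w	d4,d5
*	lsl.w	#2,d5		D5 = pattern * 8
*	add.w	d5,d4		D4 = pattern * 10, 0..2550
*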
* 0 (each number is the 8-bit fount pattern of the entry below)
csize1e	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 1
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 2
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 3
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 4
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 5
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 6
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 7
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 8
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 9
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 10
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 11
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 12
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 13
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 14
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 15
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 16
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 17
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 18
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 19
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 20
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 21
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 22
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 23
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 24
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 25
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 26
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 27
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 28
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 29
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 30
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 31
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 32
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 33
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 34
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 35
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 36
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 37
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 38
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 39
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 40
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 41
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 42
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 43
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 44
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 45
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 46
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 47
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 48
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 49
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 50
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 51
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 52
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 53
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 54
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 55
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 56
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 57
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 58
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 59
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 60
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 61
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 62
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 63
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 64
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 65
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 66
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 67
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 68
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 69
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 70
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 71
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 72
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 73
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 74
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 75
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 76
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 77
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 78
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 79
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 80
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 81
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 82
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 83
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 84
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 85
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 86
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 87
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 88
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 89
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 90
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 91
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 92
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 93
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 94
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 95
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 96
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 97
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 98
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 99
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 100
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 101
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 102
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 103
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 104
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 105
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 106
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 107
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 108
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 109
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 110
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 111
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 112
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 113
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 114
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 115
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 116
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 117
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 118
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 119
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 120
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 121
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 122
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 123
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 124
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 125
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 126
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 127
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 128
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 129
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 130
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 131
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 132
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 133
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 134
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 135
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 136
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 137
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 138
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 139
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 140
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 141
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 142
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 143
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 144
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 145
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 146
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 147
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 148
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 149
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 150
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 151
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 152
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 153
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 154
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 155
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 156
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 157
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 158
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 159
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 160
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 161
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 162
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 163
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 164
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 165
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 166
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 167
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 168
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 169
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 170
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 171
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 172
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 173
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 174
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 175
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 176
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 177
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 178
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 179
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 180
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 181
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 182
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 183
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 184
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 185
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 186
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 187
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 188
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 189
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 190
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 191
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 192
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 193
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 194
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 195
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 196
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 197
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 198
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 199
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 200
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 201
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 202
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 203
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 204
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 205
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 206
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 207
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 208
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 209
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 210
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 211
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 212
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 213
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 214
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 215
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 216
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 217
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 218
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 219
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 220
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 221
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 222
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 223
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 224
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 225
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 226
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 227
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 228
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 229
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 230
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 231
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 232
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 233
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 234
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 235
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 236
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 237
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 238
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 239
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 240
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 241
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 242
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 243
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 244
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 245
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 246
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 247
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 248
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 249
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 250
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 251
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	move.w d3,(a3)+
	jmp (a5)
* 252
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d0,(a3)+
	jmp (a5)
* 253
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d1,(a3)+
	jmp (a5)
* 254
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d2,(a3)+
	jmp (a5)
* 255
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	move.w d3,(a3)+
	jmp (a5)
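* 256 - end of the table; no entry follows. Each entry's number, read as
* four base-4 digits (most significant first), names the d0-d3 words stored.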

	end