Data output using Return with data (PIC 18Fx)
Both the 16F and 18F series devices support Look Up Tables (LUTs) using the 'Return with data' instruction.
To access the LUT, some 'control code' is called to perform a 'computed jump' (using Copy Acc, PCL) into the LUT to Return with the required data.
The 'computed jump' limits the table size to 256 address locations = that's 256 entries for the 12bit instruction 16F series or 128 entries for the 18F series (which uses byte addressing, with each instruction = 1 word). The table size is further limited by the 'control code' (which must reside in the same 256 address 'block' as the LUT).
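The size limit is plain arithmetic, so it can be sanity-checked with a few lines of Python. This is only a rough sketch: the function name and its parameters are made up for illustration and are not part of any PIC tool chain.

```python
def lut_entries(block_addresses=256, addresses_per_instruction=1, control_words=0):
    """Instruction slots left for the LUT inside one 'computed jump' block.

    control_words is the (assumed) number of instruction words taken by
    control code that has to live in the same block as the table.
    """
    total_words = block_addresses // addresses_per_instruction
    return total_words - control_words


print(lut_entries(256, 1))                   # 16F5x: one address per instruction -> 256
print(lut_entries(256, 2))                   # 18F: byte addressing, 2 bytes per word -> 128
print(lut_entries(256, 2, control_words=7))  # 18F with 7 words of control code -> 121
```

The third call shows how quickly the control code eats into an 18F table: every word of control code in the block is one less entry that can be looked up.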
The problem with using the LUT to output data bytes at high speed - for example to a PORT wired to an external serial shift register - is the LUT 'Return' goes back to the 'main code', rather than the 'control' code. Of course the main code could Call again, but that just adds more CLK cycles.
Whilst it's possible to 'cheat' the Return with the 16F5x series (54/7/9) by 'overflowing' the 2 level Return stack, the 'clever' stack overflow/underflow 'protection' of the 18F series prevents this.
However, the PIC 18Fx do have a PUSH instruction (1 CLK), which places PC+2 onto the stack. Further, the low 8 bits of the ToS (Top of Stack) can be accessed as a Register ('TOSL') and adjusted. Adding or subtracting from TOSL has no effect on the rest of the ToS address (which is readable, but not writable) = so we can't adjust the ToS to 'point' at any address outside the 256 byte (128 instruction) 'bank' of the PUSH instruction.
Using a combination of PUSH and ToS adjust, the 'control code' can set itself up as a 'loop' and thus allow the main-line code to output a sequence of data bytes with a single 'Call'.
To avoid the need for 'counters', one of the codes 'looked up' can POP the control Return address and thus cause the 'end of output'. Typically, 'code 0' is used, which means arranging the start of the Look-Up-Table on a 0xnn00 address (i.e. at the start of a 128 word 'address bank').
In the example below, one 'pixel scan line LUT' is shown for a 96 character 5x7 FONT (in practice, 7 LUT's will be needed, one for each of the 7 scan lines).
Note the POP at LUT location 0. When the control code jumps here, instead of Return with pixel data for 'character 0', the control code Return address is POP'd. Execution then 'falls through' to location 1 which Return's to the main-line code (with a spurious value in the Acc). It is left to the main-line code to clear the last output value from the PORT.
Location 0xnn00: ;start of LUT for scan line N, 96 words, one per shape byte
POP ; ascii code 0 = end of lookup sequence, so this location is the exit point (and drops through)
RET shape for ascii code 1 (0x01, address 0x02) scanLine N
RET shape for ascii code 2 (0x02, address 0x04) scanLine N
...
RET shape for ascii code 95 (0x5F, address 0xBE) scanLine N
; end of table
; The next address is Location 0xnnC0:
NOP ; filler to adjust address (0xnnC0)
NOP ; filler to adjust address (0xnnC2)
;
Location 0xnnC4: ; start of control code NOTE address alignment is critical
LineNout: ;call here with Acc=0 (since it's going to be output before the LUT is used for the first time)
PUSH ;setup the return to PC+2, i.e. 0xnnC6 (1)
; On Return with value, we need CPU to 'land' back on the PUSH = 0xnnC4, and NOT here 0xnnC6 = so adjust ToS
BClr b1,TOSL ; convert C6 into C4 (this is 1 instruction faster than DEC, DEC) (1)
COPY Acc,PORTx ; output the value (1)
ROTL INDF,Acc ; get 2x ascii code to ACC - note, top bit must be 0 (1)
INC FSR ; point at next ascii code (1) [remember, code 0 = end of output]
COPY Acc,PCL ; get data by jump into LUT (2 to jump, 2 to Return with value)
; OK, that's it 7 instructions, Acc is written to PORT on 4th instruction after LUT Return
9 CLK's per output byte. This is the same as using direct Table addressing for variable length output (however, direct Table addressing can achieve 6 CLK's for a fixed length string).
NOTE: the final shape code is left on the PORT, so the calling routine needs to perform a CLR PORTx. To maintain the timing, there should be a 2 CLK delay before the CLR PORTx (i.e. it should be the 3rd instruction after the RETurn).
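The behaviour of the loop above can be modelled in a few lines of Python. This is a sketch under assumptions, not a cycle-accurate simulator: `line_n_out`, `LOOP_CLOCKS` and the `shapes` dictionary are illustrative names, and the clock figures are simply the ones annotated in the listing.

```python
# Clock cost of one pass round the control loop, as annotated in the listing.
LOOP_CLOCKS = {
    "PUSH": 1,
    "BClr b1,TOSL": 1,
    "COPY Acc,PORTx": 1,
    "ROTL INDF,Acc": 1,
    "INC FSR": 1,
    "COPY Acc,PCL": 2,
    "RET shape": 2,
}


def line_n_out(codes, shapes):
    """Return the sequence of bytes written to the PORT.

    codes  : the ascii codes to output, ending with the 0 'end of output' code
    shapes : assumed mapping of ascii code -> shape byte for this scan line
    """
    port_writes = []
    acc = 0                      # caller enters LineNout with Acc = 0
    for code in codes:
        port_writes.append(acc)  # COPY Acc,PORTx
        if code == 0:            # jump lands on the POP, then falls through to a RET
            break
        acc = shapes[code]       # RET shape (the 'Return with data')
    return port_writes


print(sum(LOOP_CLOCKS.values()))                       # 9
print(line_n_out([1, 2, 0], {1: 0x11, 2: 0x22}))       # [0, 17, 34]
```

The last list shows the two points made in the text: the first byte written is the 0 passed in by the caller, and the final shape (0x22 here) is the last thing written, so it stays on the PORT until the main-line code clears it.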
Arbitrary character count, faster
The problem with a LUT using 'Return with data' is that whilst the single Return instruction is faster than 'Load Acc, value' followed by a 'Jump address', the ToS has to be setup (prior to the jump into the table) so the Return will arrive at the 'top of control loop' again.
However there is one case where we don't need to 'setup' the Return to control code address (by using PUSH etc), and that's when the 'return' is to control code starting on a '00 boundary' address. In that specific case, we can 'Return' by 'Resetting' the PCL i.e. by loading PCL with 00 using the CLR PCL instruction.
Of course there is a 'trade off' - each table entry is now 2 instructions - a 'Load value to Acc' followed by a 'return' (using 'CLR PCL'). This means a 'byte address' code has to be 'multiplied' by 4 to get the LUT address. To minimise the control code, instead of adding an extra ROTL (x2), the ascii codes will be stored already 'doubled' (so only a single ROTL is required to convert the code into a x4 LUT address).
Since the control code now occupies the LUT locations for ascii 0x00 (and 0x01) we can't use 0x00 as the 'end of line' anymore. Instead some other value (in the example below, ascii value 0x02, held doubled = 4, at LUT address 08) must be used as an 'end of output' code.
Location 0xnn00: ;start of control code for LUT of scan line N (LUT jumps back here)
COPY Acc,PORTx ; output the value (1)
ROTL INDF,Acc ; get (doubled) ascii code x2 to ACC - note, top bit must be 0 (1)
INC FSR ; point at next ascii code (1) [code 0x02 = end of output]
COPY Acc,PCL ; jump into table (2 to jump, 3 to load the value and CLR PCL back to 0xnn00)
;start of LUT for scan line N, 2 words per entry
Location 0xnn08:
NOP ;ascii code 02 (doubled 4, address 0x08) = end of line, Return to main-line code
RETURN 0
LOAD Acc,value ;(ascii 03, doubled 6, address 0x0C) scanLine N
CLR PCL ;zero PCL (return to start of loop)
...
LOAD Acc,value ;(ascii 63, doubled 126, address 0xFC) scanLine N
CLR PCL ;zero PCL (return to start of loop)
8 CLK's per output byte. However the LUT is limited to 62 entries (64 four-byte slots, less the 2 covered by the control code), 1 of which (0x02) must be used as an 'end of output' flag (to Return to the main-line code), leaving 61 shape values.
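The 'stored doubled, rotated once' addressing can be checked with a short Python model. It is a sketch only: the helper names are invented here, and ROTL is assumed to be an 8 bit rotate without carry (which behaves as a plain x2 as long as the top bit is 0).

```python
def rotl8(value):
    """8 bit rotate left without carry (assumed behaviour of ROTL)."""
    return ((value << 1) | (value >> 7)) & 0xFF


def lut_byte_address(ascii_code):
    """Low address byte reached by COPY Acc,PCL for a given ascii code."""
    stored = (ascii_code * 2) & 0xFF   # the string holds the code already doubled
    return rotl8(stored)               # ROTL INDF,Acc gives code x4


# Clock cost per output byte, as annotated in the listing.
LOOP_CLOCKS = {
    "COPY Acc,PORTx": 1,
    "ROTL INDF,Acc": 1,
    "INC FSR": 1,
    "COPY Acc,PCL": 2,
    "LOAD Acc,value": 1,
    "CLR PCL": 2,
}

# 256 bytes / 4 bytes per entry, less the 2 entry slots under the control code.
ENTRY_SLOTS = 256 // 4 - 2

print(hex(lut_byte_address(2)), hex(lut_byte_address(3)), hex(lut_byte_address(63)))
print(sum(LOOP_CLOCKS.values()), ENTRY_SLOTS)
```

Codes 2, 3 and 63 land on 0x08, 0x0C and 0xFC, matching the addresses in the listing, and the loop adds up to 8 CLK's with 62 entry slots (one of them being the 'end of output' entry).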
Up to 30 characters
If a larger LUT is required, but the number of characters to be output is limited, it is possible to speed up the code by 'pre-loading' the Stack with the Return addresses - for example, if 10 values are to be output, 10 PUSH instructions can be performed (and adjust the ToS for each) before starting the output sequence.
By placing the 'Output Loop' code at a 0xnn00 address, the Return 'target' is 0x00, so adjusting ToS just means 'resetting' it (rather than overwriting it with some arbitrary value).
The '30 limit' is imposed by the number of available Stack locations (it's 'up to' because the main-line code can 'call' anywhere into the 'PUSH stack set-up' code)
There are two small issues - first the 'PUSH stack' must be in the same 256 byte block as the LUT (so it 'costs' 2 LUT locations per value to be looked up) and second the main-line code has to 'Call' the output '2n CLK's in advance'.
The example below is for max. of 10 values.
; start of the '256 byte (128 instruction) address bank'
; Output Loop
Location 0xnn00: ;LUT Returns here with value
COPY Acc,PORT ;output value (1)
ROTL INDF,Acc ;get next code (x2 for byte to word addressing) (1)
INC FSR ; (1)
COPY Acc,PCL ; do the next lookup (2) for jump into LUT, (2) for return to 0xnn00
; end of control code, 4 LUT locations 'used-up' (not available for look-up)
; Start of LUT, 1 word per data byte
RETURN ; assume byte 4 is the 'return to main loop' code
RETURN byte5
RETURN byte6
RETURN byte7
....
; top of table is limited by the PUSH code
; (in this case the 10 PUSH/CLR TOSL pairs plus the final CLR PCL take 21 words = 42 bytes, so top is 256-42 = 214, word 107)
; start of control code (can be anywhere in the 256 byte bank)
; main-line code calls here 2n CLK's before output is required
;
; PUSH stack setup = set up the maximum n (eg 10) output values
;
; entry point for 10 values
PUSH ;create stack entry 10
CLR TOSL ;adjust entry to Output Loop
; entry point for 9 values
PUSH ;create stack entry 9
CLR TOSL ;adjust entry to Output Loop
PUSH ;create stack entry 8
CLR TOSL ;adjust entry to Output Loop
PUSH ;create stack entry 7
CLR TOSL ;adjust entry to Output Loop
PUSH ;create stack entry 6
CLR TOSL ;adjust entry to Output Loop
PUSH ;create stack entry 5
CLR TOSL ;adjust entry to Output Loop
PUSH ;create stack entry 4
CLR TOSL ;adjust entry to Output Loop
PUSH ;create stack entry 3
CLR TOSL ;adjust entry to Output Loop
PUSH ;create stack entry 2
CLR TOSL ;adjust entry to Output Loop
; entry point for 1 value
PUSH ;create stack entry 1
CLR TOSL ;adjust entry to Output Loop
; on 11th look-up, LUT Return's to the main code
CLR PCL ;start the Output Loop (goto 0xnn00)
7 CLK's per byte, however there is a 'start delay' as the PUSH Stack is set up, of 2n CLK's (where n= max number of outputs supported)
A 96 character FONT needs 96 word addresses for the LUT. The Output Loop is 4 instructions (4 words) leaving 128-96-4 = 28 words for the 'PUSH stack'.
The above approach would limit the output to a maximum of 13 values (13 PUSH/CLR TOSL pairs = 26 words, plus 1 word for the final CLR PCL). To support more, the in-line PUSH sequence could be written as a 'loop' (at the cost of increased 'pre output delay'). The limit would then be due to the 'maximum stack depth' (30).
; Main-line code CALL's here
; Setup the PUSH stack
COPY FSR,Acc ;save value pointer
COPY Acc,Temp
SKIP ; skip the next instruction (INC FSR)
next:
INC FSR ;point at next value
PUSH ;set up a LUT return ..
CLR TOSL ; .. to address 0xnn00
SkipZ INDF ;skip if this is the last value
BRA next: ;not last, setup another PUSH
; PUSH stack done, restore the INDF pointer
COPY Temp,Acc
COPY Acc,FSR ;restore the value pointer
CLR PCL ;start the Output Loop (goto 0xnn00)
To build the PUSH stack using a loop requires 4 + 6n + 3 CLK's in total (where n = the number of values to be output).
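To compare the two ways of pre-loading the stack, the figures quoted above can be dropped into a small Python model. This is a back-of-envelope sketch: the function names are invented, the in-line figure is the 2n 'start delay' quoted in the text (the final CLR PCL is not counted), and 9 CLK's per byte is the figure for the first PUSH/TOSL loop.

```python
OUTPUT_CLOCKS_PER_BYTE = 7    # Output Loop with a pre-loaded stack
SIMPLE_LOOP_CLOCKS = 9        # PUSH + BClr loop from the first example


def inline_setup_clocks(n):
    """In-line PUSH / CLR TOSL pairs: 2 CLK's per value."""
    return 2 * n


def loop_setup_clocks(n):
    """PUSH stack built by the loop: 4 + 6n + 3 CLK's."""
    return 4 + 6 * n + 3


def total_clocks(n, setup):
    """Set-up delay plus the time to clock out n bytes."""
    return setup(n) + OUTPUT_CLOCKS_PER_BYTE * n


n = 10
print(total_clocks(n, inline_setup_clocks))   # 90
print(total_clocks(n, loop_setup_clocks))     # 137
print(SIMPLE_LOOP_CLOCKS * n)                 # 90
```

On these figures the in-line pre-load costs the same overall as the 9 CLK loop (2n + 7n = 9n); what it buys is that the bytes themselves go out 7 CLK's apart, with the extra time moved in front of the output. The loop-built stack is slower overall and only pays off when the number of values would not fit as in-line PUSH code.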
In-line LUT output
The final way of speeding up output is to eliminate the 'return to control loop'. This means each entry in the LUT has to perform a 'direct jump' to the next. In effect, each LUT location has to contain a copy of the 'control' code.
This approach is not limited to the 18F series (with its PUSH instruction and addressable TOSL = Top Of Stack, Lo byte), since it turns out it's faster to 'precalculate' the Return addresses and store them in the Register set pointed to by INDF (i.e. the 'ascii value' Registers now contain the 'table address' required to get from one LUT location to the next, rather than just the ascii value).
Each LUT entry is 5 words (10 bytes). We can include a 'x2' word to byte multiply within the LUT code, so we only have to pre-calculate by x5 (ascii code 0 is still used as the 'end of line'). The LUT supports 25 ascii codes (plus 0).
x5 multiply is: COPY Reg,Acc / ROTL Reg / ROTL Reg / ADD Acc,Reg. Since code 0 = end of line, this can be done in an INDF/INC FSR loop with a 'Jump back if non-zero' after the ADD at the end.
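The x5 pre-calculation can be followed step by step in Python. This is a sketch that assumes 8 bit registers and a rotate without carry for ROTL; `rotl8`, `times5` and `precalculate` are illustrative names only.

```python
def rotl8(value):
    """8 bit rotate left without carry (assumed behaviour of ROTL)."""
    return ((value << 1) | (value >> 7)) & 0xFF


def times5(reg):
    """Mirror of the four instruction x5 multiply."""
    acc = reg                  # COPY Reg,Acc
    reg = rotl8(reg)           # ROTL Reg      (x2)
    reg = rotl8(reg)           # ROTL Reg      (x4)
    reg = (reg + acc) & 0xFF   # ADD Acc,Reg   (x4 + x1 = x5)
    return reg


def precalculate(codes):
    """Convert a 0 terminated list of ascii codes into x5 table offsets."""
    offsets = []
    for code in codes:
        result = times5(code)
        offsets.append(result)
        if result == 0:        # 'Jump back if non-zero' fails on the 0 terminator
            break
    return offsets


print(precalculate([1, 2, 25, 0]))        # [5, 10, 125, 0]
print(hex(rotl8(times5(1))))              # 0xa
```

After the x2 performed inside the LUT code, code 1 lands on byte address 0x0A (the first table entry) and code 25 on 250, the last 10 byte entry that still starts inside the 256 byte block.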
Location 0xnn00: ; start (and end) of Table
RETURN ; ascii code 0 (when the control code (COPY Acc,PCL) arrives here, Return to main code)
; first table entry, for ascii code '1' = (byte) address 10 (0x0A)
Location 0xnn0A
LOAD literal,Acc ; load this shape (1)
COPY Acc,PORTx ;o/p it (1)
; (main code can 'CALL' here - or any similar position)
ROTL INDF,Acc ;get (next) word offset x2 for byte address (1)
INC FSR ;point at next (1)
COPY Acc,PCL ;goto next shape location (2)
6 CLK's per byte. However each LUT entry is now 5 instructions (10 bytes), so LUT max. is 25 values ... (just enough for a VTI, perhaps ?)
Note that it is not possible to use the (single word) COPY Reg1,Reg2 instruction with the PCL as Reg2 (the destination) i.e. COPY INDF,PCL is not a valid instruction (the stack can't be used as destination either).
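How the entries chain from one to the next can be seen in a small Python model of the in-line LUT. It is a sketch under assumptions: the table is taken to start at 0xnn00 with 10 byte entries, `offsets` holds the pre-calculated (code x5) values ending in 0, and the names are invented for illustration.

```python
ENTRY_BYTES = 10          # 5 instruction words per LUT entry
MAX_CODES = 256 // ENTRY_BYTES

# Clock cost of one LUT entry, as annotated in the listing.
ENTRY_CLOCKS = {
    "LOAD literal,Acc": 1,
    "COPY Acc,PORTx": 1,
    "ROTL INDF,Acc": 1,
    "INC FSR": 1,
    "COPY Acc,PCL": 2,
}


def inline_output(offsets, shapes):
    """Return the bytes written to the PORT.

    offsets : pre-calculated (ascii code x5) values, ending with 0
    shapes  : assumed mapping of ascii code -> shape byte for this scan line
    """
    port_writes = []
    for offset in offsets:
        pcl = (offset << 1) & 0xFF        # ROTL INDF,Acc ; COPY Acc,PCL (top bit assumed 0)
        if pcl == 0:                      # lands on the RETURN at 0xnn00
            break
        code = pcl // ENTRY_BYTES         # which entry the jump landed on
        port_writes.append(shapes[code])  # LOAD literal,Acc ; COPY Acc,PORTx
    return port_writes


print(inline_output([5, 10, 0], {1: 0x11, 2: 0x22}))   # [17, 34]
print(sum(ENTRY_CLOCKS.values()), MAX_CODES)           # 6 25
```

The model starts the way the main-line CALL does, at the ROTL of an entry, so the first jump already fetches the first shape and the 0 terminator ends the chain through the RETURN at the start of the table.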
Up to 31 characters, in-line LUT
At the end of the 'Up to 30 characters' section, there is a 'build the PUSH stack using a loop' extension. In this extension, the PUSH Return address was set to 0. Instead of setting it to 0 (the Control Loop) why not set the Return address to the LUT location of the next value to be looked up ?
Of course this means that we have to load the stack 'backwards' - so the main-line code has to point at the end of the 'string' to be output (and the control code has to check for the 'start'). Also, each LUT location has to output the byte 'looked up' by the previous, so each LUT entry is now 2 words :-
; LUT contents
COPY Acc,PORT ;output the value Returned by the previous lookup
RETURN value ;use Return to lookup the next value
3 CLK's per byte output. However the main-line code has to CALL well in advance :-)
Each LUT location outputs the value looked up by the previous. This means the 'control code' has to start the output with a 'jump' directly into the table at a 'Return value' location. Further, the stack PUSH has to set up the string 'backwards', since the first value PUSH'd will be the last to be looked up.
FSR starts pointing at the last code to be output, is DECremented and left pointing at the '0' flag (start-1) position.
The main-line code CALL's the PUSH stack set-up code (note we PUSH n-1, the first code is looked up with a direct jump into the LUT) :-
Loop:
PUSH ;generate a Return
COPY Acc,TOSL ; overwrite the Return
ADD Acc,TOSL ; with 2x the lookup value
;
Callhere: ; mainline CALL's here, FSR 'points' at end of string (and value 0 = start flag)
ROTL INDF,Acc ;get current value byte>word (may be start of string)
DEC FSR ; aim at 'previous'
TESTskipZ INDF ;test for zero flag = start found
BRA Loop: ; not zero, keep PUSHing
; here we have reached the start of string - the Stack is setup, just need to jump to LUT 'Return value' for first code
CLR Temp ; going to use Temp reg to calc address
BSET Temp,b1 ; start at word address 1
ADD Acc,Temp ; add value (word)
ADD Temp,Acc ; add Temp to Acc to get LUT (2 words per entry) +1
COPY Acc,PCL ; jump into LUT at 'Return value' address for first output
The control code is 12 instructions, each LUT entry is 2 words, so max. LUT is (128-12)/2 = 116/2 = 58 entries. The max. output is 31 values: the first code is looked up with a direct jump into the LUT, which leaves max. 30 stack locations for the rest (the 31st is the Return to the main-line code).
That's it for LUT's. See also Table Addressing
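The 'backwards' stack set-up is the hardest part to follow, so here is a Python walk-through of the control code and the 2 word LUT entries. It is a model under assumptions, not tested PIC code: the LUT is taken to start at 0xnn00 with the entry for code k at byte address 4k, `mem` stands for the Registers reached through INDF, and the function names are made up.

```python
def setup_stack(mem, fsr):
    """Mirror of the PUSH stack set-up code.

    mem : register bytes, a 0 'start flag' followed by the codes to output
    fsr : index of the last code (where the main-line code points FSR)
    Returns (TOSL values in the order they were PUSH'd, PCL for the first jump).
    """
    pushed = []
    while True:
        acc = (mem[fsr] << 1) & 0xFF        # ROTL INDF,Acc (top bit assumed 0)
        fsr -= 1                            # DEC FSR
        if mem[fsr] == 0:                   # TESTskipZ INDF: start flag found
            break
        pushed.append((acc + acc) & 0xFF)   # PUSH ; COPY Acc,TOSL ; ADD Acc,TOSL
    temp = 0b10                             # CLR Temp ; BSET Temp,b1
    temp = (temp + acc) & 0xFF              # ADD Acc,Temp
    acc = (acc + temp) & 0xFF               # ADD Temp,Acc  -> 4 x code + 2
    return pushed, acc


def run_lut(pushed, first_pcl, shapes):
    """Follow the Returns through the LUT; returns (PORT writes, final Acc)."""
    stack = list(pushed)
    port_writes = []
    pcl = first_pcl                         # lands on a 'RETURN value' word
    while True:
        acc = shapes[pcl // 4]              # RETURN value
        if not stack:                       # only the main-line Return is left
            return port_writes, acc
        pcl = stack.pop()                   # Return lands on the next COPY Acc,PORT
        port_writes.append(acc)             # COPY Acc,PORT
        pcl += 2                            # step on to that entry's RETURN value


mem = [0, 3, 5, 7]                          # start flag, then codes 3, 5, 7
pushed, first = setup_stack(mem, fsr=3)
print(pushed, first)                        # [28, 20] 14
print(run_lut(pushed, first, {3: 0x33, 5: 0x55, 7: 0x77}))   # ([51, 85], 119)
```

For three codes the model PUSHes two Returns (n-1, as noted above) and makes the first look-up with a direct jump to byte 14, the 'RETURN value' word of entry 3. If the model is faithful, the last value looked up comes back to the main-line code in the Acc rather than being written to the PORT, so the caller would have to output it itself (or the string would need a dummy final code).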