Optimised 5x8 character set

Each of the 94 ASCII character shapes (0x21 '!' to 0x7E '~'), or 95 if we count 'space' (0x20) as a character (rather than treating it as a 'special case'), must be defined within a 'dot-matrix' bit-map 5 columns wide by 8 'dots' high. Of course many characters are narrower than the 'full' width (e.g. . , : ; ' " etc.) and, whilst simple display systems will 'pad out' such characters to fill the 'full width', a variable width display always 'looks' better.

To minimise storage space, we only want to store the actual shape (and not any 'blank column' padding), so, for example, a full stop '.', which is only 1 column wide, will only need 1 byte of storage space. Even so, to fit the entire character set within one 256 location PIC 'page' would require an average character width of only 2.7 bytes (256/94). In fact, my shapes require 365 locations (an average width of 3.842) and so need 2 PIC DataTable 'pages'.
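As a rough check on the arithmetic above, the following Python sketch recomputes the two averages and the page count. The constant names are mine; the numbers are the ones quoted in the text. Note that the 3.842 figure corresponds to dividing by 95, i.e. counting 'space' as a character.

```python
PAGE_SIZE = 256      # locations in one PIC program memory 'page'
PRINTABLE = 94       # 0x21 '!' to 0x7E '~'
WITH_SPACE = 95      # the same set plus 0x20 'space'
SHAPE_BYTES = 365    # total shape storage quoted in the text

# Widest average character that would still fit a single page
max_avg_width = PAGE_SIZE / PRINTABLE

# Average width of the shapes as actually drawn
actual_avg_width = SHAPE_BYTES / WITH_SPACE

# Ceiling division: how many pages the shapes alone occupy
pages_needed = -(-SHAPE_BYTES // PAGE_SIZE)

print(f"{max_avg_width:.2f} {actual_avg_width:.3f} {pages_needed}")
```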

Given that only the actual 'used' width is stored, some means of locating the ends of characters will be needed that uses the minimum of extra bytes (for example, adding an 'end flag' to each character will 'cost' 94 extra bytes (one per char)). My approach is to use a 'width table' (where the width counts are packed into as few bytes as possible).

Width table - first pass

Storing the character widths uses table space and there is no point in eliminating blank columns if the widths table 'costs' us more space than variable shapes would 'save' !

More vital is the need to fit the tables into no more than 2 PIC address regions. Each widths table plus its shapes table thus has to fit within the 255 usable locations of one DataTable.

Encoding character width counts of 1 to 5 bytes requires 3 bits. Simple packing means we can only get 2 widths per byte, leading to a width table of 94/2 = 47 bytes (or 24 per half table). My first 'split' into two tables resulted in my first table having only 23 'spare' locations, so I had to look again.

To save space, I reduced the width 'codes' to 2 bits, allowing them to be packed 4 per byte, reducing the width table from 24 per shape table to 12 - however this means 'padding out' the least common width to the next size up, which also 'costs' extra locations.
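The table sizes quoted above follow directly from how many codes fit in a byte. A minimal sketch (the function name is my own, not something from the PIC source):

```python
import math

def width_table_bytes(n_chars, bits_per_code):
    """Bytes needed to hold one width code per character, with no code split across bytes."""
    codes_per_byte = 8 // bits_per_code
    return math.ceil(n_chars / codes_per_byte)

three_bit = width_table_bytes(94, 3)   # 2 codes per byte
two_bit = width_table_bytes(94, 2)     # 4 codes per byte

# Each half-table carries (roughly) half of the codes
print(three_bit, math.ceil(three_bit / 2))
print(two_bit, math.ceil(two_bit / 2))
```

Dropping from 3 bits to 2 bits halves the width table, which is where the '24 per shape table to 12' figure comes from.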

The actual choice of 'which width to pad' depends on the widths of your actual character shapes. In my case, I have 9x width1, 9x width2, 9x width3, 28x width4 and 40x width5. To minimise padding, I could pick any of width 1, 2 or 3 (since I have 9 of each) and pad to 2, 3 or 4 respectively.

Code 00 is used for width5 (the most common width, so a packed byte of 0x00 (easy to detect) = 4 widths of 5 and need not be checked further). This leaves codes 01, 10 and 11 to be used for the remaining widths. Since '4' has to be one of the encodings, the obvious choice to pad is 'width1', since we can then 'decode the code' by incrementing the 2 bits, i.e. 01 -> 2, 10 -> 3, 11 -> 4, OR (as below) we can code for 'reduction', so width3 = code 10 = reduce the count from 5 to 3 by subtracting 2.

I decided to pad the width1's to 2, giving a total padding 'cost' of 9 bytes, of which 6 will be in one table and 3 in the other.
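The 'which width to pad' decision can be tabulated in a couple of lines. This sketch uses the width counts quoted above; `WIDTH_COUNTS` and `padding_cost` are illustrative names of mine.

```python
# Number of characters at each width, as quoted in the text
WIDTH_COUNTS = {1: 9, 2: 9, 3: 9, 4: 28, 5: 40}

def padding_cost(width, counts=WIDTH_COUNTS):
    """Extra bytes used if every character of `width` is padded to width + 1."""
    return counts[width]   # one extra blank column per affected character

# Width 5 is never padded, so only widths 1 to 4 are candidates
costs = {w: padding_cost(w) for w in (1, 2, 3, 4)}
print(costs)
```

Widths 1, 2 and 3 all cost the same 9 bytes, so the choice between them can be made on decoding convenience alone.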

To discover the location of a character means 'adding up' the widths of all characters in the table prior to that one, so those can be 'skipped over' to reach the one we want.

The 'position calculation' extracts each 2 bit code from the width lookup table and adds up an 'offset'. The calculation is optimised by checking for 0x00 = 4 codes of widths 5 (5 being the most common width).
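The position calculation can be modelled in Python as follows. This is only a sketch under stated assumptions: it uses the 'reduction' coding (code = 5 - width, so 00 = width5 and 11 = width2), it assumes the first character of each group of four sits in the low 2 bits of the byte, and the function names are mine rather than anything from the PIC source.

```python
def pack_codes(widths):
    """Pack widths (2..5) as 2-bit reduction codes, 4 per byte, first char in the low bits."""
    table = []
    for i in range(0, len(widths), 4):
        byte = 0
        for slot, w in enumerate(widths[i:i + 4]):
            byte |= (5 - w) << (2 * slot)
        table.append(byte)
    return table

def offset_of(index, table):
    """Sum of the widths of the `index` characters that precede the wanted one."""
    offset = 5 * index                    # start by assuming every earlier char is width 5
    full, rest = divmod(index, 4)
    for byte in table[:full]:
        if byte == 0x00:                  # four width5 chars: nothing to deduct
            continue
        for slot in range(4):
            offset -= (byte >> (2 * slot)) & 0x03
    if rest:
        mask = (1 << (2 * rest)) - 1      # 0x03, 0x0F or 0x3F
        byte = table[full] & mask         # ignore codes at or beyond the wanted char
        for slot in range(rest):
            offset -= (byte >> (2 * slot)) & 0x03
    return offset

# Made-up widths for illustration only (not the real character set)
widths = [5, 5, 5, 5, 2, 3, 4, 5, 4]
table = pack_codes(widths)
print([hex(b) for b in table], offset_of(7, table))
```

Starting from 5 x index and subtracting the codes means a byte of 0x00 needs no work at all, which is the shortcut described above.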

Width table - final

On re-splitting the 'shapes map' tables I ended up with sufficient 'spare' space to revert to the initial '2 width codes per byte' packing. With 4 bits per 'code', the widths can be held directly (1-5), plus an individual width can be 'extracted' from the table using a 'mask' plus the 'nibble swap' instruction, making the offset calculations simpler. Finally, with no padding required, the offset calculations are further simplified.

The 'cost' of doing this is an extra 24 locations (12 per table), at a 'saving' of some main-line instructions during the width offset calculations.
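For comparison, here is a sketch of the '2 widths per byte' packing. It assumes the even-numbered character is held in the low nibble and the odd-numbered one in the high nibble (the text does not say which way round they go), and `swap_nibbles` stands in for the PIC nibble swap instruction. All names are illustrative.

```python
def pack_nibbles(widths):
    """Pack widths (1..5) two per byte: even index in the low nibble, odd index in the high."""
    table = []
    for i in range(0, len(widths), 2):
        lo = widths[i]
        hi = widths[i + 1] if i + 1 < len(widths) else 0
        table.append((hi << 4) | lo)
    return table

def swap_nibbles(byte):
    """Exchange the high and low nibbles of an 8-bit value."""
    return ((byte << 4) | (byte >> 4)) & 0xFF

def width_at(index, table):
    """Width of one character: optional swap, then mask off the top nibble."""
    byte = table[index // 2]
    if index & 1:
        byte = swap_nibbles(byte)
    return byte & 0x0F

def nibble_offset(index, table):
    """Sum of the widths of all characters before `index`."""
    return sum(width_at(i, table) for i in range(index))

# Made-up widths for illustration only
sample = [1, 2, 3, 4, 5]
packed = pack_nibbles(sample)
print([hex(b) for b in packed], nibble_offset(4, packed))
```

Because the widths are stored directly, there is no decode step and no padded width to allow for; the price is the larger table.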

The character shapes

Many character sets can be found 'on-line', however (as usual) finding exactly what you want is virtually impossible, especially if you want variable widths. In the end I just 'bit the bullet' and 'designed' my own character set.

A number of 'character shape' utilities can also be found on-line, however they all suffer from the 'see and modify one character at a time' drawback = it is only when you see the new character at the same time as the rest of the 'set' that you can build a character 'set' with the same 'style' throughout.

I used MS Excel to lay out the characters, calculate the required column code values and generate a set of 'Return with value' strings that can be 'cut and pasted' directly into a PIC program.

Another advantage of using Excel is that a formula can 'add up' the widths and provide answers to the questions 'do we save more bytes by using a variable width table (rather than padding all shapes to 5 bytes) ?' and 'how many bytes are saved for each possible dropped width (in a 2 bit width encoded table) ?'
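The same two questions can be answered outside Excel. The sketch below uses the width counts quoted earlier; the function names are mine. Note that those counts sum to 366 columns over 95 characters, one more than the 365 quoted earlier, so treat the totals as approximate.

```python
import math

# Number of characters at each width, as quoted in the text
WIDTH_COUNTS = {1: 9, 2: 9, 3: 9, 4: 28, 5: 40}

def fixed_bytes(counts, full_width=5):
    """Storage if every shape is padded to the full width (no width table needed)."""
    return full_width * sum(counts.values())

def variable_bytes(counts, codes_per_byte, padded_width=None):
    """Shape bytes plus width table bytes, optionally padding one width up by a column."""
    shapes = sum(w * n for w, n in counts.items())
    if padded_width is not None:
        shapes += counts[padded_width]
    table = math.ceil(sum(counts.values()) / codes_per_byte)
    return shapes + table

fixed = fixed_bytes(WIDTH_COUNTS)
nibble_packed = variable_bytes(WIDTH_COUNTS, codes_per_byte=2)
two_bit_packed = variable_bytes(WIDTH_COUNTS, codes_per_byte=4, padded_width=1)
print(fixed, nibble_packed, two_bit_packed)
```

On these figures either variable width scheme is well ahead of padding everything to 5 columns.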

PIC Data Tables

My final shape and width tables can be downloaded here (zip)

First, we must always remember that the PIC16F5x only has a 2 level subroutine return stack. Since we will be looking up shapes 'only once', the actual look-up code should be embedded in the 'main line' code (and not made into a subroutine).

Next, to avoid the complexity of 'PA bits', the 'widths' Table will also be split into two (and placed immediately before the corresponding shapes table).

Finally, ASCII 'space' (0x20) is the 'most common' character to be looked up. Careful coding will 'truncate' the lookup process when 'space' is detected without the need for lots of extra 'special case' code.

A PIC DataTable is limited to 256 locations (actually, 255 as the first location is always a 'jump to address ($+Accumulator)' instruction). The character shapes (and the width table) require 2 DataTables - the first for characters 0x20-0x4F requires 202 locations, the second (chars 0x50-0x7E), 197 locations.

The 'shapes' (column bit patterns) will be fetched into a set of 6 registers (which the 'main line' code will 'shift' into the display registers 'as required'). Note that the shape register set is 6 because the inter-character 'gap' will be added here. The FSR will be returned 'pointing' at last reg+1 (i.e. the next 'free' reg). A 'count' register (countReg) will indicate how many bytes have been generated by the look-up (= this character width plus 1) = the last byte returned will always be '0x00' (i.e. the inter-character gap).
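The register set behaviour described above can be modelled as follows. This is a sketch only: `load_shape` is a hypothetical name, a Python list stands in for the 6 shape registers, and the column values in the example are made up rather than taken from the real shape table.

```python
SHAPE_REGS = 6   # up to 5 shape columns plus the inter-character gap

def load_shape(columns):
    """Return (registers, count) for one character's column bit patterns."""
    if len(columns) > SHAPE_REGS - 1:
        raise ValueError("a shape is at most 5 columns wide")
    regs = [0x00] * SHAPE_REGS          # cleared first, so unused regs read as blank
    for i, col in enumerate(columns):
        regs[i] = col
    count = len(columns) + 1            # shape width plus the 0x00 gap column
    return regs, count

# A made-up 3 column shape
regs, count = load_shape([0x7E, 0x11, 0x7E])
print([hex(r) for r in regs], count)
```

Because the registers are cleared before loading, the gap column comes 'for free': the byte after the last shape column is already 0x00.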


; NOTE we only 'load shape' in one place, so all the following is 'main line' code (despite the use of subroutine terminology)
; Arrive here with ascii code to be looked up in Accumulator, FSR pointing at the first free shape reg,
;  - on exit FSR points at the next free, widthReg with charWidth+1 (i.e. shape + inter-char gap)
; Enter with offsetReg = ascii code of char to be looked up
;
;
loadShape:
Clear the 6 byte shape register set (load FSR with shape base, load count with 6, clr @ INDF, dec count/branch back if non-zero)
Set the 'widthReg' to 2 (the count for 'space')
Set offsetReg = offsetReg (ascii code)-0x20.
If the offset is 0, then it's ascii 'space' & we can exit now
Set Acc = offsetReg - 0x30
If Cy, then ascii was < 0x50 and we skip the next inst
Copy Acc to offsetReg (the store instr does not affect Cy)
Load 'char Table 1' base addr to Acc (this does not affect Cy)
If Cy, then ascii was < 0x50 and we skip the next
load 'char Table 2' base to Acc
Move Acc to CharTable pointer
;
; OK, here 'offsetReg' contains the number of width counts we need to read, may be zero (if Table 2)
;  we need the number of bytes to read (number div 4) BUT will need to know the bottom 2 bits of the bi-bit count when we reach the last byte
Load Acc 0x03 	(get the bottom 2 bits of the ascii code = how many extra bi-bits must be taken into account)
AND offset to Acc
; now we want a mask, 00 means mask all (0x00), 01 means mask top 3 bi-bits (03), 10 mask top nibble (0F), 11 mask off top 2 bits (3F)
copy Acc,lobitreg	; save the bits
ADD lobitreg,Acc	; double the bit, so 0= no mask, 2= mask top 3 bi-bits, 4= mask top nibble (0F), 6 = mask off top 2 bits (3F)
IF zero, jump to maskDone:
ADD Acc,PCL			;jump 2,4 or 6
NOP
LOAD Acc,0x03			;jump2, mask is 0x03
JMP maskDone:
LOAD Acc,0x0F			;jump4, mask is 0x0F
JMP maskDone:
LOAD Acc,0x3F			;jump6, mask is 0x3F
maskDone:
copy Acc,lobitreg	; save the mask
; OK, NOW we can generate the count
ROTR offset to Acc
ROTR Acc to count
; There are two ways to discover the true offset - multiply by 5 and deduct the intervening bi-bits or add intervening (bi-bits+1)
; Bibit encoding makes it harder to 'add'
;  00 means 'add 5' or 'subtract 0' .. plainly it's easier to sub the code than to replace the code with 5 and add
;  Since all the other character widths are 2,3,4, we sub 1,2 or 3 (and can use codes 01=width4 (sub 1), 10=width3 (sub 2), 11=width2 (sub 3))
; so we are going to subtract bi-bits, so start with offset = 5x
Copy offset, Acc
ROTL offset		;x2
ROTL offset		;x4
ADD Acc,offset	;x5
Branch if zero to 'doshape'	;if the offset is zero, we can't deduct anything so can skip to the shape fetch
Load width Table pointer = width table base address
; Now reduce by short widths
byteLoop:
INC width Table pointer to Acc ;width Table pointer 0 means start of table ... i.e. location 1
COPY Acc width Table pointer   ;point at next byte
CALL widthTable
COPY Acc,bibits
; OK are we on the last byte ?
Move count to count (test count, dec count does not set Cy, so we have to do it 'long hand')
Branch if non-zero to do restofBytes
; here we have reached the last byte - mask off the bits we don't want
COPY lobitreg,Acc ;get the remainder mask
AND Acc,bibits	;mask off the bits we don't want
CALL bibitLoop
Jump doshape:
restofBytes:
CALL bibitLoop
DEC count
JUMP byteLoop
; OK, we have the charTable Base and the offset, get the shape into the INDF set
doshape:
; that's it, exit
JUMP shapeDone
; process bibits (i.e. subtract 0,2,3,4 from offset)
bibitLoop:
Copy bibits to Acc
If zero, RETURN
AND Acc,0x03	; mask off other bits - so have 00,01,10,11
Branch decdone if zero	; skip if bi-bit code is 00
ADD Acc,PCL		; jump = 01=sub 4, 10 =sub 3, 11 =sub 2
DEC offset		; 01 jump dest
DEC offset		; 10 jump dest
DEC offset		; 11 jump dest
DEC offset
decdone:
; OK, done the dec, adjust the bi-bit value and do the next
ROTR bibits
ROTR bibits
JUMP bibitLoop
; The widthTable - arrive with Acc=byte wanted, MUST be >0 !
widthTable:
ADD Acc,PCL			;WARNING if Acc is zero, CPU locks up with 'JMP $'
RETURN bibits0
RETURN bibits1
RETURN bibits2
RETURN bibits3
RETURN bibits4
...
RETURN bibits 20
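
The table selection and mask steps at the start of the listing above can be modelled in Python. This is a sketch, not a translation of the PIC code: the function names and the `REMAINDER_MASK` table are mine, and the model ignores the carry flag details entirely.

```python
# Mask that keeps only the bi-bits belonging to characters before the wanted one,
# indexed by (character index mod 4)
REMAINDER_MASK = {0: 0x00, 1: 0x03, 2: 0x0F, 3: 0x3F}

def select_table(ascii_code):
    """Return (table number, index within that table) for a printable ASCII code."""
    index = ascii_code - 0x20
    if index >= 0x30:                 # 0x50 and above live in Table 2
        return 2, index - 0x30
    return 1, index

def split_count(index):
    """Return (number of whole width bytes to read, mask for the final partial byte)."""
    return index >> 2, REMAINDER_MASK[index & 0x03]

print(select_table(0x41), split_count(7))
```

For example, index 7 needs one whole width byte (4 characters) plus the low 3 bi-bits of the next byte, hence the 0x3F mask.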

; All characters will be processed here, including 'space' (ascii code 0x20 = first in the list) and ending with 'tilde' (ascii code 0x7E)
; Two subroutines are involved :-
;  calcOffset: which calculates the offset to the start of the shape Table
;  getCol: which returns one 'column' (last returned will be 00, up to calling routine to spot this)
calcOffset:
; Arrive with ascii code in Acc., returns offset in register 'offset' (0 = 'space' code)
;  Offset = sum of all character widths prior to 'this' character, so uses the Widths table
;  The ascii shapes table is in 2 parts, the Widths table is split into the same 2 parts ..
;  Start by subtracting 0x20 from the ascii code - if zero, then it's space and we can exit
;   else if result is 0x30-0x5E (i.e. ascii codes 0x50-0x7E) then it's width table 2
 
;  The Data Tables need not be in the same 'page' as the calling code
;  The PA bits in the Status Reg define the 'page' for Jump/Call (a Call destination must be within the first 256 locations of that page).
;  The 16F59 has 8 register banks. The top 3 bits of the FSR define the 'bank', the lower 5 bits one of 32 register addresses
;   So the 8 bit Reg address is organised as 8 banks, the first 16 addresses in any bank map to the 'special' regs in bank 0
;   (which includes 6 GP regs, x0A - x0F inc), the other 16 in each bank are actual registers
;   (= this is significant because it means you can't use INDF + FSR inc. to step through more than 16 regs in one go)
;  ASCII codes 0x20-0x4F are in Table 1, 0x50-0x7E in Table 2. 0x20 = space = 'special case' (Table 1, address 0)
; primeShape: calculates the 8bit offset from the Table start (the main code calls the correct table)
; The offset is the sum of all the character widths from table start to requested ascii character
;  If ascii is >= 0x50, then we start from width of 0x50
;
 
CLR	offset		;say offset is 0
COPY Acc,count	; save the ascii code
AND 0x03		; mask the top 6 bits off
COPY Acc,btemp	; save the bottom 2 bits
; ascii code 0x20 maps to LSB of offset table 00
; so start by subtracting 0x20 from the ascii code
LOAD Acc,0x20
SUB Acc,count	;subtract 0x20 from the count (result to count)
SKIP NZ			;continue unless ascii code 0x20 (space)
RETURN			;exit (with offset 0) if 0x20
; here for non-space chars
ROTR count		;divide by 4 for the table address
ROTR count		;count is now how many to fetch from the width table-1
; Table splits at ascii 0x50-0x7E .. 0x30-0x5E (after 0x20 sub) .. 0x0C-0x17 (after shift)
; Add 0x04, then if the top nibble is non-zero we started with 0x50 or above
LOAD Acc,0x04
ADD count,Acc,Acc
AND 0xF0		;mask off bottom, Z flag if no top bits
SKIP Z			;skip if Z i.e. ascii code < 0x50
LOAD Acc,0x14	;set Acc to table start offset
; Acc is zero or table start offset 0x14
COPY Acc,tstart	;save it
LOAD Acc,wtaddr	;get the width table address start
ADD Acc,tstart	;add to tstart
; map address offset is sum of all the 2bit values (counting 00 as 5 and a full byte of 00's as 20 (0x14))
 
; OK, now go get count bytes
; fetch the value from width table address tstart, then check the count
COPY tstart,Acc
COPY Acc,PCL	;jump to table
COPY Acc,atemp	; save the 2bit counts
;
COPY count,Acc	;get the count back
BRA NZ,cont1	;continue if non-zero
; OK, on the last few 2 bits - get the 2bit count
COPY btemp,Acc	;get the number remaining
SKIP NZ			;skip unless all done
RETURN
; here for more 2bits - get the
 
 
cont1:
; continue with full bytes
 
 
wtaddr:
RETURN 0x00	;set of 4 2bit codes for ascii 0x20-0x23
RETURN 0x00	;set of 4 2bit codes for ascii 0x24-0x27
.... etc.
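
The 'add 0x04 and test the top nibble' trick used in the listing above to pick the table can be checked exhaustively. A small sketch (`in_table_2` is a name of mine):

```python
def in_table_2(ascii_code):
    """True if the code belongs to Table 2, using only shift, add and mask."""
    count = (ascii_code - 0x20) >> 2      # width table byte index, 0x00..0x17
    return ((count + 0x04) & 0xF0) != 0   # 0x0C + 0x04 is the first value to reach 0x10

# Compare against the plain comparison for every code in the set
mismatches = [c for c in range(0x20, 0x7F) if in_table_2(c) != (c >= 0x50)]
print(mismatches)
```

The trick works because the split point 0x50 lands exactly on a byte boundary of the width table (index 0x0C), so adding 0x04 carries into the top nibble for Table 2 codes and for no others.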
 
 