PIC VGA driving - building a PIC18Fxxx based classical 'dumb terminal' VGA display
With the advent of the $5 Pi Zero, this Project becomes somewhat redundant - the Pi Zero can do a much better job for about the same (if not lower) price
Inspiration was taken from this pair of ATMEL chips - ATmega328 (about £2) + ATmega32 (about £6) design. The challenge is, can we build a simple terminal display with a PIC for less ?
It is often the case that you will need to control a simple device remotely using a serial RS232 Terminal (for example, a 'headless' Raspberry Pi) - and whilst a laptop PC will 'do the trick' quite well, it does seem like 'overkill'. Further, the VGA spec is used to drive small LCD panels, so you don't even need a 'full sized' screen. I thus set myself the challenge of building a VGA display (and serial keyboard) into a cheap PIC device
My first choice as always (of course) was to use whatever half-suitable PIC I had in my 'bits box' - and I soon found a PIC 18F14K50 that looked like it might 'do the job'. Why the PIC 18F14K50 ? Well it's the 'cheapest' 48MHz part available = the 10 off price is about $2 ea. and I paid £1.81 ea. in 2015 from CPC. It has 8k x 16bit program locations (which should be plenty) but only 768 bytes of RAM (which could be a problem), although it does also have 256 bytes of EEPROM (which could be of some use, although 256 bytes is way too few to hold the character shape font - see Note1). One never-ending annoyance to me is how the PIC 'specifications' suffer from 'marketing hype'. The 'headline' 48MHz device actually runs at 12 MIPS (the CPU CLK is OSC/4) and whilst it has '16kb' of program memory that is actually only 8k of instruction space. Finally, in the 'old days' (of 14 bit instructions) the PCL allowed access to a full 'page' of 256 locations for Look-Up-Tables. In the 18F14K50 the PCL becomes a 'byte address' but then PCL bit0 is fixed at 0 = so a LUT 'page' is now only 127 locations before the 'upper bit' PC latches have to be reset. In other words, simple LUT's have to be limited to half the size compared to the 'old design / not recommended' chips !
The 48MHz part runs its CPU at OSC/4 = 12MIPS, which is significantly slower than the 20MIPS ATmega328 (even if we manage to overclock it to 60MHz = 15MIPS, Note2). Into this I wanted to squeeze a minimum 640x480 VGA video generator, a directly connected (IBM PS/2 serial) keyboard and a 9600 baud serial link (to the 'host' computer).
Note1. Of course it's tempting to place the character shape font into the EEPROM store and thus save program space. However, even using a minimalist 5x7 Font and packing 3 scan lines (so 3x5 = 15bits) into 2 bytes (16 bits), I would only have room for some 256/2*3/7 = 54 ascii character shapes. This would support an 'uppercase only' alphabet and might help give the display a nice 'retro' look, however 'unpacking' the shapes would 'cost' too many CLK's to make it worth while.
Note2. The PIC 18F14K50 (48MHz) device comes off the same production line that makes the 64MHz devices, so overclocking to 50MHz should be no problem at all and even 60MHz might be possible (if I avoided using any of the 'analogue' features, including the x4 PLL).
The VGA display
VGA is based on the American NTSC TV display timings, but with no 'interlacing' (the scan line timing is twice as fast as normal TV (or 'AV')) and a visible screen area of 640 pixels x 480 lines.
One advantage of a computer LCD display, rather than a TV screen, is that the actual data rate (i.e. the 'pixel rate') timing is not 'locked' to TV screen requirements. The main disadvantage is that the minimum frame rate supported by most screens is NTSC 60 Hz (50Hz was never supported and you will be hard pressed to find a screen that supports much below 60Hz). However, so long as you stick to the HSync and VSync timing, you can have as many (or as few) pixels 'per line' as you like.
Frame timing
The frame rate is 59.94 Hz (non-interlaced), 16.683 ms (equivalent to 525 scan lines)
Front porch (A) 0.318 ms (10 lines)
Vertical Sync (negative polarity) pulse (B) 0.064 ms (2 lines)
Back porch (C) 1.048 ms (33 lines)
Active (visible) video (D) 15.253 ms = 480 lines (or 48 lines of text using an 8x8+2 font)
Line timing (at specified Clk)
VGA Line timing at 31.78uS (31.46875kHz) = 800 clocks at 25.175 MHz.
Front porch 16 clk
HSync 96 clk
Back porch 48 clk
Visible pixels, 640 clk = 640 pixels (64 characters if using an 8x8+2 font)
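As a quick check on the figures above, the line time and frame rate follow directly from the pixel clock and the clock / line counts. This is only a sketch of the arithmetic (the function names are my own, nothing here is PIC code):

```python
# Standard 640x480 VGA figures quoted in the timing list above.
PIXEL_CLOCK_HZ = 25_175_000
CLOCKS_PER_LINE = 16 + 96 + 48 + 640   # front porch + sync + back porch + visible
LINES_PER_FRAME = 10 + 2 + 33 + 480    # front porch + sync + back porch + visible


def line_time_us(pixel_clock_hz=PIXEL_CLOCK_HZ, clocks=CLOCKS_PER_LINE):
    """Duration of one scan line in microseconds."""
    return clocks / pixel_clock_hz * 1e6


def frame_rate_hz(pixel_clock_hz=PIXEL_CLOCK_HZ, clocks=CLOCKS_PER_LINE,
                  lines=LINES_PER_FRAME):
    """Frames per second for the given pixel clock and line geometry."""
    return pixel_clock_hz / (clocks * lines)


if __name__ == "__main__":
    print(f"{line_time_us():.3f} us per line")
    print(f"{frame_rate_hz():.2f} Hz frame rate")
```

Changing the pixel clock or the clocks per line shows immediately whether a 'non-standard' mode stays inside the monitor's line and frame limits.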
Note. Whilst it's possible to drop the pixel clock frequency (so fewer pixels per line), there is nothing much we can do about the line frequency. The problem is that VGA is a well defined standard with almost no 'leeway' = even a 10 year old LCD monitor (such as a 2003 vintage Dell E171FP 17" LCD which supports 1280 x 1024 pixels at 75 Hz) has a minimum Horizontal scan frequency of 30 kHz (although it can run at up to 80 kHz) and a minimum frame rate of 56Hz (and up to 75Hz).
The VGA visible area is 640 pixels wide x 480 scan lines high. Text is typically displayed in an 8x10 character 'font' (which includes the inter-character gaps and inter-line space). This gets us 48 lines of 80 characters. Since the 8x10 font matrix includes the inter-character and inter-line spacing it allows the display of font based 'bit mapped' graphics.
The 7bit ascii character set is actually only some 96 shapes. There is no point wasting PIC instruction space with Look-up-tables (LUTs) containing the 'blank scan lines' between lines of text, however I did want to 'stick' to the basic 8x8 pixel map (which includes the inter-character gap, which may waste some space but makes output much simpler). This means storing a full 8x8 shape for each character. Using the 'Table Read' (TBLRD) approach, it is possible to look-up and output 1 shape byte every 9 CPU CLK's (and 'exit' when EOL is found). The main-line code will set up the high bytes of the Table Pointer prior to calling the character output routine. Since character output 'works' by copying the ascii code to the low byte of the Table Pointer, we will have 8 separate tables (each starting on a 7bit (byte address) boundary) each consisting of one scan line through each of the 96 characters (so 8 tables of 96 bytes each).
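To make the 'ascii code becomes the low byte of the Table Pointer' idea concrete, here is a rough sketch of the address calculation. The base address and the one-table-per-256-byte-page spacing are my own assumptions for illustration (the text above only says each table starts on an aligned boundary), and the function name is hypothetical:

```python
FONT_BASE = 0x1000     # hypothetical start of the font tables in program space
TABLE_STRIDE = 0x100   # assumption: each scan-line table gets its own 256-byte page


def shape_address(ascii_code, scan_line):
    """Byte address of the shape byte for one character on one font scan line.

    The high byte is fixed per scan line (set up by the main-line code);
    the low byte is simply the ascii code itself.
    """
    if not 0x20 <= ascii_code <= 0x7F:
        raise ValueError("only the 96 printable codes 0x20..0x7F have shapes")
    if not 0 <= scan_line < 8:
        raise ValueError("an 8x8 font has scan lines 0..7")
    high_byte = (FONT_BASE + scan_line * TABLE_STRIDE) >> 8
    return (high_byte << 8) | ascii_code


if __name__ == "__main__":
    print(hex(shape_address(ord("A"), 0)))
    print(hex(shape_address(ord("A"), 7)))
```

The point is that no arithmetic is needed per character: one register copy selects the shape, which is why the tables have to be aligned.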
How fast can we go ?
The 'target' is to match the 'master' pixel clock for 640x480 VGA, which is 25.175 MHz (actually, we will have to aim at 25MHz, or some other simple integer OSC value).
25MHz should be within the 'tolerance' of the VGA spec. The PIC18F specification OSC limit is 48MHz (either from an external 48MHz source or from a 12MHz crystal OSC which is fed to a 4xPLL to get 48MHz internally), however the CPU is OSC/4 = 12 MIPS (see note 1), although pushing this to 12.5MIPS should be no problem (Note 2) which would get us 320 pixels (assuming the PIC can 'keep up').
Note 1. All other PIC18F OSC options are slower - for example, the max. internal OSC that can be 'fed' to the 4x PLL is 8MHz, so after 4xPLL we get 32MHz, giving a CLK OSC of 8 MIPS (we can 'feed' the CPU circuit from internal 16MHz OSC, but that will get only 4 MIPS)
Note 2. There should be no problem Overclocking to 50MHz using a 12.5MHz Xtal with the internal OSC and PLL circuits. If an external OSC is used, Overclocking may be possible up to 60MHz.
The best possible pixel rate is achieved with an external shift-register, using the PIC to output 8 pixels (one character scan line) at a time.
Using byte Table addressing, a 1280 instruction 'stack' of read and output instructions (terminated by pre-setting the Timer Counter to Interrupt), we can achieve 6 CLK's per byte. Assuming a 12.5 MHz CPU clock, and running the VGA system at 25MHz, the best we can get is 640 * 12.5/25 * 8/6 = 426 pixels i.e. 53 characters per line ...
Can we do better ?
The PIC18F range tops out at the 25K80-I/SP with a rated speed of 64 MHz and thus a 16MIPS CPU (it has 32 Kb instruction space and 4 Kb RAM in a 28pin DIP at 10off price of £2.05 (Farnell Element14, March 2016)). This gives me 640 * 16/25 * 8/6 = 546 pixels i.e. 68 characters per line.
Note that 8:6 is not a very good ratio for OSC frequencies. If the PIC outputs a byte in 6 * 16MHz clocks, then the Shift Register pixel clock will be 8/6*16 = 21.333 MHz ....
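The 'how many pixels can we fit' sums above all follow the same pattern, so here is a small sketch of it. It assumes (as the text does) that the line time is fixed by a 25MHz, 640 pixel VGA line and that the PIC delivers one 8 pixel byte every 6 CPU clocks; the function name and defaults are mine:

```python
def pixels_per_line(cpu_mips, clks_per_byte=6, vga_clock_mhz=25.0, vga_pixels=640):
    """Return (pixel clock in MHz, whole pixels that fit in the visible line time)."""
    pixel_clock_mhz = cpu_mips * 8 / clks_per_byte   # 8 pixels shifted out per byte
    visible = int(vga_pixels * pixel_clock_mhz / vga_clock_mhz)
    return pixel_clock_mhz, visible


if __name__ == "__main__":
    for mips in (12.5, 16):
        clock, pixels = pixels_per_line(mips)
        print(f"{mips} MIPS: {clock:.3f} MHz pixel clock, "
              f"{pixels} pixels, {pixels // 8} characters")
```

Dropping `clks_per_byte` to 5 shows why shaving a single clock off the per-character output code matters so much.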
The PIC 24F is also limited to 16 MIPS, however this is achieved at OSC/2 (so an external 32MHz OSC divided by 2 internally to get CPU clk). Pushing this to 50MHz (so we can get 25MHz pixel rate) seems an over-clock too far :-)
Only the PIC24EP, dsPIC and PIC32 series (none of which are cheap) can achieve 25 MIPS (or more) with their OSC/1 CPU clk operation.
What's the timing ?
Using a PIC18F at 12 MIPS means the 'bit clock' is going to be 12 MHz. At 12 MHz and a horizontal frequency 30 kHz (just within the VGA spec.) we get 400 clocks of which 320 are 'visible pixels' (the rest is sync timing) :-
Front porch 8 clk, sync 48 clk, back porch 24 clk, leaving us with 320 visible pixels.
Note. If we want to fit 40 characters into 320 pixels, each character (with its inter-character gap) is 8 pixels.
The frame line counts stay the same at 525 total lines (480 visible lines) giving us a frame rate of 57.14Hz ('well within' the Dell VGA spec min. of 56Hz :-) ) Note. The lines of 8x8 font characters will be spaced apart by 2 lines, so 10 lines high and thus 480 scan lines gives us 48 lines of text.
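The 12 MHz figures can be checked the same way. This is just the arithmetic from the paragraphs above (400 clocks per line, 525 lines per frame), with names of my own choosing:

```python
BIT_CLOCK_HZ = 12_000_000
CLOCKS_PER_LINE = 8 + 48 + 24 + 320   # front porch + sync + back porch + visible
LINES_PER_FRAME = 525


def line_and_frame_rate(bit_clock_hz=BIT_CLOCK_HZ, clocks=CLOCKS_PER_LINE,
                        lines=LINES_PER_FRAME):
    """Return (line frequency in Hz, frame rate in Hz)."""
    line_hz = bit_clock_hz / clocks
    return line_hz, line_hz / lines


if __name__ == "__main__":
    line_hz, frame_hz = line_and_frame_rate()
    print(f"line {line_hz / 1000:.2f} kHz, frame {frame_hz:.2f} Hz")
```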
Use a different PIC ?
The 24FJ64 GA002 (also about $2) has 8kb RAM and a 16bit CPU that runs at OSC/2 (rather than the usual OSC/4), so at the max. OSC of 32MHz it can achieve 16 MIPS. It also incorporates a shift-register (SPI circuit) that can be set to either 8 bit or 16 bit mode with an 8 deep Tx buffer, however (on internal clock) this is limited to half the CPU rate (so we are back to OSC/4 = 8MHz, i.e. 8Mbps). We can clock the SPI with an external clock, but the Spec sheet says it's limited to 'less than' 16MHz (the 24FJ64 has 2 SPI ports - and they can be set to clock on opposite edges - so, at least in theory, we could achieve 24 Mpps - although working out how to 'interleave' the bits would be 'fun').
Note that the PIC10,12,16 and 18 series CPU's all run at OSC/4. The 24 series runs at OSC/2 except the PIC24EP which, along with the dsPIC30, 33 and PIC32 series all run at OSC. HOWEVER the EP parts (including the PIC24EP) all require 2 CPU clocks to address an i/o port (i.e. i/o 'bit banging' is limited to OSC/2) so they are not going to output data any faster.
Variable width characters
The problem with variable widths is that there is just no time to 'make decisions' to vary the character look-up/shape load timing. So, whilst 'variable width' looks nice, it's not really possible
If we store a 8x5 character shape font in a simple 8x8 matrix store, each 'width' has 3 'spare' bits. These 3 bits can be used to control the display width (for that character). To store the shapes of all ascii codes 0x20 (space) to ascii 0x7F (= 96 characters) will require 768 bytes out of the 16kb (8k instruction) program space
When 'load' is Lo, the shift register loads from its parallel inputs on the next clk. The shape shift reg. will be loaded with 5 'char font' bits from the PIC (d7,6,5,4,3), with the other bits (d2,1,0) tied Lo (so they output 'space').
The serial output (q7) of the 'width' reg. will be wired to the 'load' inputs. This means input d7 of width must be tied Hi (so the 'load' is removed when the new width is loaded) whilst the other bits are loaded with the 'width'. The 'minimum' width will be 4 pixels (a 2 pixel character + a 2 pixel space), so d6,5 and 4 will also be tied Hi, leaving d3, d2 and d1 to be provided by the PIC. Load will go Lo on the first 0 to reach the shift out (q7). This gives us the following :-
d3 Lo = width 4, d2 Lo = width 5, d1 Lo = width 6, all d3-d1 Hi = width 7 (i.e. char width 5 + 2 inter char gap) since d0 will be tied Lo.
This gives us a minimum of 4 CPU CLK's to get the next data byte to the output shift register. In this time we need to look up the shape byte (i.e. fetch it) and copy the byte to the i/o pins. We then have to 'pause' for the current byte to be clocked to the shift register before loading the next byte ... we won't have time to count and test, so perhaps the PIC 'interrupt' system can provide the 'pause' ?
At this point we discover that Interrupt latency (delay) is 3 CLK cycles, i.e. exactly the same as a 'bit test skip / jump back if not found' wait loop. Further, there is just no way to 'look up' character shapes from a Table fast enough. In fact, it looks like we don't even have enough time to 'fetch' the shapes at all !
At this point it's clear that the limiting factor is going to be 'how fast' the PIC CPU can look up and deliver font pixels to the output circuits - and this depends totally on the PIC instruction set - and at 12 MHz we have one CPU clk per bit or 8 CLK's per character !
The first thing that becomes obvious is that we don't have the time to waste performing 'subroutine calls' on a 'per character' basis. Unfortunately, on the 18F series even the 'computed jump' ('Add to Program Counter') takes multiple clock cycles - for some non-obvious reason the PC is a 'byte' pointer (on the older PIC's it's an instruction pointer) - of course 18F instruction are 'word aligned' (some are 2 words) so you can only 'jump' to even addresses - plus (as usual) you can only jump within the current 'page' (in the 18F case that means 127 locations) - and (of course) it's not actually possible to directly access the Hi byte of the PC. It IS possible to access the subroutine return stack - but to place a computed address on the stack means getting the 'offset' - which is 2 bytes - and then overwriting the existing value on the stack before performing a Return.
Next I looked at the TBLRD instruction, which reads a byte from (a byte address in) program space and places the read byte in the TABLAT register, taking 2 CLK cycles to do so. You then have to move the data to the Accumulator before you can output it to the PORT pins. The table instruction includes an option to increment/decrement the address 'at no extra cost'. Of course it only accesses program space, so it is only useful for character shape look up.
The problem with all of these 'look ups' is that we just don't have the time to be 'jumping' around the memory space - each jump costs 2 clocks and (at most) we have 8.
Storing ascii codes and character shapes
Are we forced to use program space for shapes and RAM for ascii codes ? Well, in a word, yes - the PIC has some EEPROM, however storing (writing) one byte into EEPROM takes about 4mS (older PIC's take 10mS).
It may be possible to store both shapes and the ascii codes in program space - storing a 'block' of 8 or 16 bytes in Flash (with an 'erase and write' cycle) only takes about 2mS, and incoming ascii codes from the serial link (at 9600 baud) only arrive at about 1mS per byte. Of course doing this we will eventually run into the Flash write-lifetime limit.
Maximum data output rate (PIC18F14k50)
OK, so just how fast can we output 8 bits of shape to the external shift register ? Well, it takes 2 CLK's (load value to Accumulator, copy Acc to PORT) = there is no 'load (literal) value to PORT' instruction (there is a copy Register to Register, but this takes 2 CLK's so it's the same as a Load Acc, Move Acc to PORT).
Next, how do we store the shapes ? Well, it actually turns out that a Table containing 'Return with value in Acc' instructions (each taking 2 CLK's) really is the fastest way. Calling the right location in that table is what takes time, specifically loading an offset, Call the table which then adds the offset and jumps to the location.
The problem is the number of instructions (and CLK cycles) needed - getting the offset is Move index to Acc, then we call the base of the table (Call baseAddr, 2 clks) where we add the offsets to the current PCL (ADDWF PCL = 2 more clks). Finally (after return with value in Acc (2 clks) and copy Acc to PORT) we have to check if any more characters need to be output, and, if so, increment the Indx before getting the next offset
To save the need for a 'N character count' loop, we can write the 'same' code N times. Not only does this eliminate the multiple instruction loop ('count decrement, test, loop back') but it also eliminates the 'increment offset pointer' since we can now address each of the 40 offset registers directly.
This gets us down to 7 clks :-
Load offset N1 to Acc ; 1 clk
Call tableLineN ; 2 clks (note - jump to a fixed location so we need 8 'stacks' of 40 to scan the 8 lines)
ADD Acc PCL, Return with Acc=shape ; 3 clks
Copy Acc to PORT ; 1 clk
; and 40 more of the same (Load offset N2 to Acc ..)
If we are using an 8 pixel font i.e. 7 dot shape + 1 inter-character gap (which takes 8 clks to shift out), then we actually have to 'pad' this to 8 CLK's !
The above is 5 instructions (4 + NOP) per character per line scanned. Each line is 40 characters - so 200 instructions per 'stack' - and the 8x8 font is scanned 8 times to make up the shapes - so 1600 program instructions are dedicated to output. The font is 96 characters x 8 scan line = 768 'return with shape' store (plus 8 'ADD Acc PCL' at the start of each scan line) = 776 program instructions.
Note that each of the 8 sets, starting with 'ADD Acc PCL', must not be allowed to cross a '256 byte' (128 word) boundary.
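Adding up the program space this 'unrolled' approach costs is worth doing before writing any assembler. The sketch below simply repeats the sums from the text (5 instructions per character, 40 characters, 8 scan lines, 96 shapes) against the 8k instruction space; the variable names are mine:

```python
CHARS_PER_LINE = 40
INSTR_PER_CHAR = 5        # 4 output instructions + 1 NOP of padding
FONT_SCAN_LINES = 8
FONT_CHARS = 96
PROGRAM_WORDS = 8 * 1024  # 18F14K50: 8k instruction locations


def output_stack_instructions():
    """Instructions in the 8 unrolled 'stacks' of 40 character outputs."""
    return CHARS_PER_LINE * INSTR_PER_CHAR * FONT_SCAN_LINES


def font_table_instructions():
    """'Return with shape' entries plus one 'ADD Acc PCL' per scan-line table."""
    return FONT_CHARS * FONT_SCAN_LINES + FONT_SCAN_LINES


if __name__ == "__main__":
    used = output_stack_instructions() + font_table_instructions()
    print(f"output {output_stack_instructions()}, font {font_table_instructions()}, "
          f"total {used} of {PROGRAM_WORDS} ({100 * used / PROGRAM_WORDS:.0f}%)")
```

So display output and font together take well under a third of the program space, leaving the rest for the keyboard, serial link and housekeeping.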
With 1 line = 320 visible pixels, and 8 pixels per character, we get 40 'double width / height' characters per line. Allowing a 2 pixel interline gap means the 480 scan lines will be used to display exactly 24 lines of characters.
Line sync and frame sync
Whilst line sync seems easy enough (we could add the timing to the end of the '40 character output stack' = a Front porch 8 clk, sync 48 clk, back porch 24 clk, then 320 visible pixel clocks before the next sync), frame sync is slightly more complex. If we want to 'keep everything in step', the 'best' way to frame sync is to use one of the counter/timers with the Interrupt system.
Of course it's not quite that simple - after 8 scan lines we have 2 'inter line gap' times to calculate the offsets for the next line of characters. However, more to the point, 24 lines of 40 chars requires 960 bytes - and the 18F14 only has 768 bytes, out of which we have to take the 40 offset bytes and a few 'temp' registers !!
Of course there is no point in storing lots of 'space' characters - instead we just store the actual characters to be displayed along with some sort of 'end of line' code (0x00 is common).
Since we don't have time to do anything clever during the actual output (i.e. during the 40 character 'stack') we have to program a 'line timer' that Interrupts the 'stack' after the 'last' character has been output for that line.
Dedicating 700 bytes to 'display' means, we have on average only (700/24 =) 29 bytes per line (rather than 40). Since the ascii codes are only 7 bit, storage efficiency can be improved by 'packing' 8 characters into 7 bytes OR by using the 'spare' bit as a 'space' flag
In typical English text, the average length of a word is 4.79 letters long (with 80% of all words between 2 and 7 letters long). This means that 80% of the time we are better off replacing a 'space' character with the 8th bit rather than packing 8 ascii characters into 7 bytes. On average, in 8 bytes, 'packing' saves 8 bits whilst 'space flagging' saves 8/4.79 * 7 = 11.7 bits, so 'on average' we are 3.7 bits 'better off' per 8 bytes.
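The comparison is easy to replay for other average word lengths (log output or file listings are unlikely to match 'typical English text'). A minimal sketch of the same estimate, with a function name of my own:

```python
AVG_WORD_LEN = 4.79   # average English word length quoted above


def bits_saved_per_8_bytes(avg_word_len=AVG_WORD_LEN):
    """Return (bits saved by 8-into-7 packing, bits saved by space flagging)."""
    packing = 8                          # one whole byte saved per 8 characters
    space_flag = 8 / avg_word_len * 7    # same estimate as the text
    return packing, space_flag


if __name__ == "__main__":
    packing, flagging = bits_saved_per_8_bytes()
    print(f"packing {packing} bits, space flag {flagging:.1f} bits, "
          f"difference {flagging - packing:.1f} bits")
```

With this estimate the two approaches break even at an average word length of 7 letters; anything shorter favours the space flag.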
How many characters per line ?
In both cases, an 'end of line flag' is still required, leaving 28 bytes for text. On average this can hold 28/4.79 = 5.8 words of text, which means (on average) the 28 bytes will contain 28 characters plus 5 spaces = 33 characters.
The alternative approach is to pack 8 characters into 7 bytes, allowing 28 bytes to contain 32 characters.
At what point do we 'run out of RAM'?
If each line is 28 bytes plus a 00 end of line flag, that means 29 bytes per line x 24 lines, so we have 696 bytes.
If we assume 'full lines' of 40 chars, then we don't need the '00' flag. However things start to get complicated if we want to add a '-' when a word is 'split' by a line ending ...
Using a 'space flag' means 696 bytes can contain 696/4.79 = 145.3 words. Adding 145 spaces means we can display (696+145)/40 = 21 complete lines ('on average').
Packing 8 characters into 7 bytes means we can pack 40 characters into 35 bytes and 696/35 gives us just under 20 'full' lines (actually, 19 lines at 35 = 665, leaving 31 bytes, however 1 byte now needs to be the 00 flag so we only get 30*8/7 = 34 characters in the last (20th) line).
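Both estimates can be wrapped into a couple of small functions so the RAM budget can be varied. This sketch takes the display budget as 29 bytes x 24 lines and otherwise follows the same reasoning as the text; the function names are my own:

```python
AVG_WORD_LEN = 4.79
LINE_WIDTH = 40


def space_flag_lines(ram_bytes, avg_word_len=AVG_WORD_LEN, width=LINE_WIDTH):
    """Complete lines displayable when each word's trailing space costs no RAM."""
    words = int(ram_bytes / avg_word_len)   # one 'free' space per whole word
    return (ram_bytes + words) // width


def packed_lines(ram_bytes, width=LINE_WIDTH):
    """Return (full packed lines, characters that fit in the final partial line)."""
    bytes_per_line = width * 7 // 8          # 40 characters pack into 35 bytes
    full, remainder = divmod(ram_bytes, bytes_per_line)
    usable = max(remainder - 1, 0)           # one byte goes on the 00 end flag
    return full, usable * 8 // 7


if __name__ == "__main__":
    budget = 29 * 24
    print(budget, space_flag_lines(budget), packed_lines(budget))
```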
There is not a lot to choose between the 'space flag' and '8 packed into 7' approach, however the 'space flag' is likely to be much simpler to 'decode' (we only have 2 scan line times between each character line to calculate the offsets for the next line of up to 40 characters) so this is the approach I decided on.
Frame timing will also be done using Interrupts. There are 525 total lines (of which 480 visible lines)
Using the 24FJ64GA002
Non-PIC users will know that the 20MHz ATmega1284p with its 16kb RAM (and when over-clocked to 24MHz), will actually do a better job than the OSC/2 (16 MHz) limited 24FJ64GA002 with only 8kb RAM. However the 24FJ64GA002 (plus an external shift reg) costs less than $2 whilst the ATmega1284p will cost you about $10.
Limited to an OSC of 32MHz, the CPU clk runs at OSC/2, so we get 16 MIPS. In addition, it's a 16 bit CPU so each instruction should 'get a lot more done'. This allows a rather more ambitious goal - a full 80 characters x 48 line display !
PCL, the bottom bits of the Program Counter, is 16 bits wide. It even supports a 'computed branch' (allowing +/- jump using one of the 16 Accumulators) taking 2 CPU clocks.
To eliminate the call/return delay, we encode the 'output byte to PORT' in the actual FONT table as follows :-
Copy value to Acc ; get the shape, 1 clk
Copy Acc to PORT ; o/p to PORT, 1 clk
Copy wd RegN++ to Acc ; get the next offset from regN++, 1 clk
Add Acc to PCL ; jump to next shape locn., 2 clks
This only 'works' (i.e. results in fewer clk cycles) because PCL is 16 bits and we need to 'jump' by +/- 96 (number of ascii characters supported) times the instruction count (in the 24F case that's 5, so +/- 480 instructions). The 18F14 PCL is only 8 bits so we can't 'jump' in one instruction (it would be possible to use the Subroutine stack, however that takes even more instructions).
At 16MHz we achieve a 'bit rate' (for an 8 bit font) of 16/5*8 = 25.6 MHz, which is 'fine' BUT how do we 'clock' this data out to the display ?
The problem is that the internal SPI shift register is limited to less than 16MHz (external clk, on internal clk it's even more limited = to 8MHz), so an external shift register is a 'must' to support a 24MHz pixel rate
If the CPU is running at 16MHz we need a 25.6 MHz pixel clock and there is no easy way to generate one (25.6MHz is not a common clock crystal value). However if we run the CPU at 15 MHz this gives us a pixel rate of 15/5 * 8 = 24MHz - and both 15MHz and 24MHz clock crystals are common.
Since the 24FJ64GA002 has 2 OSC circuits, we can run one with a 24 MHz crystal to get the pixel clock and the other for the CPU OSC. For the CPU clock, we actually need either OSC = 30MHz (so CPU OSC/2 = 15Mhz) or a 7.5MHz OSC (used with the internal PLLx4 this gives us 30 MHz, and again OSC/2 is 15MHz). Both 30MHz and 7.5MHz are also available, however it's always 'easier' to 'divide down' so 30 MHz crystal is the one to go for.
Actual display pixel drive will thus be from a general purpose parallel-in / serial-out shift register - such as the 74HC166 (cost is less than 25p each, 10 off, CPC or eBay (China)) - driven from the 24 MHz clock
The 74HC166 is a 'synchronous load' device, so we wire up a pair, using the second to control the 'sync load' for both. When the second loads, 7 of the bits load '0' and the 8th = '1'. After 7 clocks at 24MHz, the '1' appears at the output and at the 'sync load' of both shift registers. On the 8th clock, both registers are loaded - the data register from whatever is on the PIC output pins, whilst the 'count' starts again.
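The 'one register counts for the other' trick is easier to believe after stepping through it. Below is a purely behavioural sketch of the pair (it models the load signal as a simple flag and ignores the real chip's pin polarity, so any inverter needed on the load line is left out); the function name and the start-up state are my own assumptions:

```python
LOAD_PATTERN = 0b00000001   # seven '0' bits and a single '1' that takes 7 clocks to emerge


def run_shift_pair(data_bytes, clocks):
    """Clock a data register and a 'count' register together.

    Returns (pixels, load_clocks): the serial pixel stream, MSB first, and
    the clock numbers on which both registers performed a parallel load.
    """
    source = iter(data_bytes)
    data = 0
    count = 0b10000000           # assumed start-up state: load flag already asserted
    pixels, load_clocks = [], []
    for clk in range(clocks):
        if count & 0x80:         # flag at the count register's serial output
            data = next(source, 0)       # whatever is on the PIC port pins
            count = LOAD_PATTERN
            load_clocks.append(clk)
        else:                    # otherwise both registers shift by one bit
            data = (data << 1) & 0xFF
            count = (count << 1) & 0xFF
        pixels.append((data >> 7) & 1)
    return pixels, load_clocks


if __name__ == "__main__":
    pixels, loads = run_shift_pair([0xA5, 0x3C], 16)
    print(loads)
    print("".join(str(p) for p in pixels))
```

The loads fall exactly 8 clocks apart with no gap in the pixel stream, which is all the PIC needs: it just has to have the next byte on the port before each load.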
What's the char set limit ?
The 'offset jumps' are +/-16 bits, however that's only for 'one scan' pass through the font (the base address is reset for each scan line pass). This means the 'width' of the Font table is limited to 15 bits of address, which is 16k (word) 'jump destinations'. The 24FJ64GA002 has 64kb of program space which allows a maximum of only 22,016 instructions - so there is no jump address limit to the Font size
A 96 character 8x8 font will need 5x96x8 = 3,840 words of program space for its 8 'scan pass' sub-tables (each table being one 'scan line' set of 480 instructions, where each of the 96 entries consists of 5 instructions outputting the shape and then jumping to the next). A 'full' 256 ascii character set (8x8 font) would need 256 x5 x8 = 10,240 instruction words for its tables (a full 256 character 8x10 font would need 12,800).
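Since every font entry is now 5 instructions, the table size is a simple product. A short sketch of that sum, checked against the 22,016 instruction limit mentioned above (the function name is mine):

```python
INSTRUCTION_LIMIT = 22_016   # 24FJ64GA002 program space, in instructions


def font_table_words(characters, scan_lines, instr_per_entry=5):
    """Program words needed for 'scan_lines' sub-tables of 'characters' entries."""
    return characters * scan_lines * instr_per_entry


if __name__ == "__main__":
    for chars, lines in ((96, 8), (256, 8), (256, 10)):
        words = font_table_words(chars, lines)
        print(f"{chars} chars x {lines} lines: {words} words "
              f"({100 * words / INSTRUCTION_LIMIT:.0f}% of program space)")
```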
Line and Frame sync
As with the 18F14, counter/timer Interrupts can be used to trigger Line and Frame Sync, although (unlike the 18F14) there will be no real need to 'interrupt' the pixel output 'stack' = the 24F has plenty of registers so we can just dedicate 160 registers to the 80 address word 'offsets' for this line and let the line run to the end.
Remember that the CPU clk is 15 MIPS ... so all PIC clocks are '5/8 ths' of the VGA clock
Line Sync
Normal VGA Line timing is 800 clocks at 25.175 MHz = 31.46875kHz (31.78uS)
PIC 24FJ Line timing 800 Clocks at 24 MHz = 30kHz (33.33uS)
Normal Front porch 16 clk, Sync 96 clk, Back porch 48 clk
Visible pixels, 640 clk = 640 pixels
PIC 24FJ 15MHz clock is 5 CPU instructions per 8 'normal VGA' clk's (so Front porch 10 CPU clk, Sync 60 CPU clk, Back porch 30 CPU clk)
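Converting the 'normal' VGA clock counts into 15MHz CPU clocks is just a multiply by 5/8, but it is worth checking that every interval comes out as a whole number of instructions. A small sketch (names are my own):

```python
from fractions import Fraction

CPU_PER_PIXEL_CLOCK = Fraction(15, 24)   # 15 MHz CPU clock vs 24 MHz pixel clock = 5/8


def cpu_clocks(pixel_clocks):
    """CPU clocks spanning the given number of pixel clocks (must be exact)."""
    result = pixel_clocks * CPU_PER_PIXEL_CLOCK
    if result.denominator != 1:
        raise ValueError(f"{pixel_clocks} pixel clocks is not a whole number of CPU clocks")
    return int(result)


if __name__ == "__main__":
    for name, clocks in (("front porch", 16), ("sync", 96),
                         ("back porch", 48), ("visible", 640)):
        print(f"{name}: {clocks} pixel clk = {cpu_clocks(clocks)} CPU clk")
```

The whole line is 500 CPU clocks at 15MHz, i.e. the same 33.33uS as 800 pixel clocks at 24MHz.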
Ascii character storage
The full screen is 80 x 48 = 3840 characters. Whilst the 24FJ64GA002 has 8kb of RAM the higher character count means it makes even more sense to store only the actual characters for a line (at a cost of a 'line end' character) and to 'eliminate the spaces' (i.e. use the 8th bit to flag 'space'), or pack 8 characters into 7 registers. If we eliminate spaces, then, taking the average word length of 5, in a full display of 3,840 characters we can 'save' 768 registers (and thus expand the display 'history', i.e. the 'scroll up' depth).
Packing 8 chars into 7 registers can save up to 548 registers but is rather more complex - we can only 'pack one' when the number of characters (remaining) in a line is 8 or more .. further, we still need an 'end of line' (so can't use that for packing).
If 0x00 is to be used as the 'end of line' flag, it 'makes sense' to use bit 7 = 0 to 'flag' a space. Space replacement can be done immediately a space character is received as follows :-
When a space (ascii 0x20) is received
If the line has started & If the last character bit7 = '1'
Then clear last char bit 7
Else store the character
Note that two successive spaces are encoded as (last char)+bit7 clear, ascii 0x20. 3 spaces will be (last char)+bit7 clear, (ascii 0x20)+bit7 clear.
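The rule above is simple enough to prototype off-chip before committing it to assembler. This is a minimal sketch of the encode and decode steps, assuming only printable codes 0x20 to 0x7F arrive (so a stored 0x00 can only ever be the end of line flag); the function names are my own:

```python
SPACE = 0x20
END_OF_LINE = 0x00


def encode_line(text):
    """Store characters with bit7 set; a following space just clears that bit."""
    stored = []
    for code in text.encode("ascii"):
        if code == SPACE and stored and stored[-1] & 0x80:
            stored[-1] &= 0x7F            # flag 'space follows' on the last character
        else:
            stored.append(code | 0x80)    # ordinary store (including 'extra' spaces)
    stored.append(END_OF_LINE)
    return bytes(stored)


def decode_line(stored):
    """Rebuild the text: bit7 clear means 'insert a space after this character'."""
    out = []
    for byte in stored:
        if byte == END_OF_LINE:
            break
        out.append(chr(byte & 0x7F))
        if not byte & 0x80:
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    line = "ls -l /home"
    packed = encode_line(line)
    print(len(line), "characters ->", len(packed), "bytes:", packed.hex())
    print(repr(decode_line(packed)))
```

Note the decode test is a single bit test per character, which matters when the decoding has to be done in the couple of scan lines between rows of text.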
Modifying the character set
The 24FJ64GA002 has the ability to 'self modify' its own Flash memory program space. This allows the FONT to be modified 'as required', HOWEVER programming is not fast enough to allow 'on screen' effects.
The most obvious change would be to reprogram the character Font for 'wire frame' (B&W) graphics e.g. for use as an Oscilloscope display. Of course this means removing the inter-character-line gap, so less time to calculate the next 'character row' offset addresses (of course we can expand the Font to 8x10, allocate a second set of offset registers and do the calculations in the Line sync gap - a second set of offsets means we only have to calculate 80/10 = 8 new offsets per line sync and then 'switch sets' at the end of the current 10 line output).
Four colour display
If we switch from 24MHz x1 bit pixels to 12MHz x2 bit pixels, each pixel can be any of 4 colours but at 'half resolution' (i.e we only have 40 characters per line).
This might make sense when displaying wire-frame Graphics (Oscilloscope display), however with only 320 pixels per line the horizontal resolution is not going to be very good
If 00 = black and 11 = white, there is actually only a choice of 2 colours for the other two codes, as we have to 'pair up' two of the 3 (R,G,B) wires (so the choices are Red with Cyan, Green with Magenta or Blue with Yellow). The only way around this is to give up 11 = white (for example, by making the text green).
RGB
000 Black
100 Red
010 Green
001 Blue
111 White
011 Cyan
101 Magenta
110 Yellow
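To see where the three 'pairs' come from, here is a sketch that wires the high bit of each 2 bit pixel to one colour wire and the low bit to the other two (tied together), then names the four resulting colours from the table above. Which bit drives which wire is my own assumption for illustration:

```python
COLOUR_NAMES = {
    (0, 0, 0): "Black", (1, 0, 0): "Red", (0, 1, 0): "Green", (0, 0, 1): "Blue",
    (1, 1, 1): "White", (0, 1, 1): "Cyan", (1, 0, 1): "Magenta", (1, 1, 0): "Yellow",
}


def palette(solo_wire):
    """Colours for pixel codes 0..3 when the high bit drives 'solo_wire' alone
    and the low bit drives the other two wires together."""
    if solo_wire not in ("R", "G", "B"):
        raise ValueError("solo_wire must be 'R', 'G' or 'B'")
    colours = {}
    for code in range(4):
        high_bit, low_bit = code >> 1, code & 1
        rgb = tuple(high_bit if wire == solo_wire else low_bit for wire in "RGB")
        colours[code] = COLOUR_NAMES[rgb]
    return colours


if __name__ == "__main__":
    for wire in "RGB":
        print(wire, palette(wire))
```

Whichever wire is left on its own, codes 00 and 11 always come out as black and white, and the other two codes are always a primary and its complement.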
How about sVGA ?
sVGA is 800x600, but we won't have time during the display of pixels to look up the shapes. This means we look up the shapes on one scan line and show them on the next i.e. all pixels are 2 lines 'high' (and 2 'dots' wide).
If our characters are 7 (double) scan lines high, and we allow 2 (double) scan lines between lines of text we have room for 600/(2x9) = 33 lines of text. If each character is 5 (double) pixels wide and we have 2 (double) pixels between characters, then we have room for 800/(2x7) = 57 characters per line. This nicely exceeds my 'target' (a 'retro' Teletype style display of 24 lines x 40 characters)
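The same sum works for any screen size and font cell, so here it is as a small sketch (the function name and parameter names are mine; 'scale' is the pixel doubling described above):

```python
def text_grid(width_px, height_px, char_w, char_h, gap_w, gap_h, scale=2):
    """Return (characters per line, lines of text) for a scaled font cell."""
    columns = width_px // (scale * (char_w + gap_w))
    rows = height_px // (scale * (char_h + gap_h))
    return columns, rows


if __name__ == "__main__":
    columns, rows = text_grid(800, 600, char_w=5, char_h=7, gap_w=2, gap_h=2)
    print(f"{columns} characters x {rows} lines = {columns * rows} character cells")
```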
Next we have to consider the ascii character 'store'. Even the 'target' 24 x 40 is 960 characters - and 33x57 is 1881 - which immediately gives us a problem because we only have 768 bytes of RAM, at least 57 of which are needed to hold the 'pre-looked up' character shapes for a scan line display (and some more of which will be needed to hold control variables) !
Fortunately the typical 'command terminal' will very rarely be called on to display the full 40 (or 57) characters in every row. Whilst this might happen occasionally, eg when displaying the contents of a file that contains no line-breaks, it should still be acceptable to 'stop short' when the RAM runs out (anyone else remember how the old displays 'paged' by pressing the keyboard 'space' bar ?). However, since RAM is plainly going to be a restriction, it's worth looking at ways to save space. First, since we only have room for 54 shapes, we might as well reduce the ascii code to 6 bits (64 char) - this lets us pack 4 characters (4x6 = 24bits) into 3 bytes (3x8 = 24 bits), however that might not be the most efficient approach. If we retain 1 byte per character then we have a couple of 'spare' bits with each. These can be used as 'flags', the most obvious of which is indeed a 'space saver' :-) (used to indicate that this character is the last in a word and that an interword 'space' should be inserted next). Of course ascii code zero (0x00) will be used to indicate 'end of line', so we don't have to waste RAM on lots of ASCII spaces.
We have to be very careful with how the 'flags' are defined - the PIC will have very few clk cycles 'spare' as most will be used driving the display !
If we define the 'space' flag as 'top bit' == '0' then we have 'lots of time' (well, 1 space time) to discover that the other 7 bits are also zero and we have reached the 'end of line' with a 0x00 NULL byte - or not and we have to fetch the next byte.
Can we do something similar during scanline character shape output ? Well 5 pixels wide +2 inter-character gap = 7, so each of the 56 scanline shape registers has 1 bit 'spare'. If we pack 8 shapes in 7 registers, this saves us 7 registers overall, however the 'packing' will require quite a few CPU cycles. Better is to 'map' the shapes to the ascii characters i.e. use the top bit as a 'space (follows) flag' (and 0x00 as the 'end of line'). This means, of course, that the output shift (SPI) circuits have to support '7 bit' characters and we have to load at 7 pixel intervals (rather than 8).