<h3>8x8 Multiply (shift and ADD) Subroutine</h3>

You can find many 8x8 multiply methods (for example, see <a href="http://www.piclist.com/techref/microchip/math/basic.htm">piclist math methods</a>); however, implementation is typically 'left up to the user' :-). Generating 'optimum' code depends on your goal: either maximum speed or minimum code space.

My 'new PIC 33 instruction set (macros)' (above) contains a MUL macro that implements a 'maximum speed' approach, so this subroutine takes the 'minimum code space' approach (i.e. it takes longer because it loops).

<b>Method</b>

The basic 'shift and add' method uses the bits of one value (the multiplier) to control whether the second value (the multiplicand) is added into the 'top' (high byte) of the result. After each add, the result is shifted down 1 bit (so any Cy from the ADD is shifted into the msb) and the next control bit is checked. The sequence always ends on a final shift.

The subroutine multiplies two 8-bit values, passed in a register pair. To save on register space, the same register pair is used to hold the (16-bit) result.

The loop count is held in rTemp. DECFSZ is the fastest possible way to control a loop, as it's a single instruction to 'Decrement and skip if zero'; however, it must be combined with a 'Jump loop', i.e. be positioned at the end of the loop.

Start by setting up the loop count, then copy mRegHi to Acc, which frees the register for reuse (it is cleared to hold the top of the result).
To start the loop, shift mRegLo to get lsb into Cy
 Do the ADD (if Cy)
 Shift the result (puts next lsb into Cy)
 Dec the loop count and loop if nonZero
Return with 16 bit result.

<i>Note that the final bit shifted out of mRegLo is the one that was shifted in at the start. It is ignored, so we don't care what's shifted in at the start.</i>
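To make the steps above easier to follow (and to check on a PC), here is a minimal C model of the same loop. It is only a sketch: the names rrf, mul8x8, reg_hi, reg_lo, acc and carry are my own stand-ins for the RRF instruction, the subroutine, mRegHi, mRegLo, Acc and the Cy flag, and it assumes RRF rotates right through carry (old Cy into b7, old b0 into Cy).

```c
#include <stdint.h>

/* Rotate right through carry, modelled on RRF:
   old carry enters bit 7, old bit 0 becomes the new carry. */
static uint8_t rrf(uint8_t value, uint8_t *carry)
{
    uint8_t out = (uint8_t)(value & 1u);
    value = (uint8_t)((value >> 1) | (*carry << 7));
    *carry = out;
    return value;
}

/* Shift-and-add model: multiplicand in reg_hi, multiplier in reg_lo,
   16-bit result returned as reg_hi:reg_lo. */
static uint16_t mul8x8(uint8_t reg_hi, uint8_t reg_lo)
{
    uint8_t acc = reg_hi;  /* copy multiplicand, freeing reg_hi */
    uint8_t carry = 0;     /* incoming carry is a don't-care (discarded at the end) */

    reg_hi = 0;                    /* top of result starts empty */
    reg_lo = rrf(reg_lo, &carry);  /* multiplier b0 into carry */

    for (int count = 8; count != 0; count--) {
        if (carry) {               /* control bit set: do the ADD */
            unsigned sum = (unsigned)reg_hi + acc;
            reg_hi = (uint8_t)sum;
            carry = (uint8_t)(sum >> 8);   /* carry out of the ADD */
        }
        reg_hi = rrf(reg_hi, &carry);  /* carry into b15, b8 out to carry */
        reg_lo = rrf(reg_lo, &carry);  /* b8 into b7, next control bit out */
    }
    return (uint16_t)((reg_hi << 8) | reg_lo);
}
```

Calling mul8x8(3, 2) walks through exactly one ADD (on the second pass) and returns 6; the loop always runs 8 passes regardless of the values.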

<code>
;<b>
MULTIPLY         ;Unsigned multiply subroutine</b>
; Called with the multiplier (control) in mRegLo, multiplicand (ADDed) in mRegHi
; Returns with result mRegHi,mRegLo.
; Acc and rTemp (count) are used
 LOAD 0x08       ;loop count (8 passes, one per multiplier bit)
 COPY Acc,rTemp  ;set count
 COPY mRegHi,Acc ;get Hi to Acc
 CLR mRegHi      ;clear for reuse
 RRF mRegLo,1    ;lsb b0 to Cy (don't care what's shifted in as it will be discarded at end)
mLoop           ;arrive here with Cy set for ADD (loop is 8*7 = 56 CLK's)
 Skip nCy        ;skip the ADD if no carry
 ADDWF mRegHi,1  ;Cy set: add multiplicand (Acc) to high byte, may set new Cy
 RRF mRegHi,1    ;shift high byte down: 0 (ADD skipped) or Cy from ADD goes to b15
 RRF mRegLo,1    ;high byte b0 goes to low byte b7, low byte b0 to Cy (last shift is discarded)
 DECFSZ rTemp    ;dec count (no effect on Cy)
 JMP mLoop       ;nZ, keep looping (no effect on Cy)
 RETURN	         ;Z=exit, all done
</code>

Total 12 instructions, 5 + 56 + 2 (return) = 63 CLK cycles, compared to 33/35 CLK's for the MUL macro (Acc not saved/saved).

<i>At the cost of a few extra instructions, multiply can be 'short circuited' with tests for 0 (and/or 1); however, this adds overhead to ALL multiplies, so it is only 'worth it' if your application is likely to result in many *0 or *1 cases.</i>
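As a rough illustration of the 'short circuit' idea, here is a small C sketch. The names multiply_short_circuit and full_multiply are hypothetical, and full_multiply is just a stand-in (using the C * operator) for the looping routine. The point to notice is that the tests run on every call, including the ones that fall through to the full multiply.

```c
#include <stdint.h>

/* Stand-in for the full looping multiply (hypothetical name). */
static uint16_t full_multiply(uint8_t multiplicand, uint8_t multiplier)
{
    return (uint16_t)(multiplicand * multiplier);
}

/* Test for the trivial cases first; every call pays for these tests. */
static uint16_t multiply_short_circuit(uint8_t multiplicand, uint8_t multiplier)
{
    if (multiplicand == 0 || multiplier == 0) {
        return 0;                 /* anything * 0 */
    }
    if (multiplier == 1) {
        return multiplicand;      /* x * 1 */
    }
    if (multiplicand == 1) {
        return multiplier;        /* 1 * x */
    }
    return full_multiply(multiplicand, multiplier);
}
```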