8x8 Multiply (shift and ADD) Subroutine

You can find many 8x8 multiply methods (for example, see piclist math methods); however, the implementation is typically 'left up to the user' :-). Generating 'optimum' code depends on your goal: either maximum speed or minimum code space. My 'new PIC 33 instruction set (macros)' (above) contains a MUL macro that takes the 'maximum speed' approach, so this subroutine aims at the 'minimum code space' approach instead (i.e. it takes longer because it loops).

Method

The basic 'shift and add' method uses one value (the multiplier) to control the add of the second value (the multiplicand) into the 'top' half of the result. After each step the result is shifted down 1 bit (so any Cy from the ADD is shifted into the msb), and the next control bit is checked. The sequence always ends on a final shift.

The subroutine multiplies two 8-bit values passed in a register pair. To save register space, the same register pair holds the 16-bit result. The loop is controlled by rTemp with DECFSZ, the fastest possible way to control a loop because it is a single 'Decrement and skip if zero' instruction. However, it must be combined with a 'Jump loop', i.e. it has to be positioned at the end of the loop.

1. Set up the loop count, then copy mRegHi to Acc, which frees that register for reuse.
2. To start the loop, shift mRegLo to get its lsb into Cy.
3. Do the ADD (if Cy).
4. Shift the result (this puts the next lsb into Cy).
5. Decrement the loop count and loop if non-zero.
6. Return with the 16-bit result.

Note that the final bit shifted out of mRegLo is ignored, so we don't care what is shifted in at the start.

MULTIPLY                ;Unsigned multiply subroutine
                        ; Called with the multiplier (control) in mRegLo, multiplicand (ADDed) in mRegHi
                        ; Returns with the result in mRegHi,mRegLo.
                        ; Acc and rTemp (count) are used
    LOAD   0x08         ;loop count = 8 (one pass per multiplier bit)
    COPY   Acc,rTemp    ;set count
    COPY   mRegHi,Acc   ;get Hi to Acc
    CLR    mRegHi       ;clear for reuse
    RRF    mRegLo,1     ;lsb b0 to Cy (don't care what's shifted in, as it will be discarded at the end)
mLoop                   ;arrive here with Cy = control bit, set means ADD (loop is 8*7 = 56 CLKs)
    Skip   nCy          ;skip if no carry
    ADDWF  mRegHi,1     ;Cy: add (Acc) to msb, may set new Cy
    RRF    mRegHi,1     ;nCy (skipped add) or Cy from the add to b15
    RRF    mRegLo,1     ;mRegHi b0 to b7, next control bit (b0) to Cy (last shift is discarded)
    DECFSZ rTemp,1      ;dec count (no effect on Cy)
    JMP    mLoop        ;nZ, keep looping (no effect on Cy)
    RETURN              ;Z = exit, all done

Total 12 instructions, 5 + 56 + 2 (RETURN) = 63 CLK cycles (compared to the MUL macro, which takes 33/35 CLKs with Acc not saved/saved).

At the cost of a few extra instructions, the multiply can be 'short circuited' with tests for 0 (and/or 1). However, this adds overhead to ALL multiplies, so it is only 'worth it' if your application is likely to produce many *0 or *1 cases.
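To check the register-level behaviour of the loop without hardware, the subroutine can be modelled in C. This is a sketch under stated assumptions, not the author's code: the macros (LOAD, COPY, CLR, Skip nCy, JMP) are assumed to act like the matching plain PIC instructions and to leave Cy untouched, RRF is modelled as a rotate right through carry, and the loop runs 8 times (one pass per multiplier bit, matching the 8*7 cycle figure). The names `rrf`, `mul8x8` and `mul8x8_short` are hypothetical and belong to no library.

```c
#include <stdint.h>

/* Model of PIC "RRF f,1": rotate right through carry.
   The old carry enters bit 7 and the old bit 0 becomes the new carry. */
static void rrf(uint8_t *f, int *cy)
{
    int out = *f & 1;
    *f = (uint8_t)((*f >> 1) | (*cy ? 0x80 : 0x00));
    *cy = out;
}

/* Model of MULTIPLY: multiplicand starts in mRegHi, multiplier in mRegLo,
   and the 16-bit product ends up in mRegHi:mRegLo.
   carry_in is whatever Cy holds on entry; the result must not depend on it. */
static uint16_t mul8x8(uint8_t multiplicand, uint8_t multiplier, int carry_in)
{
    uint8_t mRegHi = multiplicand, mRegLo = multiplier;
    int cy = carry_in ? 1 : 0;

    uint8_t rTemp = 8;          /* LOAD 8 / COPY Acc,rTemp: loop count */
    uint8_t acc = mRegHi;       /* COPY mRegHi,Acc (Acc assumed to be W) */
    mRegHi = 0;                 /* CLR mRegHi, Cy unchanged (assumption) */
    rrf(&mRegLo, &cy);          /* multiplier b0 to Cy, stray Cy into b7 */

    do {
        if (cy) {                                   /* Skip nCy */
            unsigned sum = (unsigned)mRegHi + acc;  /* ADDWF mRegHi,1 */
            mRegHi = (uint8_t)sum;
            cy = sum > 0xFF;                        /* carry out of the add */
        }
        rrf(&mRegHi, &cy);      /* Cy (from the add, or 0) into b15 */
        rrf(&mRegLo, &cy);      /* mRegHi b0 into b7, next control bit to Cy */
    } while (--rTemp != 0);     /* DECFSZ rTemp / JMP mLoop */

    return (uint16_t)((mRegHi << 8) | mRegLo);
}

/* Hypothetical short-circuit wrapper: the 0 and 1 tests run before
   every multiply, which is the overhead the trade-off is about. */
static uint16_t mul8x8_short(uint8_t multiplicand, uint8_t multiplier)
{
    if (multiplicand == 0 || multiplier == 0) return 0;
    if (multiplier == 1) return multiplicand;
    if (multiplicand == 1) return multiplier;
    return mul8x8(multiplicand, multiplier, 0);
}
```

Comparing `mul8x8` against a plain `a * b` for all 65536 input pairs, with Cy both clear and set on entry, is a quick way to confirm that the stray bit rotated into mRegLo at the start never reaches the result. `mul8x8_short` only shows where the 0/1 tests would sit; on the PIC each test costs extra instructions and cycles on every call.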