Posts Tagged ‘Coding Corner’

PIC-16 Bitwise Long Divide

Saturday, May 15th, 2010

I promised I’d post my long divide function for the PIC. There was a resounding silence — no surprise considering most if not all the comments to date have been spam.

I just hope the next poor slob who needs an assembly language divide function for the PIC can find this page with a web search.

So… the function I wrote for the PSoC part is pretty universal. You should be able to port it to your favorite 8-bit part with minimum fuss. However, the PIC-16 and PIC-14 micros have a really funky hole in their operation set: they don’t do add with carry (or subtract with borrow). I have no objection to RISC architecture, though I’m not entirely convinced that it’s inherently better than CISC — I simply have no real opinion on the subject — but come on, guys. You’ve gotten carried away with this one.  Add with carry? Just how crazy are you trying to be? It’s not a CISC operation and it would have made multi-byte arithmetic considerably simpler. This is probably my single biggest complaint about the PIC micros.

So I’ve created a separate PIC port for this routine, and I’m posting it here separately, because it’s a pain in the neck to port the PSoC version.

The other thing I’ve done here is taken advantage of Microchip’s much more extensive (read: “not brain dead”) preprocessor and created a version of the code where you can choose (at compile-time) how many bytes the dividend and divisor occupy. This makes it simple to port to however many bits of resolution you need, as long as it can be expressed in multiples of 8.

Enjoy!

;
; Copyright 2009 Jeffrey J. Nonken
; License: permission granted for unlimited use and unlimited derived use.
; If you use this, I'd appreciate it if you'd give me credit.
; Just leave these comments intact or mention me in your comments. I don't need
; money for this, but I'd appreciate credit where it's due.
; (Though if you want to send me money, I won't refuse it. :)
;
; Like what I did here? Feel free to offer me a job. http://jeffrey.nonken.net/
;
; Let me know if you have any improvements. PLEASE let me know if you find bugs.

;
; This is an unsigned integer divide function, x bits divided by y bits.
; Stuff your values into "dividend" and "divisor" and call the function (div24by16 in this sample).
; Note that "dividend" is stored as an x+1-byte value, the high byte is used internally for overflow.
; Any value you place here will get stomped on. Nothing is preserved. If you need the dividend or
; divisor later, make sure they're stored someplace else too.
; The remainder is computed as a side-effect of the divide but we do not attempt to preserve it.
; A few lines of extra code could be written to return the remainder in "divisor", but it will take
; some shifting around to justify the value properly. Use the inverse of "mask" on the result in
; "dividend" and the value calculated in "shift_count" to figure that one out. Don't forget to
; include the overflow bit.
;
; I originally created this algorithm in the 1980s to work on an 8086 processor.
; Later I ported it to the Cypress M8 processor for 24 over 16. (That code is in the Times Square Ball, by the way.)
; In order to avoid some tricky stuff on the PIC 16 series processor I ported it to "C", ran a test on a simulator,
; debugged it, then got the PICC C compiler to generate assembly code for 16-over-16 bit. I hand-optimized the result.
; After porting it back to 24-over-16 bits I decided I could use constants and macros to create something that
; could handle any number of bytes for the dividend and divisor -- between compiles, I mean, not run-time.
; (That could happen too but it would be much less efficient. I already did a couple compromises just for this.)
; This is what's left.
;
; So set your dividend and divisor size (in bytes) at the start of this, change the function name to something
; appropriate, and import it into your project. Just make sure the dividend has as many bytes as the divisor,
; or more, when you pick DIVIDEND_SIZE and DIVISOR_SIZE. No, that doesn't refer to the values you pass to
; the function -- if dividend is smaller than divisor at run-time you'll just get zero as a quotient.
;
; I wasn't using the PICC C compiler to write the code for the project because it's not well-enough integrated
; into MPLAB to do simulation or emulation. I used Microchip's C18 compiler set to an 18F4455 MCU for the
; test/debug cycle, then used PICC C to generate the assembly.
;
; It's not pretty, but it works.
;

        INCLUDE defaults.inc

; Set your word sizes here.
DIVIDEND_SIZE   equ     3               ; Dividend size in bytes. Also the quotient size.
DIVISOR_SIZE    equ     2               ; Divisor size in bytes.
; End of configurable equates.

 if (DIVIDEND_SIZE < DIVISOR_SIZE)      ; One caveat: dividend can't be smaller than divisor.
        error   "DIVIDEND_SIZE must be greater than or equal to DIVISOR_SIZE!"
 endif
MASK_SIZE       equ     DIVIDEND_SIZE   ; Number of bytes in the mask the same as dividend.
DIVISOR_MSB     equ     DIVISOR_SIZE-1  ; MSB offset for the divisor.
DIVIDEND_MSB    equ     DIVIDEND_SIZE-1 ; MSB offset. Doesn't include the overflow byte.

        UDATA

        GLOBAL dividend,divisor,quotient
quotient:                               ; Outgoing quotient is in dividend. Doesn't have to be, but if not,
                                        ;  you'll have to pre-clear the quotient and shift it separately.
                                        ;  I don't think you'll save anything doing that.
dividend:       RES     DIVIDEND_SIZE+1 ; Incoming dividend, outgoing quotient.
                                        ;  We reserve an extra byte for overflow (1 bit).

divisor:        RES     DIVISOR_SIZE    ; Incoming divisor. This gets destroyed too. Just shifted, actually.
mask:           RES     DIVIDEND_SIZE   ; Mask used to separate quotient from remainder.
shift_count:    RES     1               ; Number of shifts required to complete the operation.
                                        ;  Also number of bits in the quotient mask.
mask_count:                             ; Counter used to generate the mask.
                                        ;  (Re-use carry, we don't use both at one time.)
carry:          RES     1               ; Holding register for the carry bit for multi-byte arithmetic.

        CODE

;
; This function tries to be efficient by shifting both values to the far left to minimize the number of repititions
; through the main loop. That comes at a cost, of course, but I expect that most of the value pairs I use will do
; better this way.
;
; Here's the original comment block, modified. Note that it was written for 24x16 divide.
;----------------------------------------------------------------------------------
; Divide a 24-bit number by a 16-bit number. Leaves the quotient in the dividend
; (numerator) and throws away the remainder.
;
; This works like a classic long division; it shifts the numerator and denominator
; to the top of their respective words (so they are the same apparent magnitude), counts
; the number of shifts and uses it to calculate the number of subtracts needed as
; well as generate the mask that eliminates the remainder.
;
; This can easily be extended or reduced to accomodate various word sizes. There
; should be no problem dividing a smaller word by a larger, but you will have
; to virtually extend the numerator with zeros. Since the initial shift factor will
; start out negative you will need to fiddle with that; you do not want to quit
; if you generate a carry, but instead test the sign of the result. (You still will
; want to jump to .zero_quotient if the shift factor is negative. Note that a shift
; factor of zero is valid.) [This version tests for zero, not carry, so that's
; no longer true. Lowest valid shift factor is 1. Don't try to use a smaller dividend
; that divisor unless you want to modify the code. I doubt it's worth the trouble.]
;
; If you are expecting small values in the numerator or denominator, you can
; check the high byte for 0 and do 8-bit shifts before doing the final 1-bit shifts.
; Since the 8-bit-shift loop starts with a cmp (or tst) and a jump, the cost is
; only about 13 cycles if it is not necessary. [Cypress cycles.]
;
; If you need to keep the remainder it gets a bit more complicated. The calculated
; count, and the final calculated mask, will tell you where the remainder is in
; the numerator: the remainder's LSb is just above the highest 1-bit in the mask.
; You may need to copy it out, mask and shift it before saving the quotient. You
; can also contrive to save the quotient into a different word rather than into
; the numerator, in which case you only need to shift the remainder back around.
; However, this will cost you more shifts during the calculation (you need to
; shift the numerator and the quotient separately).
;
; You can save yourself several bytes of RAM storage by calculating the final
; mask on-the-fly.
;----------------------------------------------------------------------------------
; Please note several things.
; 1) This is little-endian. Stuff/retrieve the LSB of the value in the lowest
;    address of the dividend/divisor/quotient.
; 2) This is entirely, completely, 100% unsigned. Want signed? Write your own,
;    or preserve the signs and convert to absolute values before the divide,
;    calculate the sign of the quotient and complement if required. If you need a
;    reminder: the quotient's sign will be the exclusive-or of the two inputs'
;    signs. The remainder, should you choose to keep it, is the same sign as the
;    dividend.
; 3) Make sure you set (or clear) ALL the input bytes (dividend and divisor)
;    before calling. That should be obvious, but it's worth the reminder.

        GLOBAL div24by16
div24by16:
; Check for boundary conditions and set up the divide.
        clrf    dividend+DIVIDEND_SIZE  ; Clear the extra byte.
; Look for divide by zero.
        movf    divisor,w               ; See if the divisor is zero. Start with the LSB.
i = 1
        while i < DIVISOR_SIZE
        iorwf   divisor+i,w             ; OR in the rest of the bytes.
i = i + 1
        endw
        skipz                           ; Any 'on'-bits will show here.
        goto    divisor_not_zero        ; Situation normal, keep going.

; Oh noes! Divide by zero. Can't have that.
        movlw   0FFh                    ; Divide by zero? Get as close to infinity as we can.
i = 0
        while i < DIVIDEND_SIZE
        movwf   dividend+i
i = i + 1
        endw
        return;

; Passed the first hurdle: we have a non-zero divisor.
divisor_not_zero:
;divide.c: 62: }
;divide.c: 63: shift_count = 1;
                                        ; Shift-count is the difference in the number of bits, plus 1.
        movlw   1 + ((DIVIDEND_SIZE-DIVISOR_SIZE) << 3)
        movwf   shift_count
;divide.c: 64: while ((divisor & 0x8000) == 0)
        goto    divisor_shift_start

; Start with zero. Is the dividend larger? Add 8 bits for every byte larger.
; So in a 24x16 divide we start with 8.
; Add 1 for every time we shift the divisor left.
; Subtract 1 for every time we shift the dividend left.
; Plus one so we can use decfsz for the loop test. (Otherwise we have to test for borrow.)
divisor_shift_loop:
;divide.c: 65: {
;divide.c: 66: ++shift_count;
        incf    shift_count,f           ; Count up every time we shift the divisor left.
;divide.c: 67: divisor <<= 1;
        clrc
i = 0
        while i < DIVISOR_SIZE
        rlf     divisor+i,f             ; Shift all bytes left, starting with the LSB.
i = i + 1
        endw
divisor_shift_start:
        btfss   divisor+DIVISOR_MSB,7   ; Stop shifting once the high bit is non-zero.
        goto    divisor_shift_loop

;divide.c: 68: }
;divide.c: 71: while ((dividend & 0x8000) == 0)
        goto    dividend_shift_start

; OK, now shift the dividend, but this time count backwards on the shift count.
dividend_shift_loop:
;divide.c: 72: {
;divide.c: 73: if (--shift_count == 0)
        decfsz  shift_count,f                   ; If it reaches zero that means the divisor is larger!
        goto    shift_not_zero                  ; If that happens, our quotient is guaranteed to be zero.
; Oops. The shift value reached zero, which means the dividend's value is smaller than the divisor's,
; which means the quotient is zero. If you want to return the remainder you'll have to stick in some
; extra code here.
        ;divide.c: 74: {
;divide.c: 75: dividend = 0;
i = 0
        while i < DIVIDEND_SIZE
        clrf    dividend+i                      ; Quotient is zero. Just clear it out the easy way.
i = i + 1
        endw
        return                                  ; Render the output and return.
; Still some meat on this, so keep going.
shift_not_zero:
;divide.c: 77: }
;divide.c: 78: dividend <<= 1;
        clrc                                    ; Clear the carry so as not to contaminate the value.
i = 0
        while i < DIVIDEND_SIZE
        rlf     dividend+i,f                    ; Shift all dividend bytes left starting with the LSB.
i = i + 1
        endw
                                                ; We'll never shift anything into dividend+DIVIDEND_SIZE in this loop.
dividend_shift_start:
        btfss   dividend+DIVIDEND_MSB,7         ; See if the high bit is set.
        goto    dividend_shift_loop             ; If not, repeat until it is.
;divide.c: 79: }
;divide.c: 80: {
;divide.c: 81: unsigned char mask_count = shift_count;

; Now that we have shift_count, we know how big the quotient is. Use that to generate a mask.
; If you want to save RAM you can generate the mask on-the-fly just before returning.
; Just make sure you copy off mask_count first, because we destroy shift_count.
        movf    shift_count,w
        movwf   mask_count                      ; shift_count is guaranteed to be at least 1.
;divide.c: 82: mask = 0;
i = 0
        while i < MASK_SIZE
        clrf    mask+i                          ; Start by clearing out the mask.
i = i + 1
        endw
;divide.c: 88: while (mask_count)
mask_shift_loop:
;divide.c: 89: {
;divide.c: 90: mask = (mask << 1) | 1;

; (Notice we don't clear the carry first. Don't need to.)
i = 0
        while i < MASK_SIZE
        rlf     mask+i,f                        ; For every count, shift left by 1...
i = i + 1
        endw
        bsf     (mask),0                        ;  ... and set the low bit.
;divide.c: 91: --mask_count;
        decfsz  mask_count,f                    ; Count down and repeat ad nauseum.
        goto    mask_shift_loop
;divide.c: 92: }
;divide.c: 93: }
;divide.c: 94: do

; Hey, guess what? After all that we finally get to do some division! Hurrah!
looping:
;divide.c: 95: {
;divide.c: 96: if (divisor <= dividend)
; Remember your grade-school long division. Line up the values on the left. Can you subtract?
; If not, shift the divisor right 1 digit and try again. Of course, that's in base 10, so you need to
; find a multiplier to put up top. Easier for us here, it's just zero or 1. Once you do that
; multiply the divisor by the multiplier and subtract the result from the dividend.
; Move to the next column and repeat until you run out of dividend.
; Whatever is left is the remainder.
; So: Find out if we can subtract. If we can, do it and stick a 1 in the quotient, then shift the dividend left.
; If not, just shift the dividend left. Note that if we do that, we might end up with an extra bit off the high end.
; That's why we reserved an extra byte for the dividend.
        movf    (dividend+DIVIDEND_SIZE),w      ; First, see if the overflow bit is set.
        skipz
        goto    dividend_is_bigger              ; If dividend overflow byte is non-zero, dividend is bigger by definition.
                                                              ; Otherwise drop through to compare the two values.
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Can't really make a local label in a while/endw (assembler limitation) so we're making a macro to call.
        sub_byte        macro
        local next
        movf    (divisor+j),w
        subwf   (dividend+i),w
        skipnz
        goto    next                            ; If they're the same, drop through.
                                                ;  Either try the next one or if this is the last, allow the subtraction.
        skipnb                                  ; They're not the same, see which one is bigger.
        goto    divisor_is_bigger
 if j > 0                                       ; If this is the last byte, just drop through.
        goto    dividend_is_bigger
 endif
next:
        endm
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

; Start from the highest byte and compare. It's just a gt/lt/eq comparison so we don't need borrows and so on.
; As soon as we see one is larger than the other we can stop. If they're equal it's OK to do the subtract,
; so we treat that the same as the dividend being larger.
i = DIVIDEND_MSB
j = DIVISOR_MSB                                 ; Start with the most significant byte of each value.
        while j >= 0                            ; Keep trying until we run out of divisor.
        sub_byte                                ; Run the macro (above). Any difference jumps out of the while/endw "loop".
j = j - 1
i = i - 1                                       ; Still the same, try the next byte.
        endw
; Both values being the same is OK too. Drop through to dividend_is_bigger.

; If we get here, either the dividend is bigger or they're exactly the same. Either way we can do the subtraction.
dividend_is_bigger:
;divide.c: 97: {
;divide.c: 98: dividend = ((dividend - divisor) << 1) | 1;
j = 0                                           ; Start the divisor index at zero.
i = DIVIDEND_MSB - DIVISOR_MSB                  ; Count backwards from the dividend MSB so we do the same number of bytes.
        while j < DIVISOR_SIZE ; First, take care of the borrow from the previous byte.
 if j > 0                                       ; No borrow waiting on the first byte!
        movf    carry,w                         ; Get the borrow from the previous byte.
        subwf   dividend+i,f                    ; Subtract it from the current byte.
        clrf    carry                           ; Clear the borrow...
        skipnb
        incf    carry,f                         ;  ...and set it again if we have a new one.
 else
        clrf    carry                           ; First byte, just clear the borrow.
 endif
 ; Now do the subtract.
        movf    divisor+j,w                     ; Get the divisor,
        subwf   dividend+i,f                    ; subtract from the dividend.
        skipnb
        incf    carry,f                         ; Add to the borrow.
                                                ; The borrow, "carry", will be 0 or 1 here, carried forward to the next byte.
j = j + 1
i = i + 1                                       ; Move on to the next byte.
        endw
        movf    carry,w
        subwf   (dividend+DIVIDEND_SIZE),f      ; Extend the borrow to the overflow byte.
i = 0
        while i <= DIVIDEND_SIZE
        rlf     dividend+i,f                    ; Now shift the dividend left for the next subtract.
i = i + 1
        endw
        bsf     (dividend+0),0                  ; This subtract succeeded, so record a "1" bit in the quotient.
;divide.c: 99: }
        goto    loop_continue

; If we get here the subtract failed; just shift the dividend left and record a "0" bit in the quotient.
; This is where the dividend overflow byte comes into play. We wouldn't need it otherwise.
divisor_is_bigger:
;divide.c: 100: else
;divide.c: 101: {
;divide.c: 102: dividend = (dividend << 1);
        clrc                                    ; Clear the carry ahead of time (that "0" we mentioned above).
i = 0
        while i <= DIVIDEND_SIZE                ; (Note this says "less-than-or-equal" to accomodate the overflow.)
        rlf     dividend+i,f                    ; Shift all the bytes left, including the overflow.
i = i + 1
        endw

; Decide whether we're finished the divide, and loop if not.
loop_continue:
;divide.c: 103: }
;divide.c: 104: } while (--shift_count);
        decfsz  shift_count,f                   ; Count down, skip if finished.
        goto    looping                         ; Not finished, start another iteration.

; Hey, we're finally finished! Strip away the remainder and return.
; The mask can be generated on-the-fly here if desired to save RAM, though it might cost cycles.
; Then again, maybe not.
; If you want to return the remainder, now's the time.
;divide.c: 105: dividend &= mask;
i = 0
        while i < DIVIDEND_SIZE
        movf    mask+i,w
        andwf   dividend+i,f                    ; Strip away the remainder, leaving only the quotient.
i = i + 1
        endw
;       clrf    dividend + DIVIDEND_SIZE        ; Clear the overflow byte. Probably redundant.
;divide.c: 106: }
        return

        end

PSoC M8 Bitwise Long Divide

Saturday, March 6th, 2010

One problem with 8-bit micros is that their math processing is usually limited to addition and subtraction. You can do a lot with addition and subtraction, but sometimes you need to do multiplication and division. Even the mighty 8086 didn’t get its FPU built in until the 80386 (and it was optional until the Pentium, if I remember correctly) though you could get an external FPU.

Of the two processors I use most (the Microchip PIC18 and the Cypress M8) each has an 8-bit multiply function accessible via registers (though the M8 is a signed multiply — who was the genius who thought that up? It makes a mess of doing multi-word multiplies). And multiply is comparatively simple to write.

But to do a proper divide requires more work. Especially if you don’t want to just subtract the divisor from the dividend and add 1 to the quotient until you run out of dividend. It works, but it’s terribly slow.

I’m not very strong on math so I fell back on what I knew. What I knew was long division as taught to me in elementary school. You know — subtract a multiple of the divisor from the top of the dividend, record the multiplier, calculate the remainder, then do it again on the remainder. Wash, rinse, repeat until you run out of dividend or patience. (OK, not the best description, but you already know the drill.)

So that’s what I did for my bitwise long divide. I first wrote it for the 8086, later I ported it to the M8, then the PIC16, then the M8 again when I lost the first code. I’ve ported it to several different sizes of dividend and divisor; with some work I could probably make it run-time adaptable but of course that would only slow it down. I did make a macro version for the PIC16 that uses preprocessor directives (while/endw mostly) to let you generate code for any size words you like. I don’t think I can do that for the M8, their preprocessor is very primitive. The original M8 code is in the Times Square Ball firmware, by the way. What I’ll be giving you has been ported twice since then, but it’s essentially the same. (I’ll give you the PIC16 code in a separate blog post, fear not.)

Here’s the general idea. First, shift both values (dividend and divisor) left until each is at the top of its word. Keep track of how many times you shifted each (plus the difference in word size); that tells you how many bits will be in your quotient (and indirectly your remainder), though it doesn’t say how many of those bits are set to 1, don’t get me wrong. But the quotient will never be any larger than that number of bits. That number also tells you when to stop subtracting; it’s a counter.

Next you do a subtract-test-record-shift cycle exactly like the one in your by-hand long division, except that ours is in binary. You are shifting the dividend, not the divisor (which stays constant).

Now here’s the tricky bit. Every time you do a subtract-shift you discard one bit of significance. That means that a new low bit becomes available in the dividend on each cycle. So instead of putting that bit into a separate quotient (and requiring a second set of shifts — remember that we’re in a slow, 8-bit processor), you can set the result of your subtract into the low bit of the dividend after you’ve shifted it. Of course that means you can’t (easily) test for zero to see if you’re through; that’s why you need that counter. Still, leaving the divide early due to the dividend going to zero is an unusual case, I should think. (And if it is not, in your particular case, due to peculiarities of your particular data set, then you obviously want to optimize your code to take advantage of that.)

That leaves the quotient in the dividend. But what about the leftover bits after the last subtract? Well… that’s your remainder.

Now early on we determined how many bits the quotient would have. We use this to generate a mask that lets us preserve the quotient and throw away the remainder. But what if you need the remainder? That’s a bit more work but it turns out to be exactly the number of bits remaining that aren’t quotient. All you have to do is copy, mask, and shift right by — guess what? — the number of bits in the quotient, the same counter we calculated up front. (Actually you can skip the ‘mask’ step for the remainder since you’re going to shift the quotient into the bit bucket anyway.)

Cool stuff, Maynard.

Unfortunately I haven’t figured out how to attach files to this blog (except media). I can of course upload elsewhere and link to it. Meantime I’m going to try to simply paste it into the blog.

Bonus! Here I am giving you two different word sizes, 24/16 and 32/16. The differences should be obvious; it should help if you ever need to port to other word sizes. Please note however that the design doesn’t really allow for the divisor to have more bits than the dividend.

One other note in reference to my post “Optimize This!” regarding re-use of variables. Note that I do that here. First, it’s stable, test code; second, you’ll notice that I gave the variable two different, meaningful labels for the two separate uses; third, you’ll notice that I commented it.

On the other hand, I didn’t really comment the code as much as I usually do. I did however leave in the C code (as comments) I used during one porting process.

One last note: notwithstanding the number of bits we counted, when you store the results the quotient word size is the same as the dividend and the remainder is the same word size as the divisor. Again, that’s the general case and your dataset may generate smaller words, optimize for your dataset.

;
; Copyright 2010 Jeffrey J. Nonken
; License: permission granted for unlimited use and unlimited derived use.
; If you use this, I'd appreciate it if you'd give me credit.
; Just leave these comments intact or mention me in your comments. I don't need
; money for this, but I'd appreciate credit where it's due.
; (Though if you want to send me money, I won't refuse it. :)
;
; Like what I did here? Feel free to offer me a job. http://jeffrey.nonken.net/
;
; Let me know if you have any improvements. PLEASE let me know if you find bugs.

;
; This is an integer divide function, 24 bits divided by 16 bits.
; Stuff your 24-bit value into "dividend", 16-bit value into "divisor" and call div24by16.
; Note that "dividend" is a 4-byte value, the high byte is used internally for overflow.
; Any value you place there will get squashed first thing.
; The quotient of the divide will be left in dividend, so make sure your value is safe someplace else.
; For that matter, I destroy divisor, too.
; The remainder is computed as a side-effect of the divide but we do not attempt to preserve it.
; A few lines of extra code could be written to return the remainder in "divisor", but it will take
; some shifting around to justify the value properly. Use the inverse of "mask" on the result in
; "divisor" and the value calculated in "shift_count" to figure that one out. Don't forget to
; include the overflow bit.
;
; I originally created this algorithm in the 1980s to work on an 8086 processor.
; Later I ported it to the Cypress M8 processor for 24 over 16. [That code is in the Times Square Ball, by the way.]
; In order to avoid some tricky stuff on the PIC 16 series processor I ported it to "C", ran a test on a simulator,
; debugged it, then got the PICC C compiler to generate assembly code. I hand-optimized the result.
; This is what's left.
;
; I wasn't using the PICC C compiler to write the code for the project because it's not well-enough integrated
; into MPLAB to do simulation or emulation. I used Microchip's C18 compiler set to an 18F4455 MCU for the
; test/debug cycle, then used PICC C to generate the assembly.
;
; It's not pretty, but it works.
;

    AREA	bss [RAM, REL]

quotient::					; Outgoing quotient is in dividend.
dividend::		BLK	5		; Incoming dividend, outgoing quotient.
divisor::		BLK	2		; Incoming divisor. This gets destroyed.
mask:			BLK	4		; Mask used to separate quotient from remainder.
shift_count:		BLK	1		; Number of shifts required to complete the operation.
                                                ;  Also number of bits in the quotient mask.
mask_count:					; Counter used to generate the mask.
                                                ;  [Re-use carry, we don't use both at one time.]
carry:			BLK	1		; Holding register for the carry bit. Used in the only
                                                ;  true 3-byte operation we have.

    AREA text [ROM, REL, CON]

;
; This function tries to be efficient by shifting both values to the far left to minimize the number of repititions
; through the main loop. That comes at a cost, of course, but I expect that most of the value pairs I use will do
; better this way.
;
; Dividing 1024 by 33 takes about 345 cycles, including the call/return. Just for the record.
;
; Here's the original comment block. Note that it was written for 24x16 divide.
;----------------------------------------------------------------------------------
; Divide a 24-bit number by a 16-bit number. Leaves the quotient in the dividend
; [numerator] and throws away the remainder.
;
; This works like a classic long division; it shifts the numerator and denominator
; to the top of their respective words [so they are the same magnitude], counts
; the number of shifts and uses it to calculate the number of subtracts needed as
; well as the mask that eliminates the remainder.
;
; This can easily be extended or reduced to accomodate various word sizes. There
; should be no problem dividing a smaller word by a larger, but you will have
; to virtually extend the numerator with zeros. Since the initial shift factor will
; start out negative you will need to fiddle with that; you do not want to quit
; if you generate a carry, but instead test the sign of the result. [You still will
; want to jump to .zero_quotient if the shift factor is negative. Note that a shift
; factor of zero is valid.] [This version tests for zero, not carry, so that's
; no longer true. Lowest valid shift factor is 1.]
;
; If you expect to be using long words or want to macro-ize this [or expect to
; adapt it a lot] you may want to use multi-byte words and index into them
; [e.g. [numerator+1] instead of discrete names for each byte. However, the result
; will be endian-dependent. [This one is little-endian.]
;
; If you are expecting small values in the numerator or denominator, you can
; check the high byte for 0 and do 8-bit shifts before doing the final 1-bit shifts.
; Since the 8-bit-shift loop starts with a cmp [or tst] and a jump, the cost is
; only about 13 cycles if it is not necessary. [Cypress cycles.]
;
; If you need to keep the remainder it gets a bit more complicated. The calculated
; count, and the final calculated mask, will tell you where the remainder is in
; the numerator: the remainder's LSb is just above the highest 1-bit in the mask.
; You may need to copy it out, mask and shift it before saving the quotient. You
; can also contrive to save the quotient into a different word rather than into
; the numerator, in which case you only need to shift the remainder back around.
; However, this will cost you more shifts during the calculation [you need to
; shift the numerator and the quotient separately].
;
; You can save yourself several bytes of RAM storage by calculating the final
; mask on-the-fly.
;----------------------------------------------------------------------------------

div24by16::
; Check for boundary conditions and set up the divide.
    mov		[dividend+3],0
; Look for divide by zero.
    mov		a,[divisor+1]		; See if the divisor is zero.
    or		a,[divisor]
    jnz		.divisor_not_zero
; Divide by zero.
    mov		a,0xFF			; Divide by zero? Get as close to infinity as we can.
    mov		[dividend],a
    mov		[dividend+1],a
    mov		[dividend+2],a
    ret

.divisor_not_zero:
;divide.c: 62: }
;divide.c: 63: shift_count = 1;
    mov		[shift_count],1+8			; Difference in #bits between dividend and divisor. Plus 1.
;divide.c: 64: while [[divisor & 0x8000] == 0]
    jmp		.divisor_shift_start

.divisor_shift_loop:
;divide.c: 65: {
;divide.c: 66: ++shift_count;
    inc		[shift_count]
;divide.c: 67: divisor <<= 1;
    asl		[divisor]
    rlc		[divisor+1]
.divisor_shift_start:
    tst		[divisor+1],0b10000000
    jz		.divisor_shift_loop
    jmp		.dividend_shift_start

.dividend_shift_loop:
;divide.c: 72: {
;divide.c: 73: if [--shift_count == 0]
    dec		[shift_count]
    jnz		.l10
;divide.c: 74: {
;divide.c: 75: dividend = 0;
    mov		[dividend],0
    mov		[dividend+1],0
    mov		[dividend+2],0
                ; dividend+3 is already zero.
    ret			; Render the output and return.

.l10:
;divide.c: 77: }
;divide.c: 78: dividend <<= 1;
    asl		[dividend]
    rlc		[dividend+1]
    rlc		[dividend+2]
                                        ; We'll never shift anything into dividend+3 in this loop.
.begin_continue:
.dividend_shift_start:
    tst		[dividend+2],0b10000000		; See if the high bit is set.
    jz		.dividend_shift_loop		; If not, repeat until it is.
;divide.c: 79: }
;divide.c: 80: {
;divide.c: 81: unsigned char mask_count = shift_count;
    mov		[mask_count],[shift_count]	; shift_count is guaranteed to be at least 1.
;divide.c: 82: mask = 0;
    mov		[mask],0
    mov		[mask+1],0
    mov		[mask+2],0
;divide.c: 88: while [mask_count]
.mask_shift_loop:
;divide.c: 89: {
;divide.c: 90: mask = [mask << 1] | 1;
    asl		[mask]
    rlc		[mask+1]
    rlc		[mask+2]
    or		[mask],1
;divide.c: 91: --mask_count;
    dec		[mask_count]
    jnz		.mask_shift_loop
;divide.c: 92: }
;divide.c: 93: }
;divide.c: 94: do

.looping:
;divide.c: 95: {
;divide.c: 96: if [divisor <= dividend]
    mov		a,[dividend+3]			; Dividend has a higher byte than divisor.
    jnz		.dividend_is_bigger		; If dividend high byte is non-zero, dividend is bigger by definition.
    mov		a,[divisor+1]
    sub		a,[dividend+2]
    jnz		.u925
    mov		a,[divisor]
    sub		a,[dividend+1]
    jz		.dividend_is_bigger
.u925:
    jnc		.divisor_is_bigger		; They're not the same, see which one is bigger.
.dividend_is_bigger:
;divide.c: 97: {
;divide.c: 98: dividend = [[dividend - divisor] << 1] | 1;
    mov		a,[divisor]
    sub		[dividend+1],a
    mov		a,[divisor+1]
    sbb		[dividend+2],a
    sbb		[dividend+3],0
    asl		[dividend+0]
    rlc		[dividend+1]
    rlc		[dividend+2]
    rlc		[dividend+3]
    or		[dividend+0],1
;divide.c: 99: }
    jmp		.l19

.divisor_is_bigger:
;divide.c: 100: else
;divide.c: 101: {
;divide.c: 102: dividend = [dividend << 1];
    asl		[dividend+0]
    rlc		[dividend+1]
    rlc		[dividend+2]
    rlc		[dividend+3]

.l19:
;divide.c: 103: }
;divide.c: 104: } while [--shift_count];
    dec		[shift_count]
    jnz		.looping
.u950:

.l16:
;divide.c: 105: dividend &= mask;
    mov		a,[mask]
    and		[dividend],a
    mov		a,[mask+1]
    and		[dividend+1],a
    mov		a,[mask+2]
    and		[dividend+2],a
    mov		[dividend+3],0
;divide.c: 106: }
    ret

;----------------------------------------------------------------------------------
; Divide a 32-bit number by a 16-bit number. Leaves the quotient in the dividend
; [numerator] and throws away the remainder.
;
; This works like a classic long division; it shifts the numerator and denominator
; to the top of their respective words [so they are the same magnitude], counts
; the number of shifts and uses it to calculate the number of subtracts needed as
; well as the mask that eliminates the remainder.
;
; This can easily be extended or reduced to accomodate various word sizes. There
; should be no problem dividing a smaller word by a larger, but you will have
; to virtually extend the numerator with zeros. Since the initial shift factor will
; start out negative you will need to fiddle with that; you do not want to quit
; if you generate a carry, but instead test the sign of the result. [You still will
; want to jump to .zero_quotient if the shift factor is negative. Note that a shift
; factor of zero is valid.] [This version tests for zero, not carry, so that's
; no longer true. Lowest valid shift factor is 1.]
;
; If you expect to be using long words or want to macro-ize this [or expect to
; adapt it a lot] you may want to use multi-byte words and index into them
; [e.g. [numerator+1] instead of discrete names for each byte. However, the result
; will be endian-dependent. [This one is little-endian.]
;
; If you are expecting small values in the numerator or denominator, you can
; check the high byte for 0 and do 8-bit shifts before doing the final 1-bit shifts.
; Since the 8-bit-shift loop starts with a cmp [or tst] and a jump, the cost is
; only about 13 cycles if it is not necessary. [Cypress cycles.]
;
; If you need to keep the remainder it gets a bit more complicated. The calculated
; count, and the final calculated mask, will tell you where the remainder is in
; the numerator: the remainder's LSb is just above the highest 1-bit in the mask.
; You may need to copy it out, mask and shift it before saving the quotient. You
; can also contrive to save the quotient into a different word rather than into
; the numerator, in which case you only need to shift the remainder back around.
; However, this will cost you more shifts during the calculation [you need to
; shift the numerator and the quotient separately].
;
; You can save yourself several bytes of RAM storage by calculating the final
; mask on-the-fly.
;----------------------------------------------------------------------------------

div32by16::
; Check for boundary conditions and set up the divide.
    mov		[dividend+4],0
; Look for divide by zero.
    mov		a,[divisor+1]		; See if the divisor is zero.
    or		a,[divisor]
    jnz		.divisor_not_zero
; Divide by zero.
    mov		a,0xFF			; Divide by zero? Get as close to infinity as we can.
    mov		[dividend],a
    mov		[dividend+1],a
    mov		[dividend+2],a
    mov		[dividend+3],a
    ret

.divisor_not_zero:
;divide.c: 62: }
;divide.c: 63: shift_count = 1;
    mov		[shift_count],1+16			; Difference in #bits between dividend and divisor. Plus 1.
;divide.c: 64: while [[divisor & 0x8000] == 0]
    jmp		.divisor_shift_start

.divisor_shift_loop:
;divide.c: 65: {
;divide.c: 66: ++shift_count;
    inc		[shift_count]
;divide.c: 67: divisor <<= 1;
    asl		[divisor]
    rlc		[divisor+1]
.divisor_shift_start:
    tst		[divisor+1],0b10000000
    jz		.divisor_shift_loop
    jmp		.dividend_shift_start

.dividend_shift_loop:
;divide.c: 72: {
;divide.c: 73: if [--shift_count == 0]
    dec		[shift_count]
    jnz		.l10
;divide.c: 74: {
;divide.c: 75: dividend = 0;
    mov		[dividend],0
    mov		[dividend+1],0
    mov		[dividend+2],0
    mov		[dividend+3],0
                ; dividend+4 is already zero.
    ret			; Render the output and return.

.l10:
;divide.c: 77: }
;divide.c: 78: dividend <<= 1;
    asl		[dividend]
    rlc		[dividend+1]
    rlc		[dividend+2]
    rlc		[dividend+3]
                                        ; We'll never shift anything into dividend+4 in this loop.
.begin_continue:
.dividend_shift_start:
    tst		[dividend+3],0b10000000		; See if the high bit is set.
    jz		.dividend_shift_loop		; If not, repeat until it is.
;divide.c: 79: }
;divide.c: 80: {
;divide.c: 81: unsigned char mask_count = shift_count;
    mov		[mask_count],[shift_count]	; shift_count is guaranteed to be at least 1.
;divide.c: 82: mask = 0;
    mov		[mask],0
    mov		[mask+1],0
    mov		[mask+2],0
    mov		[mask+3],0
;divide.c: 88: while [mask_count]
.mask_shift_loop:
;divide.c: 89: {
;divide.c: 90: mask = [mask << 1] | 1;
    asl		[mask]
    rlc		[mask+1]
    rlc		[mask+2]
    rlc		[mask+3]
    or		[mask],1
;divide.c: 91: --mask_count;
    dec		[mask_count]
    jnz		.mask_shift_loop
;divide.c: 92: }
;divide.c: 93: }
;divide.c: 94: do

.looping:
;divide.c: 95: {
;divide.c: 96: if [divisor <= dividend]
    mov		a,[dividend+4]			; Dividend has a higher byte than divisor.
    jnz		.dividend_is_bigger		; If dividend high byte is non-zero, dividend is bigger by definition.
    mov		a,[divisor+1]
    sub		a,[dividend+3]
    jnz		.u925
    mov		a,[divisor]
    sub		a,[dividend+2]
    jz		.dividend_is_bigger
.u925:
    jnc		.divisor_is_bigger		; They're not the same, see which one is bigger.
.dividend_is_bigger:
;divide.c: 97: {
;divide.c: 98: dividend = [[dividend - divisor] << 1] | 1;
    mov		a,[divisor]
    sub		[dividend+2],a
    mov		a,[divisor+1]
    sbb		[dividend+3],a
    sbb		[dividend+4],0
    asl		[dividend+0]
    rlc		[dividend+1]
    rlc		[dividend+2]
    rlc		[dividend+3]
    rlc		[dividend+4]
    or		[dividend+0],1
;divide.c: 99: }
    jmp		.l19

.divisor_is_bigger:
;divide.c: 100: else
;divide.c: 101: {
;divide.c: 102: dividend = [dividend << 1];
    asl		[dividend+0]
    rlc		[dividend+1]
    rlc		[dividend+2]
    rlc		[dividend+3]
    rlc		[dividend+4]

.l19:
;divide.c: 103: }
;divide.c: 104: } while [--shift_count];
    dec		[shift_count]
    jnz		.looping
.u950:

.l16:
;divide.c: 105: dividend &= mask;
    mov		a,[mask]
    and		[dividend],a
    mov		a,[mask+1]
    and		[dividend+1],a
    mov		a,[mask+2]
    and		[dividend+2],a
    mov		a,[mask+3]
    and		[dividend+3],a
    mov		[dividend+4],0
;divide.c: 106: }
    ret
;
; Copyright 2010 Jeffrey J. Nonken
; License: permission granted for unlimited use and unlimited derived use.
; If you use this, I’d appreciate it if you’d give me credit.
; Just leave these comments intact or mention me in your comments. I don’t need
; money for this, but I’d appreciate credit where it’s due.
; (Though if you want to send me money, I won’t refuse it. :)
;
; Like what I did here? Feel free to offer me a job. http://jeffrey.nonken.net/
;
; Let me know if you have any improvements. PLEASE let me know if you find bugs.

;
; This is an integer divide function, 24 bits divided by 16 bits.
; Stuff your 24-bit value into “dividend”, 16-bit value into “divisor” and call div24by16.
; Note that “dividend” is a 4-byte value, the high byte is used internally for overflow.
; Any value you place there will get squashed first thing.
; The quotient of the divide will be left in dividend, so make sure your value is safe someplace else.
; For that matter, I destroy divisor, too.
; The remainder is computed as a side-effect of the divide but we do not attempt to preserve it.
; A few lines of extra code could be written to return the remainder in “divisor”, but it will take
; some shifting around to justify the value properly. Use the inverse of “mask” on the result in
; “divisor” and the value calculated in “shift_count” to figure that one out. Don’t forget to
; include the overflow bit.
;
; I originally created this algorithm in the 1980s to work on an 8086 processor.
; Later I ported it to the Cypress M8 processor for 24 over 16. [That code is in the Times Square Ball, by the way.]
; In order to avoid some tricky stuff on the PIC 16 series processor I ported it to “C”, ran a test on a simulator,
; debugged it, then got the PICC C compiler to generate assembly code. I hand-optimized the result.
; This is what’s left.
;
; I wasn’t using the PICC C compiler to write the code for the project because it’s not well-enough integrated
; into MPLAB to do simulation or emulation. I used Microchip’s C18 compiler set to an 18F4455 MCU for the
; test/debug cycle, then used PICC C to generate the assembly.
;
; It’s not pretty, but it works.
;

AREA    bss [RAM, REL]
AREA InterruptRAM      [RAM, REL, CON]   ; Interrupts, on Page 0

quotient::                    ; Outgoing quotient is in dividend.
dividend::        BLK    5        ; Incoming dividend, outgoing quotient.
divisor::        BLK    2        ; Incoming divisor. This gets destroyed.
mask:            BLK    4        ; Mask used to separate quotient from remainder.
shift_count:    BLK    1        ; Number of shifts required to complete the operation.
;  Also number of bits in the quotient mask.
mask_count:                    ; Counter used to generate the mask.
;  [Re-use carry, we don't use both at one time.]
carry:            BLK    1        ; Holding register for the carry bit. Used in the only
;  true 3-byte operation we have.

AREA text [ROM, REL, CON]

;
; This function tries to be efficient by shifting both values to the far left to minimize the number of repititions
; through the main loop. That comes at a cost, of course, but I expect that most of the value pairs I use will do
; better this way.
;
; Dividing 1024 by 33 takes about 345 cycles, including the call/return. Just for the record.
;
; Here’s the original comment block. Note that it was written for 24×16 divide.
;———————————————————————————-
; Divide a 24-bit number by a 16-bit number. Leaves the quotient in the dividend
; [numerator] and throws away the remainder.
;
; This works like a classic long division; it shifts the numerator and denominator
; to the top of their respective words [so they are the same magnitude], counts
; the number of shifts and uses it to calculate the number of subtracts needed as
; well as the mask that eliminates the remainder.
;
; This can easily be extended or reduced to accomodate various word sizes. There
; should be no problem dividing a smaller word by a larger, but you will have
; to virtually extend the numerator with zeros. Since the initial shift factor will
; start out negative you will need to fiddle with that; you do not want to quit
; if you generate a carry, but instead test the sign of the result. [You still will
; want to jump to .zero_quotient if the shift factor is negative. Note that a shift
; factor of zero is valid.] [This version tests for zero, not carry, so that's
; no longer true. Lowest valid shift factor is 1.]
;
; If you expect to be using long words or want to macro-ize this [or expect to
; adapt it a lot] you may want to use multi-byte words and index into them
; [e.g. [numerator+1] instead of discrete names for each byte. However, the result
; will be endian-dependent. [This one is little-endian.]
;
; If you are expecting small values in the numerator or denominator, you can
; check the high byte for 0 and do 8-bit shifts before doing the final 1-bit shifts.
; Since the 8-bit-shift loop starts with a cmp [or tst] and a jump, the cost is
; only about 13 cycles if it is not necessary. [Cypress cycles.]
;
; If you need to keep the remainder it gets a bit more complicated. The calculated
; count, and the final calculated mask, will tell you where the remainder is in
; the numerator: the remainder’s LSb is just above the highest 1-bit in the mask.
; You may need to copy it out, mask and shift it before saving the quotient. You
; can also contrive to save the quotient into a different word rather than into
; the numerator, in which case you only need to shift the remainder back around.
; However, this will cost you more shifts during the calculation [you need to
; shift the numerator and the quotient separately].
;
; You can save yourself several bytes of RAM storage by calculating the final
; mask on-the-fly.
;———————————————————————————-

div24by16::
; Check for boundary conditions and set up the divide.
mov        [dividend+3],0
; Look for divide by zero.
mov        a,[divisor+1]        ; See if the divisor is zero.
or        a,[divisor]
jnz        .divisor_not_zero
; Divide by zero.
mov        a,0xFF            ; Divide by zero? Get as close to infinity as we can.
mov        [dividend],a
mov        [dividend+1],a
mov        [dividend+2],a
ret

.divisor_not_zero:
;divide.c: 62: }
;divide.c: 63: shift_count = 1;
mov        [shift_count],1+8            ; Difference in #bits between dividend and divisor. Plus 1.
;divide.c: 64: while [[divisor & 0x8000] == 0]
jmp        .divisor_shift_start

.divisor_shift_loop:
;divide.c: 65: {
;divide.c: 66: ++shift_count;
inc        [shift_count]
;divide.c: 67: divisor <<= 1;
asl        [divisor]
rlc        [divisor+1]
.divisor_shift_start:
tst        [divisor+1],0b10000000
jz        .divisor_shift_loop
jmp        .dividend_shift_start

.dividend_shift_loop:
;divide.c: 72: {
;divide.c: 73: if [--shift_count == 0]
dec        [shift_count]
jnz        .l10
;divide.c: 74: {
;divide.c: 75: dividend = 0;
mov        [dividend],0
mov        [dividend+1],0
mov        [dividend+2],0
; dividend+3 is already zero.
ret            ; Render the output and return.

.l10:
;divide.c: 77: }
;divide.c: 78: dividend <<= 1;
asl        [dividend]
rlc        [dividend+1]
rlc        [dividend+2]
; We’ll never shift anything into dividend+3 in this loop.
.begin_continue:
.dividend_shift_start:
tst        [dividend+2],0b10000000        ; See if the high bit is set.
jz        .dividend_shift_loop        ; If not, repeat until it is.
;divide.c: 79: }
;divide.c: 80: {
;divide.c: 81: unsigned char mask_count = shift_count;
mov        [mask_count],[shift_count]    ; shift_count is guaranteed to be at least 1.
;divide.c: 82: mask = 0;
mov        [mask],0
mov        [mask+1],0
mov        [mask+2],0
;divide.c: 88: while [mask_count]
.mask_shift_loop:
;divide.c: 89: {
;divide.c: 90: mask = [mask << 1] | 1;
asl        [mask]
rlc        [mask+1]
rlc        [mask+2]
or        [mask],1
;divide.c: 91: –mask_count;
dec        [mask_count]
jnz        .mask_shift_loop
;divide.c: 92: }
;divide.c: 93: }
;divide.c: 94: do

.looping:
;divide.c: 95: {
;divide.c: 96: if [divisor <= dividend]
mov        a,[dividend+3]            ; Dividend has a higher byte than divisor.
jnz        .dividend_is_bigger        ; If dividend high byte is non-zero, dividend is bigger by definition.
mov        a,[divisor+1]
sub        a,[dividend+2]
jnz        .u925
mov        a,[divisor]
sub        a,[dividend+1]
jz        .dividend_is_bigger
.u925:
jnc        .divisor_is_bigger        ; They’re not the same, see which one is bigger.
.dividend_is_bigger:
;divide.c: 97: {
;divide.c: 98: dividend = [[dividend - divisor] << 1] | 1;
mov        a,[divisor]
sub        [dividend+1],a
mov        a,[divisor+1]
sbb        [dividend+2],a
sbb        [dividend+3],0
asl        [dividend+0]
rlc        [dividend+1]
rlc        [dividend+2]
rlc        [dividend+3]
or        [dividend+0],1
;divide.c: 99: }
jmp        .l19

.divisor_is_bigger:
;divide.c: 100: else
;divide.c: 101: {
;divide.c: 102: dividend = [dividend << 1];
asl        [dividend+0]
rlc        [dividend+1]
rlc        [dividend+2]
rlc        [dividend+3]

.l19:
;divide.c: 103: }
;divide.c: 104: } while [--shift_count];
dec        [shift_count]
jnz        .looping
.u950:

.l16:
;divide.c: 105: dividend &= mask;
mov        a,[mask]
and        [dividend],a
mov        a,[mask+1]
and        [dividend+1],a
mov        a,[mask+2]
and        [dividend+2],a
mov        [dividend+3],0
;divide.c: 106: }
ret

;———————————————————————————-
; Divide a 32-bit number by a 16-bit number. Leaves the quotient in the dividend
; [numerator] and throws away the remainder.
;
; This works like a classic long division; it shifts the numerator and denominator
; to the top of their respective words [so they are the same magnitude], counts
; the number of shifts and uses it to calculate the number of subtracts needed as
; well as the mask that eliminates the remainder.
;
; This can easily be extended or reduced to accomodate various word sizes. There
; should be no problem dividing a smaller word by a larger, but you will have
; to virtually extend the numerator with zeros. Since the initial shift factor will
; start out negative you will need to fiddle with that; you do not want to quit
; if you generate a carry, but instead test the sign of the result. [You still will
; want to jump to .zero_quotient if the shift factor is negative. Note that a shift
; factor of zero is valid.] [This version tests for zero, not carry, so that's
; no longer true. Lowest valid shift factor is 1.]
;
; If you expect to be using long words or want to macro-ize this [or expect to
; adapt it a lot] you may want to use multi-byte words and index into them
; [e.g. [numerator+1] instead of discrete names for each byte. However, the result
; will be endian-dependent. [This one is little-endian.]
;
; If you are expecting small values in the numerator or denominator, you can
; check the high byte for 0 and do 8-bit shifts before doing the final 1-bit shifts.
; Since the 8-bit-shift loop starts with a cmp [or tst] and a jump, the cost is
; only about 13 cycles if it is not necessary. [Cypress cycles.]
;
; If you need to keep the remainder it gets a bit more complicated. The calculated
; count, and the final calculated mask, will tell you where the remainder is in
; the numerator: the remainder’s LSb is just above the highest 1-bit in the mask.
; You may need to copy it out, mask and shift it before saving the quotient. You
; can also contrive to save the quotient into a different word rather than into
; the numerator, in which case you only need to shift the remainder back around.
; However, this will cost you more shifts during the calculation [you need to
; shift the numerator and the quotient separately].
;
; You can save yourself several bytes of RAM storage by calculating the final
; mask on-the-fly.
;———————————————————————————-

div32by16::
; Check for boundary conditions and set up the divide.
mov        [dividend+4],0
; Look for divide by zero.
mov        a,[divisor+1]        ; See if the divisor is zero.
or        a,[divisor]
jnz        .divisor_not_zero
; Divide by zero.
mov        a,0xFF            ; Divide by zero? Get as close to infinity as we can.
mov        [dividend],a
mov        [dividend+1],a
mov        [dividend+2],a
mov        [dividend+3],a
ret

.divisor_not_zero:
;divide.c: 62: }
;divide.c: 63: shift_count = 1;
mov        [shift_count],1+16            ; Difference in #bits between dividend and divisor. Plus 1.
;divide.c: 64: while [[divisor & 0x8000] == 0]
jmp        .divisor_shift_start

.divisor_shift_loop:
;divide.c: 65: {
;divide.c: 66: ++shift_count;
inc        [shift_count]
;divide.c: 67: divisor <<= 1;
asl        [divisor]
rlc        [divisor+1]
.divisor_shift_start:
tst        [divisor+1],0b10000000
jz        .divisor_shift_loop
jmp        .dividend_shift_start

.dividend_shift_loop:
;divide.c: 72: {
;divide.c: 73: if [--shift_count == 0]
dec        [shift_count]
jnz        .l10
;divide.c: 74: {
;divide.c: 75: dividend = 0;
mov        [dividend],0
mov        [dividend+1],0
mov        [dividend+2],0
mov        [dividend+3],0
; dividend+4 is already zero.
ret            ; Render the output and return.

.l10:
;divide.c: 77: }
;divide.c: 78: dividend <<= 1;
asl        [dividend]
rlc        [dividend+1]
rlc        [dividend+2]
rlc        [dividend+3]
; We’ll never shift anything into dividend+4 in this loop.
.begin_continue:
.dividend_shift_start:
tst        [dividend+3],0b10000000        ; See if the high bit is set.
jz        .dividend_shift_loop        ; If not, repeat until it is.
;divide.c: 79: }
;divide.c: 80: {
;divide.c: 81: unsigned char mask_count = shift_count;
mov        [mask_count],[shift_count]    ; shift_count is guaranteed to be at least 1.
;divide.c: 82: mask = 0;
mov        [mask],0
mov        [mask+1],0
mov        [mask+2],0
mov        [mask+3],0
;divide.c: 88: while [mask_count]
.mask_shift_loop:
;divide.c: 89: {
;divide.c: 90: mask = [mask << 1] | 1;
asl        [mask]
rlc        [mask+1]
rlc        [mask+2]
rlc        [mask+3]
or        [mask],1
;divide.c: 91: –mask_count;
dec        [mask_count]
jnz        .mask_shift_loop
;divide.c: 92: }
;divide.c: 93: }
;divide.c: 94: do

.looping:
;divide.c: 95: {
;divide.c: 96: if [divisor <= dividend]
mov        a,[dividend+3]            ; Dividend has a higher byte than divisor.
jnz        .dividend_is_bigger        ; If dividend high byte is non-zero, dividend is bigger by definition.
mov        a,[divisor+1]
sub        a,[dividend+3]
jnz        .u925
mov        a,[divisor]
sub        a,[dividend+2]
jz        .dividend_is_bigger
.u925:
jnc        .divisor_is_bigger        ; They’re not the same, see which one is bigger.
.dividend_is_bigger:
;divide.c: 97: {
;divide.c: 98: dividend = [[dividend - divisor] << 1] | 1;
mov        a,[divisor]
sub        [dividend+2],a
mov        a,[divisor+1]
sbb        [dividend+3],a
sbb        [dividend+4],0
asl        [dividend+0]
rlc        [dividend+1]
rlc        [dividend+2]
rlc        [dividend+3]
rlc        [dividend+4]
or        [dividend+0],1
;divide.c: 99: }
jmp        .l19

.divisor_is_bigger:
;divide.c: 100: else
;divide.c: 101: {
;divide.c: 102: dividend = [dividend << 1];
asl        [dividend+0]
rlc        [dividend+1]
rlc        [dividend+2]
rlc        [dividend+3]
rlc        [dividend+4]

.l19:
;divide.c: 103: }
;divide.c: 104: } while [--shift_count];
dec        [shift_count]
jnz        .looping
.u950:

.l16:
;divide.c: 105: dividend &= mask;
mov        a,[mask]
and        [dividend],a
mov        a,[mask+1]
and        [dividend+1],a
mov        a,[mask+2]
and        [dividend+2],a
mov        a,[mask+3]
and        [dividend+3],a
mov        [dividend+4],0
;divide.c: 106: }
ret