One problem with 8-bit micros is that their math processing is usually limited to addition and subtraction. You can do a lot with addition and subtraction, but sometimes you need to do multiplication and division. Even the mighty 8086 didn’t get its FPU built in until the 80386 (and it was optional until the Pentium, if I remember correctly) though you could get an external FPU.
Of the two processors I use most (the Microchip PIC18 and the Cypress M8) each has an 8-bit multiply function accessible via registers (though the M8 is a signed multiply — who was the genius who thought that up? It makes a mess of doing multi-word multiplies). And multiply is comparatively simple to write.
But to do a proper divide requires more work. Especially if you don’t want to just subtract the divisor from the dividend and add 1 to the quotient until you run out of dividend. It works, but it’s terribly slow.
I’m not very strong on math so I fell back on what I knew. What I knew was long division as taught to me in elementary school. You know — subtract a multiple of the divisor from the top of the dividend, record the multiplier, calculate the remainder, then do it again on the remainder. Wash, rinse, repeat until you run out of dividend or patience. (OK, not the best description, but you already know the drill.)
So that’s what I did for my bitwise long divide. I first wrote it for the 8086, later I ported it to the M8, then the PIC16, then the M8 again when I lost the first code. I’ve ported it to several different sizes of dividend and divisor; with some work I could probably make it run-time adaptable but of course that would only slow it down. I did make a macro version for the PIC16 that uses preprocessor directives (while/endw mostly) to let you generate code for any size words you like. I don’t think I can do that for the M8, their preprocessor is very primitive. The original M8 code is in the Times Square Ball firmware, by the way. What I’ll be giving you has been ported twice since then, but it’s essentially the same. (I’ll give you the PIC16 code in a separate blog post, fear not.)
Here’s the general idea. First, shift both values (dividend and divisor) left until each is at the top of its word. Keep track of how many times you shifted each (plus the difference in word size); that tells you how many bits will be in your quotient (and indirectly your remainder), though it doesn’t say how many of those bits are set to 1, don’t get me wrong. But the quotient will never be any larger than that number of bits. That number also tells you when to stop subtracting; it’s a counter.
Next you do a subtract-test-record-shift cycle exactly like the one in your by-hand long division, except that ours is in binary. You are shifting the dividend, not the divisor (which stays constant).
Now here’s the tricky bit. Every time you do a subtract-shift you discard one bit of significance. That means that a new low bit becomes available in the dividend on each cycle. So instead of putting that bit into a separate quotient (and requiring a second set of shifts — remember that we’re in a slow, 8-bit processor), you can set the result of your subtract into the low bit of the dividend after you’ve shifted it. Of course that means you can’t (easily) test for zero to see if you’re through; that’s why you need that counter. Still, leaving the divide early due to the dividend going to zero is an unusual case, I should think. (And if it is not, in your particular case, due to peculiarities of your particular data set, then you obviously want to optimize your code to take advantage of that.)
That leaves the quotient in the dividend. But what about the leftover bits after the last subtract? Well… that’s your remainder.
Now early on we determined how many bits the quotient would have. We use this to generate a mask that lets us preserve the quotient and throw away the remainder. But what if you need the remainder? That’s a bit more work but it turns out to be exactly the number of bits remaining that aren’t quotient. All you have to do is copy, mask, and shift right by — guess what? — the number of bits in the quotient, the same counter we calculated up front. (Actually you can skip the ‘mask’ step for the remainder since you’re going to shift the quotient into the bit bucket anyway.)
Cool stuff, Maynard.
Unfortunately I haven’t figured out how to attach files to this blog (except media). I can of course upload elsewhere and link to it. Meantime I’m going to try to simply paste it into the blog.
Bonus! Here I am giving you two different word sizes, 24/16 and 32/16. The differences should be obvious; it should help if you ever need to port to other word sizes. Please note however that the design doesn’t really allow for the divisor to have more bits than the dividend.
One other note in reference to my post “Optimize This!” regarding re-use of variables. Note that I do that here. First, it’s stable, test code; second, you’ll notice that I gave the variable two different, meaningful labels for the two separate uses; third, you’ll notice that I commented it.
On the other hand, I didn’t really comment the code as much as I usually do. I did however leave in the C code (as comments) I used during one porting process.
One last note: notwithstanding the number of bits we counted, when you store the results the quotient word size is the same as the dividend and the remainder is the same word size as the divisor. Again, that’s the general case and your dataset may generate smaller words, optimize for your dataset.
; ; Copyright 2010 Jeffrey J. Nonken ; License: permission granted for unlimited use and unlimited derived use. ; If you use this, I'd appreciate it if you'd give me credit. ; Just leave these comments intact or mention me in your comments. I don't need ; money for this, but I'd appreciate credit where it's due. ; (Though if you want to send me money, I won't refuse it.; ; Like what I did here? Feel free to offer me a job. http://jeffrey.nonken.net/ ; ; Let me know if you have any improvements. PLEASE let me know if you find bugs. ; ; This is an integer divide function, 24 bits divided by 16 bits. ; Stuff your 24-bit value into "dividend", 16-bit value into "divisor" and call div24by16. ; Note that "dividend" is a 4-byte value, the high byte is used internally for overflow. ; Any value you place there will get squashed first thing. ; The quotient of the divide will be left in dividend, so make sure your value is safe someplace else. ; For that matter, I destroy divisor, too. ; The remainder is computed as a side-effect of the divide but we do not attempt to preserve it. ; A few lines of extra code could be written to return the remainder in "divisor", but it will take ; some shifting around to justify the value properly. Use the inverse of "mask" on the result in ; "divisor" and the value calculated in "shift_count" to figure that one out. Don't forget to ; include the overflow bit. ; ; I originally created this algorithm in the 1980s to work on an 8086 processor. ; Later I ported it to the Cypress M8 processor for 24 over 16. [That code is in the Times Square Ball, by the way.] ; In order to avoid some tricky stuff on the PIC 16 series processor I ported it to "C", ran a test on a simulator, ; debugged it, then got the PICC C compiler to generate assembly code. I hand-optimized the result. ; This is what's left. ; ; I wasn't using the PICC C compiler to write the code for the project because it's not well-enough integrated ; into MPLAB to do simulation or emulation. I used Microchip's C18 compiler set to an 18F4455 MCU for the ; test/debug cycle, then used PICC C to generate the assembly. ; ; It's not pretty, but it works. ; AREA bss [RAM, REL] quotient:: ; Outgoing quotient is in dividend. dividend:: BLK 5 ; Incoming dividend, outgoing quotient. divisor:: BLK 2 ; Incoming divisor. This gets destroyed. mask: BLK 4 ; Mask used to separate quotient from remainder. shift_count: BLK 1 ; Number of shifts required to complete the operation. ; Also number of bits in the quotient mask. mask_count: ; Counter used to generate the mask. ; [Re-use carry, we don't use both at one time.] carry: BLK 1 ; Holding register for the carry bit. Used in the only ; true 3-byte operation we have. AREA text [ROM, REL, CON] ; ; This function tries to be efficient by shifting both values to the far left to minimize the number of repititions ; through the main loop. That comes at a cost, of course, but I expect that most of the value pairs I use will do ; better this way. ; ; Dividing 1024 by 33 takes about 345 cycles, including the call/return. Just for the record. ; ; Here's the original comment block. Note that it was written for 24x16 divide. ;---------------------------------------------------------------------------------- ; Divide a 24-bit number by a 16-bit number. Leaves the quotient in the dividend ; [numerator] and throws away the remainder. ; ; This works like a classic long division; it shifts the numerator and denominator ; to the top of their respective words [so they are the same magnitude], counts ; the number of shifts and uses it to calculate the number of subtracts needed as ; well as the mask that eliminates the remainder. ; ; This can easily be extended or reduced to accomodate various word sizes. There ; should be no problem dividing a smaller word by a larger, but you will have ; to virtually extend the numerator with zeros. Since the initial shift factor will ; start out negative you will need to fiddle with that; you do not want to quit ; if you generate a carry, but instead test the sign of the result. [You still will ; want to jump to .zero_quotient if the shift factor is negative. Note that a shift ; factor of zero is valid.] [This version tests for zero, not carry, so that's ; no longer true. Lowest valid shift factor is 1.] ; ; If you expect to be using long words or want to macro-ize this [or expect to ; adapt it a lot] you may want to use multi-byte words and index into them ; [e.g. [numerator+1] instead of discrete names for each byte. However, the result ; will be endian-dependent. [This one is little-endian.] ; ; If you are expecting small values in the numerator or denominator, you can ; check the high byte for 0 and do 8-bit shifts before doing the final 1-bit shifts. ; Since the 8-bit-shift loop starts with a cmp [or tst] and a jump, the cost is ; only about 13 cycles if it is not necessary. [Cypress cycles.] ; ; If you need to keep the remainder it gets a bit more complicated. The calculated ; count, and the final calculated mask, will tell you where the remainder is in ; the numerator: the remainder's LSb is just above the highest 1-bit in the mask. ; You may need to copy it out, mask and shift it before saving the quotient. You ; can also contrive to save the quotient into a different word rather than into ; the numerator, in which case you only need to shift the remainder back around. ; However, this will cost you more shifts during the calculation [you need to ; shift the numerator and the quotient separately]. ; ; You can save yourself several bytes of RAM storage by calculating the final ; mask on-the-fly. ;---------------------------------------------------------------------------------- div24by16:: ; Check for boundary conditions and set up the divide. mov [dividend+3],0 ; Look for divide by zero. mov a,[divisor+1] ; See if the divisor is zero. or a,[divisor] jnz .divisor_not_zero ; Divide by zero. mov a,0xFF ; Divide by zero? Get as close to infinity as we can. mov [dividend],a mov [dividend+1],a mov [dividend+2],a ret .divisor_not_zero: ;divide.c: 62: } ;divide.c: 63: shift_count = 1; mov [shift_count],1+8 ; Difference in #bits between dividend and divisor. Plus 1. ;divide.c: 64: while [[divisor & 0x8000] == 0] jmp .divisor_shift_start .divisor_shift_loop: ;divide.c: 65: { ;divide.c: 66: ++shift_count; inc [shift_count] ;divide.c: 67: divisor <<= 1; asl [divisor] rlc [divisor+1] .divisor_shift_start: tst [divisor+1],0b10000000 jz .divisor_shift_loop jmp .dividend_shift_start .dividend_shift_loop: ;divide.c: 72: { ;divide.c: 73: if [--shift_count == 0] dec [shift_count] jnz .l10 ;divide.c: 74: { ;divide.c: 75: dividend = 0; mov [dividend],0 mov [dividend+1],0 mov [dividend+2],0 ; dividend+3 is already zero. ret ; Render the output and return. .l10: ;divide.c: 77: } ;divide.c: 78: dividend <<= 1; asl [dividend] rlc [dividend+1] rlc [dividend+2] ; We'll never shift anything into dividend+3 in this loop. .begin_continue: .dividend_shift_start: tst [dividend+2],0b10000000 ; See if the high bit is set. jz .dividend_shift_loop ; If not, repeat until it is. ;divide.c: 79: } ;divide.c: 80: { ;divide.c: 81: unsigned char mask_count = shift_count; mov [mask_count],[shift_count] ; shift_count is guaranteed to be at least 1. ;divide.c: 82: mask = 0; mov [mask],0 mov [mask+1],0 mov [mask+2],0 ;divide.c: 88: while [mask_count] .mask_shift_loop: ;divide.c: 89: { ;divide.c: 90: mask = [mask << 1] | 1; asl [mask] rlc [mask+1] rlc [mask+2] or [mask],1 ;divide.c: 91: --mask_count; dec [mask_count] jnz .mask_shift_loop ;divide.c: 92: } ;divide.c: 93: } ;divide.c: 94: do .looping: ;divide.c: 95: { ;divide.c: 96: if [divisor <= dividend] mov a,[dividend+3] ; Dividend has a higher byte than divisor. jnz .dividend_is_bigger ; If dividend high byte is non-zero, dividend is bigger by definition. mov a,[divisor+1] sub a,[dividend+2] jnz .u925 mov a,[divisor] sub a,[dividend+1] jz .dividend_is_bigger .u925: jnc .divisor_is_bigger ; They're not the same, see which one is bigger. .dividend_is_bigger: ;divide.c: 97: { ;divide.c: 98: dividend = [[dividend - divisor] << 1] | 1; mov a,[divisor] sub [dividend+1],a mov a,[divisor+1] sbb [dividend+2],a sbb [dividend+3],0 asl [dividend+0] rlc [dividend+1] rlc [dividend+2] rlc [dividend+3] or [dividend+0],1 ;divide.c: 99: } jmp .l19 .divisor_is_bigger: ;divide.c: 100: else ;divide.c: 101: { ;divide.c: 102: dividend = [dividend << 1]; asl [dividend+0] rlc [dividend+1] rlc [dividend+2] rlc [dividend+3] .l19: ;divide.c: 103: } ;divide.c: 104: } while [--shift_count]; dec [shift_count] jnz .looping .u950: .l16: ;divide.c: 105: dividend &= mask; mov a,[mask] and [dividend],a mov a,[mask+1] and [dividend+1],a mov a,[mask+2] and [dividend+2],a mov [dividend+3],0 ;divide.c: 106: } ret ;---------------------------------------------------------------------------------- ; Divide a 32-bit number by a 16-bit number. Leaves the quotient in the dividend ; [numerator] and throws away the remainder. ; ; This works like a classic long division; it shifts the numerator and denominator ; to the top of their respective words [so they are the same magnitude], counts ; the number of shifts and uses it to calculate the number of subtracts needed as ; well as the mask that eliminates the remainder. ; ; This can easily be extended or reduced to accomodate various word sizes. There ; should be no problem dividing a smaller word by a larger, but you will have ; to virtually extend the numerator with zeros. Since the initial shift factor will ; start out negative you will need to fiddle with that; you do not want to quit ; if you generate a carry, but instead test the sign of the result. [You still will ; want to jump to .zero_quotient if the shift factor is negative. Note that a shift ; factor of zero is valid.] [This version tests for zero, not carry, so that's ; no longer true. Lowest valid shift factor is 1.] ; ; If you expect to be using long words or want to macro-ize this [or expect to ; adapt it a lot] you may want to use multi-byte words and index into them ; [e.g. [numerator+1] instead of discrete names for each byte. However, the result ; will be endian-dependent. [This one is little-endian.] ; ; If you are expecting small values in the numerator or denominator, you can ; check the high byte for 0 and do 8-bit shifts before doing the final 1-bit shifts. ; Since the 8-bit-shift loop starts with a cmp [or tst] and a jump, the cost is ; only about 13 cycles if it is not necessary. [Cypress cycles.] ; ; If you need to keep the remainder it gets a bit more complicated. The calculated ; count, and the final calculated mask, will tell you where the remainder is in ; the numerator: the remainder's LSb is just above the highest 1-bit in the mask. ; You may need to copy it out, mask and shift it before saving the quotient. You ; can also contrive to save the quotient into a different word rather than into ; the numerator, in which case you only need to shift the remainder back around. ; However, this will cost you more shifts during the calculation [you need to ; shift the numerator and the quotient separately]. ; ; You can save yourself several bytes of RAM storage by calculating the final ; mask on-the-fly. ;---------------------------------------------------------------------------------- div32by16:: ; Check for boundary conditions and set up the divide. mov [dividend+4],0 ; Look for divide by zero. mov a,[divisor+1] ; See if the divisor is zero. or a,[divisor] jnz .divisor_not_zero ; Divide by zero. mov a,0xFF ; Divide by zero? Get as close to infinity as we can. mov [dividend],a mov [dividend+1],a mov [dividend+2],a mov [dividend+3],a ret .divisor_not_zero: ;divide.c: 62: } ;divide.c: 63: shift_count = 1; mov [shift_count],1+16 ; Difference in #bits between dividend and divisor. Plus 1. ;divide.c: 64: while [[divisor & 0x8000] == 0] jmp .divisor_shift_start .divisor_shift_loop: ;divide.c: 65: { ;divide.c: 66: ++shift_count; inc [shift_count] ;divide.c: 67: divisor <<= 1; asl [divisor] rlc [divisor+1] .divisor_shift_start: tst [divisor+1],0b10000000 jz .divisor_shift_loop jmp .dividend_shift_start .dividend_shift_loop: ;divide.c: 72: { ;divide.c: 73: if [--shift_count == 0] dec [shift_count] jnz .l10 ;divide.c: 74: { ;divide.c: 75: dividend = 0; mov [dividend],0 mov [dividend+1],0 mov [dividend+2],0 mov [dividend+3],0 ; dividend+4 is already zero. ret ; Render the output and return. .l10: ;divide.c: 77: } ;divide.c: 78: dividend <<= 1; asl [dividend] rlc [dividend+1] rlc [dividend+2] rlc [dividend+3] ; We'll never shift anything into dividend+4 in this loop. .begin_continue: .dividend_shift_start: tst [dividend+3],0b10000000 ; See if the high bit is set. jz .dividend_shift_loop ; If not, repeat until it is. ;divide.c: 79: } ;divide.c: 80: { ;divide.c: 81: unsigned char mask_count = shift_count; mov [mask_count],[shift_count] ; shift_count is guaranteed to be at least 1. ;divide.c: 82: mask = 0; mov [mask],0 mov [mask+1],0 mov [mask+2],0 mov [mask+3],0 ;divide.c: 88: while [mask_count] .mask_shift_loop: ;divide.c: 89: { ;divide.c: 90: mask = [mask << 1] | 1; asl [mask] rlc [mask+1] rlc [mask+2] rlc [mask+3] or [mask],1 ;divide.c: 91: --mask_count; dec [mask_count] jnz .mask_shift_loop ;divide.c: 92: } ;divide.c: 93: } ;divide.c: 94: do .looping: ;divide.c: 95: { ;divide.c: 96: if [divisor <= dividend] mov a,[dividend+4] ; Dividend has a higher byte than divisor. jnz .dividend_is_bigger ; If dividend high byte is non-zero, dividend is bigger by definition. mov a,[divisor+1] sub a,[dividend+3] jnz .u925 mov a,[divisor] sub a,[dividend+2] jz .dividend_is_bigger .u925: jnc .divisor_is_bigger ; They're not the same, see which one is bigger. .dividend_is_bigger: ;divide.c: 97: { ;divide.c: 98: dividend = [[dividend - divisor] << 1] | 1; mov a,[divisor] sub [dividend+2],a mov a,[divisor+1] sbb [dividend+3],a sbb [dividend+4],0 asl [dividend+0] rlc [dividend+1] rlc [dividend+2] rlc [dividend+3] rlc [dividend+4] or [dividend+0],1 ;divide.c: 99: } jmp .l19 .divisor_is_bigger: ;divide.c: 100: else ;divide.c: 101: { ;divide.c: 102: dividend = [dividend << 1]; asl [dividend+0] rlc [dividend+1] rlc [dividend+2] rlc [dividend+3] rlc [dividend+4] .l19: ;divide.c: 103: } ;divide.c: 104: } while [--shift_count]; dec [shift_count] jnz .looping .u950: .l16: ;divide.c: 105: dividend &= mask; mov a,[mask] and [dividend],a mov a,[mask+1] and [dividend+1],a mov a,[mask+2] and [dividend+2],a mov a,[mask+3] and [dividend+3],a mov [dividend+4],0 ;divide.c: 106: } ret
; Copyright 2010 Jeffrey J. Nonken
; License: permission granted for unlimited use and unlimited derived use.
; If you use this, I’d appreciate it if you’d give me credit.
; Just leave these comments intact or mention me in your comments. I don’t need
; money for this, but I’d appreciate credit where it’s due.
; (Though if you want to send me money, I won’t refuse it.
;
; Like what I did here? Feel free to offer me a job. http://jeffrey.nonken.net/
;
; Let me know if you have any improvements. PLEASE let me know if you find bugs.
;
; This is an integer divide function, 24 bits divided by 16 bits.
; Stuff your 24-bit value into “dividend”, 16-bit value into “divisor” and call div24by16.
; Note that “dividend” is a 4-byte value, the high byte is used internally for overflow.
; Any value you place there will get squashed first thing.
; The quotient of the divide will be left in dividend, so make sure your value is safe someplace else.
; For that matter, I destroy divisor, too.
; The remainder is computed as a side-effect of the divide but we do not attempt to preserve it.
; A few lines of extra code could be written to return the remainder in “divisor”, but it will take
; some shifting around to justify the value properly. Use the inverse of “mask” on the result in
; “divisor” and the value calculated in “shift_count” to figure that one out. Don’t forget to
; include the overflow bit.
;
; I originally created this algorithm in the 1980s to work on an 8086 processor.
; Later I ported it to the Cypress M8 processor for 24 over 16. [That code is in the Times Square Ball, by the way.]
; In order to avoid some tricky stuff on the PIC 16 series processor I ported it to “C”, ran a test on a simulator,
; debugged it, then got the PICC C compiler to generate assembly code. I hand-optimized the result.
; This is what’s left.
;
; I wasn’t using the PICC C compiler to write the code for the project because it’s not well-enough integrated
; into MPLAB to do simulation or emulation. I used Microchip’s C18 compiler set to an 18F4455 MCU for the
; test/debug cycle, then used PICC C to generate the assembly.
;
; It’s not pretty, but it works.
;
AREA bss [RAM, REL]
AREA InterruptRAM [RAM, REL, CON] ; Interrupts, on Page 0
quotient:: ; Outgoing quotient is in dividend.
dividend:: BLK 5 ; Incoming dividend, outgoing quotient.
divisor:: BLK 2 ; Incoming divisor. This gets destroyed.
mask: BLK 4 ; Mask used to separate quotient from remainder.
shift_count: BLK 1 ; Number of shifts required to complete the operation.
; Also number of bits in the quotient mask.
mask_count: ; Counter used to generate the mask.
; [Re-use carry, we don't use both at one time.]
carry: BLK 1 ; Holding register for the carry bit. Used in the only
; true 3-byte operation we have.
AREA text [ROM, REL, CON]
;
; This function tries to be efficient by shifting both values to the far left to minimize the number of repititions
; through the main loop. That comes at a cost, of course, but I expect that most of the value pairs I use will do
; better this way.
;
; Dividing 1024 by 33 takes about 345 cycles, including the call/return. Just for the record.
;
; Here’s the original comment block. Note that it was written for 24×16 divide.
;———————————————————————————-
; Divide a 24-bit number by a 16-bit number. Leaves the quotient in the dividend
; [numerator] and throws away the remainder.
;
; This works like a classic long division; it shifts the numerator and denominator
; to the top of their respective words [so they are the same magnitude], counts
; the number of shifts and uses it to calculate the number of subtracts needed as
; well as the mask that eliminates the remainder.
;
; This can easily be extended or reduced to accomodate various word sizes. There
; should be no problem dividing a smaller word by a larger, but you will have
; to virtually extend the numerator with zeros. Since the initial shift factor will
; start out negative you will need to fiddle with that; you do not want to quit
; if you generate a carry, but instead test the sign of the result. [You still will
; want to jump to .zero_quotient if the shift factor is negative. Note that a shift
; factor of zero is valid.] [This version tests for zero, not carry, so that's
; no longer true. Lowest valid shift factor is 1.]
;
; If you expect to be using long words or want to macro-ize this [or expect to
; adapt it a lot] you may want to use multi-byte words and index into them
; [e.g. [numerator+1] instead of discrete names for each byte. However, the result
; will be endian-dependent. [This one is little-endian.]
;
; If you are expecting small values in the numerator or denominator, you can
; check the high byte for 0 and do 8-bit shifts before doing the final 1-bit shifts.
; Since the 8-bit-shift loop starts with a cmp [or tst] and a jump, the cost is
; only about 13 cycles if it is not necessary. [Cypress cycles.]
;
; If you need to keep the remainder it gets a bit more complicated. The calculated
; count, and the final calculated mask, will tell you where the remainder is in
; the numerator: the remainder’s LSb is just above the highest 1-bit in the mask.
; You may need to copy it out, mask and shift it before saving the quotient. You
; can also contrive to save the quotient into a different word rather than into
; the numerator, in which case you only need to shift the remainder back around.
; However, this will cost you more shifts during the calculation [you need to
; shift the numerator and the quotient separately].
;
; You can save yourself several bytes of RAM storage by calculating the final
; mask on-the-fly.
;———————————————————————————-
div24by16::
; Check for boundary conditions and set up the divide.
mov [dividend+3],0
; Look for divide by zero.
mov a,[divisor+1] ; See if the divisor is zero.
or a,[divisor]
jnz .divisor_not_zero
; Divide by zero.
mov a,0xFF ; Divide by zero? Get as close to infinity as we can.
mov [dividend],a
mov [dividend+1],a
mov [dividend+2],a
ret
.divisor_not_zero:
;divide.c: 62: }
;divide.c: 63: shift_count = 1;
mov [shift_count],1+8 ; Difference in #bits between dividend and divisor. Plus 1.
;divide.c: 64: while [[divisor & 0x8000] == 0]
jmp .divisor_shift_start
.divisor_shift_loop:
;divide.c: 65: {
;divide.c: 66: ++shift_count;
inc [shift_count]
;divide.c: 67: divisor <<= 1;
asl [divisor]
rlc [divisor+1]
.divisor_shift_start:
tst [divisor+1],0b10000000
jz .divisor_shift_loop
jmp .dividend_shift_start
.dividend_shift_loop:
;divide.c: 72: {
;divide.c: 73: if [--shift_count == 0]
dec [shift_count]
jnz .l10
;divide.c: 74: {
;divide.c: 75: dividend = 0;
mov [dividend],0
mov [dividend+1],0
mov [dividend+2],0
; dividend+3 is already zero.
ret ; Render the output and return.
.l10:
;divide.c: 77: }
;divide.c: 78: dividend <<= 1;
asl [dividend]
rlc [dividend+1]
rlc [dividend+2]
; We’ll never shift anything into dividend+3 in this loop.
.begin_continue:
.dividend_shift_start:
tst [dividend+2],0b10000000 ; See if the high bit is set.
jz .dividend_shift_loop ; If not, repeat until it is.
;divide.c: 79: }
;divide.c: 80: {
;divide.c: 81: unsigned char mask_count = shift_count;
mov [mask_count],[shift_count] ; shift_count is guaranteed to be at least 1.
;divide.c: 82: mask = 0;
mov [mask],0
mov [mask+1],0
mov [mask+2],0
;divide.c: 88: while [mask_count]
.mask_shift_loop:
;divide.c: 89: {
;divide.c: 90: mask = [mask << 1] | 1;
asl [mask]
rlc [mask+1]
rlc [mask+2]
or [mask],1
;divide.c: 91: –mask_count;
dec [mask_count]
jnz .mask_shift_loop
;divide.c: 92: }
;divide.c: 93: }
;divide.c: 94: do
.looping:
;divide.c: 95: {
;divide.c: 96: if [divisor <= dividend]
mov a,[dividend+3] ; Dividend has a higher byte than divisor.
jnz .dividend_is_bigger ; If dividend high byte is non-zero, dividend is bigger by definition.
mov a,[divisor+1]
sub a,[dividend+2]
jnz .u925
mov a,[divisor]
sub a,[dividend+1]
jz .dividend_is_bigger
.u925:
jnc .divisor_is_bigger ; They’re not the same, see which one is bigger.
.dividend_is_bigger:
;divide.c: 97: {
;divide.c: 98: dividend = [[dividend - divisor] << 1] | 1;
mov a,[divisor]
sub [dividend+1],a
mov a,[divisor+1]
sbb [dividend+2],a
sbb [dividend+3],0
asl [dividend+0]
rlc [dividend+1]
rlc [dividend+2]
rlc [dividend+3]
or [dividend+0],1
;divide.c: 99: }
jmp .l19
.divisor_is_bigger:
;divide.c: 100: else
;divide.c: 101: {
;divide.c: 102: dividend = [dividend << 1];
asl [dividend+0]
rlc [dividend+1]
rlc [dividend+2]
rlc [dividend+3]
.l19:
;divide.c: 103: }
;divide.c: 104: } while [--shift_count];
dec [shift_count]
jnz .looping
.u950:
.l16:
;divide.c: 105: dividend &= mask;
mov a,[mask]
and [dividend],a
mov a,[mask+1]
and [dividend+1],a
mov a,[mask+2]
and [dividend+2],a
mov [dividend+3],0
;divide.c: 106: }
ret
;———————————————————————————-
; Divide a 32-bit number by a 16-bit number. Leaves the quotient in the dividend
; [numerator] and throws away the remainder.
;
; This works like a classic long division; it shifts the numerator and denominator
; to the top of their respective words [so they are the same magnitude], counts
; the number of shifts and uses it to calculate the number of subtracts needed as
; well as the mask that eliminates the remainder.
;
; This can easily be extended or reduced to accomodate various word sizes. There
; should be no problem dividing a smaller word by a larger, but you will have
; to virtually extend the numerator with zeros. Since the initial shift factor will
; start out negative you will need to fiddle with that; you do not want to quit
; if you generate a carry, but instead test the sign of the result. [You still will
; want to jump to .zero_quotient if the shift factor is negative. Note that a shift
; factor of zero is valid.] [This version tests for zero, not carry, so that's
; no longer true. Lowest valid shift factor is 1.]
;
; If you expect to be using long words or want to macro-ize this [or expect to
; adapt it a lot] you may want to use multi-byte words and index into them
; [e.g. [numerator+1] instead of discrete names for each byte. However, the result
; will be endian-dependent. [This one is little-endian.]
;
; If you are expecting small values in the numerator or denominator, you can
; check the high byte for 0 and do 8-bit shifts before doing the final 1-bit shifts.
; Since the 8-bit-shift loop starts with a cmp [or tst] and a jump, the cost is
; only about 13 cycles if it is not necessary. [Cypress cycles.]
;
; If you need to keep the remainder it gets a bit more complicated. The calculated
; count, and the final calculated mask, will tell you where the remainder is in
; the numerator: the remainder’s LSb is just above the highest 1-bit in the mask.
; You may need to copy it out, mask and shift it before saving the quotient. You
; can also contrive to save the quotient into a different word rather than into
; the numerator, in which case you only need to shift the remainder back around.
; However, this will cost you more shifts during the calculation [you need to
; shift the numerator and the quotient separately].
;
; You can save yourself several bytes of RAM storage by calculating the final
; mask on-the-fly.
;———————————————————————————-
div32by16::
; Check for boundary conditions and set up the divide.
mov [dividend+4],0
; Look for divide by zero.
mov a,[divisor+1] ; See if the divisor is zero.
or a,[divisor]
jnz .divisor_not_zero
; Divide by zero.
mov a,0xFF ; Divide by zero? Get as close to infinity as we can.
mov [dividend],a
mov [dividend+1],a
mov [dividend+2],a
mov [dividend+3],a
ret
.divisor_not_zero:
;divide.c: 62: }
;divide.c: 63: shift_count = 1;
mov [shift_count],1+16 ; Difference in #bits between dividend and divisor. Plus 1.
;divide.c: 64: while [[divisor & 0x8000] == 0]
jmp .divisor_shift_start
.divisor_shift_loop:
;divide.c: 65: {
;divide.c: 66: ++shift_count;
inc [shift_count]
;divide.c: 67: divisor <<= 1;
asl [divisor]
rlc [divisor+1]
.divisor_shift_start:
tst [divisor+1],0b10000000
jz .divisor_shift_loop
jmp .dividend_shift_start
.dividend_shift_loop:
;divide.c: 72: {
;divide.c: 73: if [--shift_count == 0]
dec [shift_count]
jnz .l10
;divide.c: 74: {
;divide.c: 75: dividend = 0;
mov [dividend],0
mov [dividend+1],0
mov [dividend+2],0
mov [dividend+3],0
; dividend+4 is already zero.
ret ; Render the output and return.
.l10:
;divide.c: 77: }
;divide.c: 78: dividend <<= 1;
asl [dividend]
rlc [dividend+1]
rlc [dividend+2]
rlc [dividend+3]
; We’ll never shift anything into dividend+4 in this loop.
.begin_continue:
.dividend_shift_start:
tst [dividend+3],0b10000000 ; See if the high bit is set.
jz .dividend_shift_loop ; If not, repeat until it is.
;divide.c: 79: }
;divide.c: 80: {
;divide.c: 81: unsigned char mask_count = shift_count;
mov [mask_count],[shift_count] ; shift_count is guaranteed to be at least 1.
;divide.c: 82: mask = 0;
mov [mask],0
mov [mask+1],0
mov [mask+2],0
mov [mask+3],0
;divide.c: 88: while [mask_count]
.mask_shift_loop:
;divide.c: 89: {
;divide.c: 90: mask = [mask << 1] | 1;
asl [mask]
rlc [mask+1]
rlc [mask+2]
rlc [mask+3]
or [mask],1
;divide.c: 91: –mask_count;
dec [mask_count]
jnz .mask_shift_loop
;divide.c: 92: }
;divide.c: 93: }
;divide.c: 94: do
.looping:
;divide.c: 95: {
;divide.c: 96: if [divisor <= dividend]
mov a,[dividend+3] ; Dividend has a higher byte than divisor.
jnz .dividend_is_bigger ; If dividend high byte is non-zero, dividend is bigger by definition.
mov a,[divisor+1]
sub a,[dividend+3]
jnz .u925
mov a,[divisor]
sub a,[dividend+2]
jz .dividend_is_bigger
.u925:
jnc .divisor_is_bigger ; They’re not the same, see which one is bigger.
.dividend_is_bigger:
;divide.c: 97: {
;divide.c: 98: dividend = [[dividend - divisor] << 1] | 1;
mov a,[divisor]
sub [dividend+2],a
mov a,[divisor+1]
sbb [dividend+3],a
sbb [dividend+4],0
asl [dividend+0]
rlc [dividend+1]
rlc [dividend+2]
rlc [dividend+3]
rlc [dividend+4]
or [dividend+0],1
;divide.c: 99: }
jmp .l19
.divisor_is_bigger:
;divide.c: 100: else
;divide.c: 101: {
;divide.c: 102: dividend = [dividend << 1];
asl [dividend+0]
rlc [dividend+1]
rlc [dividend+2]
rlc [dividend+3]
rlc [dividend+4]
.l19:
;divide.c: 103: }
;divide.c: 104: } while [--shift_count];
dec [shift_count]
jnz .looping
.u950:
.l16:
;divide.c: 105: dividend &= mask;
mov a,[mask]
and [dividend],a
mov a,[mask+1]
and [dividend+1],a
mov a,[mask+2]
and [dividend+2],a
mov a,[mask+3]
and [dividend+3],a
mov [dividend+4],0
;divide.c: 106: }
ret