This slow divide performance was on full display anytime I wanted to convert a 32 bit binary integer to its decimal equivalent. The canonical algorithm uses repeated division-with-remainder by ten. For instance, to convert the binary number 11001001 to its decimal equivalent, you would do this:
```
11001001 / 1010 = 10100 remainder 1
   10100 / 1010 =    10 remainder 0
      10 / 1010 =     0 remainder 10
```

The result is then the remainders, in decimal, in reverse order: 201. You need one division operation for each digit of the result. A 32 bit number might have as many as ten digits, and on that low-horsepower little Z-80, those divisions were really expensive. So I worked hard to come up with an alternative means of conversion that didn't require division.
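For reference, the canonical division-based conversion looks something like this in Java (a minimal sketch with my own names, not code from that old Z-80 program):

```java
// Classic conversion: one divide-with-remainder per decimal digit of the result.
static String toDecimalByDivision( int value ) {
    if (value == 0) return "0";
    StringBuilder sb = new StringBuilder();
    while (value != 0) {
        int remainder = value % 10;            // the expensive divide-with-remainder...
        value = value / 10;                    // ...(the JIT typically fuses this / and % pair)
        sb.append( (char) ('0' + remainder) );
    }
    return sb.reverse().toString();            // remainders emerge least significant first
}
```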
What I came up with has been independently invented by many people; I wasn't the first, I'm sure, and I don't know who was. The basic technique depends on a simple observation: there aren't all that many possible decimal digit values (a digit times a power of ten) that fit in a binary number of a given length. For instance, in a signed 32 bit binary number, the largest positive value that can be represented is 2,147,483,647. That's 10 decimal digits, any of which can have the nonzero values [1..9], except the most significant one, which can only be [1..2]. That's just 9 × 9 + 2 = 83 values. If you pre-compute all 83 of those values and put them in a table, then you can convert a binary number to decimal with this algorithm:
```
start with the binary value to be converted
while the remaining value is nonzero
    find the largest decimal digit value <= the remaining value
    record the digit
    subtract that decimal digit value from the remaining value
```

This produces the decimal digits in order, from the most significant digit to the least significant digit. Most importantly, it does so with no division required.
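Here's a minimal Java sketch of that algorithm for 32 bit values; the table layout and names are my own illustration, not the original code, and the inner loop is a naive search (exactly the step the rest of this post optimizes away). One wrinkle the pseudocode glosses over: zero digits between nonzero digits have to be emitted explicitly.

```java
// A sketch of the table-driven conversion for positive 32 bit values.
// DIGIT_VALUE[p][d-1] = d * 10^p; for 32 bit inputs only 83 of these
// entries are ever reachable (digits 1..2 at p = 9, and 1..9 below that).
public class NoDivisionDecimal {

    private static final long[][] DIGIT_VALUE = new long[10][9];
    static {
        long power = 1;
        for (int p = 0; p < 10; p++, power *= 10) {
            for (int d = 1; d <= 9; d++) {
                DIGIT_VALUE[p][d - 1] = d * power;
            }
        }
    }

    /** Converts a positive int to decimal using only comparison and subtraction. */
    static String toDecimal( int value ) {
        if (value == 0) return "0";
        StringBuilder sb = new StringBuilder();
        long remaining = value;
        boolean leading = true;                      // suppress leading zeros only
        for (int p = 9; p >= 0; p--) {
            // Find the largest decimal digit value at this position <= remaining
            // (a tiny linear search; the index trick described below replaces it).
            int digit = 0;
            while (digit < 9 && DIGIT_VALUE[p][digit] <= remaining) digit++;
            if (digit > 0) remaining -= DIGIT_VALUE[p][digit - 1];
            if (digit > 0 || !leading) {
                sb.append( (char) ('0' + digit) );   // interior zeros are recorded
                leading = false;
            }
        }
        return sb.toString();
    }

    public static void main( String[] args ) {
        System.out.println( toDecimal( 0b11001001 ) );         // 201
        System.out.println( toDecimal( Integer.MAX_VALUE ) );  // 2147483647
    }
}
```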
That was vital to getting decent performance out of that Z-80. But what about a modern processor? Well, it turns out that in relative terms, division is still the red-headed stepchild of CPU performance. On a modern Intel or ARM processor, 64 bit integer division can take over 100 CPU clock cycles. Multiply operations, by comparison, take just a handful of clock cycles, while addition and subtraction generally take just 1 or 2. Even in a high-level language like Java, algorithms that avoid repeated division can still produce large performance gains. I've done this five or six times now, most recently to get a 6:1 performance gain when converting floating point values to XML.
One line of the above algorithm I sort of glossed over: the bit about finding the largest decimal digit value that is no greater than the remaining value. The naive way to do this would be a linear search, or perhaps a binary search, of the table of decimal digit values. Either takes multiple iterations: a binary search over the 83-entry table for 32 bit values needs about six iterations on average, and seven in the worst case. For 64 bit values we'd have 171 decimal digit values, and the binary search would need roughly one iteration more.
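For concreteness, that binary search might look like this (a sketch; `digitValues` here is assumed to be the ascending table of decimal digit values, and the value is assumed to be at least 1):

```java
import java.util.Arrays;

// Returns the largest decimal digit value <= value, by binary search.
static long largestFitting( long value, long[] digitValues ) {
    int pos = Arrays.binarySearch( digitValues, value );
    // On a miss, binarySearch returns -(insertion point) - 1; the entry just
    // below the insertion point is the largest value that fits.
    return (pos >= 0) ? digitValues[pos] : digitValues[-pos - 2];
}
```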
We can do better, though, by taking advantage of the fact that there's a close relationship between a binary value and the largest decimal digit value that fits in it. I just finished coding up a nice solution for finding the largest decimal digit value that fits into a signed 64 bit integer. It works by using a table with 512 entries, about three times as many as we have possible decimal digit values. Given a 64 bit positive binary number, the Java code that computes the index into that table is:

```java
// lz counts the leading zero bits, so (63 - lz) is the position of the highest one bit.
int lz = Long.numberOfLeadingZeros( value );
// Index = 8 * (highest one bit position) + the top four bits of the value, leading one included.
int index = ((63 - lz) << 3) + (0xf & (int) Long.rotateRight( value, 60 - lz ));
```

Pretty simple code, really. It's also very fast. Best of all, with a properly constructed table the index will point to an entry that is either the correct one (that is, the largest decimal digit value that fits into the value) or too big. If it is too big, then the preceding entry in the table is guaranteed to be the correct one. Much better than the iterative searches!
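To make that "too big, so back up one entry" step concrete, the full lookup might look like this (a sketch; `digitTable` stands in for the hypothetical 512-entry table just described, and the method wrapper is mine, not the original code):

```java
// Returns the largest decimal digit value <= value, with no division
// and no iterative search, given a properly constructed 512-entry table.
static long largestDecimalDigitValue( long value, long[] digitTable ) {
    int lz = Long.numberOfLeadingZeros( value );
    int index = ((63 - lz) << 3) + (0xf & (int) Long.rotateRight( value, 60 - lz ));
    long candidate = digitTable[index];
    // The entry is either correct or too big; if too big, the preceding
    // entry is guaranteed to be the largest decimal digit value that fits.
    return (candidate <= value) ? candidate : digitTable[index - 1];
}
```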
With those two techniques in combination, I'm doing conversions from binary to decimal with no divisions and no iterative searching. It's very challenging to benchmark performance in Java, but my simple testing shows about 5:1 improvement in performance for random numbers in the 64 bit positive integer range. Smaller numbers (with fewer decimal digits) show smaller performance gains, but always at least 2:1. I'll call that a win!