Friday, April 14, 2017

Monetary values in computer programs...

Many programmers who first encounter the problem of representing a monetary value in a program don't immediately see much of a challenge.  At first blush, it seems simple to represent a monetary value: all you need is a number plus a way to identify the currency.  Right?

Wrong.

The currency identification really is easy.  There's a widely accepted standard (ISO 4217) that specifies a three-character code for every currency in the world.  It's used in every multi-currency application I've seen in the past 15 years or so.  For instance, United States dollars have the code “USD”.  But the amount of that currency isn't just a simple number.

The first little complication is that monetary amounts aren't always integers.  You might have $5 (a nice integer value), but you might also have $5.27.  Different currencies around the world conventionally have anything from 0 to 4 decimal places for a “usual” amount of money – but they will occasionally have even more.  For instance, the price in USD for a gram of gold is often specified with three or even four decimal places, like $40.625.  Other commodities or financial instruments may have even more decimal places.  At a job I had in the early 2000s I ran into a case where prices in USD were specified to seven decimal places (that's 1/100,000th of a cent!).  The takeaway here is that the number of decimal places needed isn't infinite, but it's also not as small as you might think.  I'd feel pretty safe with 8 decimal places for USD, or 10 for any currency...
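
As an aside for Java folks: ISO 4217 is baked right into the standard library.  Here's a quick sketch using java.util.Currency; note that getDefaultFractionDigits() reports only the conventional number of decimal places, not the extra precision I just described:

    import java.util.Currency;

    public class CurrencyDigits {
        public static void main(String[] args) {
            // ISO 4217 codes; getDefaultFractionDigits() returns the
            // conventional number of decimal places for everyday amounts.
            for (String code : new String[] { "USD", "JPY", "BHD" }) {
                Currency c = Currency.getInstance(code);
                System.out.println(c.getCurrencyCode() + ": "
                        + c.getDefaultFractionDigits() + " decimal places");
            }
            // Prints USD: 2, JPY: 0, BHD: 3.  Remember: this is the
            // conventional count, not the maximum you may actually need.
        }
    }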

The next complication is that monetary amounts can be quite large.  Corporations these days deal with amounts as large as hundreds of billions of USD, or 12 digits of integer value.  For some currencies you might need to add 4 more digits.  For government systems, you'll need even more – hundreds of trillions of USD today.  That's 15 digits of integer value, or up to 19 in other currencies.

For presentation to people, most applications will round huge values to the nearest thousand, million, or billion.  I haven't seen rounding to the nearest trillion USD, but I won't be surprised if it happens. :)  However, internally every financial application I've ever seen keeps track of every last digit.  For USD, that means you'd need a total of 17 significant digits to represent government-scale values.  Knowing that government will only get ever more bloated, adding a “pad” of 3 or 4 digits is probably a minimum requirement – so at least 20 or 21 significant digits, maybe a few more if you want to sleep better at night.

An aside here: I once worked on a stock trading application that was USD only, and had 10 significant digits (including cents).  That meant the biggest value it could represent was $99,999,999.99.  Surely that should have been safe for stock trades, right?  Well, it wasn't, as they found out in spectacular fashion one day.  A trader placed a buy order for a blue-chip stock where the total value of the order was about $400 million.  A very nice order, indeed!  But a value that big couldn't be represented in the system.  Worse, the program didn't detect the overflow, and instead returned a nonsense value for the calculation (number of shares times price per share) – and that nonsense value was only a few thousand dollars!  The order was transmitted to the market correctly, and actually was executed – but the accounting for it was totally messed up.  It took several engineers several days to analyze the logs for each of the thousands of small trades that made up the big order, so that we could get the customer's books in order.  The company ended up making that trade for free to get back in the good graces of its customer.  The CEO told us to fix that problem, and pronto.  Of course he thought the fix would be trivial – just change a couple of lines of code and it would all be fixed.  It was actually very far from trivial to fix that problem after the fact.  The assumption about the number of significant digits turned out to have been subtly and broadly spread throughout the entire application.  We were fixing related bugs a year later...
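
To make that failure mode concrete, here's a minimal Java sketch (hypothetical numbers, nothing like the actual system's code) showing how fixed-width integer math overflows silently, and how a wider type plus a checked multiply would have caught it:

    public class OverflowDemo {
        public static void main(String[] args) {
            // Hypothetical order: 8 million shares at $50.17 per share, with
            // prices held as cents in a 32-bit int.  The true total is
            // $401,360,000.00 -- about 40 billion cents, far beyond int range.
            int shares = 8_000_000;
            int priceInCents = 5_017;

            // Silent wraparound: the "total" is garbage, and nothing complains.
            int total = shares * priceInCents;
            System.out.println("Silently wrong total, in cents: " + total);

            // Widening to long fixes this case, and Math.multiplyExact throws
            // ArithmeticException on overflow instead of returning nonsense.
            long checked = Math.multiplyExact((long) shares, (long) priceInCents);
            System.out.println("Exact total, in cents: " + checked);
        }
    }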

Another complication is specific to representing fractional values.  If you've digested the challenges above, you may be saying to yourself “Floating point!  Use floating point!”  For any mainstream programming language I'm aware of, that means floating point compliant with IEEE-754.  More specifically, it means the binary32 (“single precision”) or binary64 (“double precision”) variants of IEEE-754, which are the ones implemented in most hardware floating point units and in most programming language libraries.  As the variant names imply, they are implemented in binary.  The bits to the right of the binary point have values of 1/2, 1/4, 1/8, and so on (instead of the 1/10, 1/100, 1/1000, and so on we're used to with decimal representations).  This choice of a binary rather than decimal base has a consequence that many programmers are not aware of, and it's one of special import when representing money.  It's worth taking some time to understand.

First a brief refresher.  In any numeric base, a number written in radix (positional) form is equivalent to a fraction.  For instance, in decimal 0.446 is equivalent to 446/1000.  Similarly, in binary 0.11011 is equivalent to 11011/100000.  Here's the part that may come as a surprise to you: in any given base, fractional values can only be represented exactly in radix form if the denominator of the fraction (in lowest terms) is a product of powers of the base's prime factors.  That's a mouthful, so here are some examples to make it clearer:
  • 382/625 base 10 is 0.6112 exactly, because 625 is 5^4, and 5 is a prime factor of 10.
  • 101/1000 base 2 is 0.101 exactly, because 1000 (base 2) is 2^3.
  • 1/3 base 10 is 0.333..., because 3 is not a factor of 10.
  • 1/1010 base 2 is 0.0001100110011..., because 1010 (10 in base 10) has 5 as a prime factor, and 5 is not a factor of 2.
That last bullet shows something that has startled many a programmer: the native floating point types in most programming languages cannot exactly represent a value like 5.10.  It's not that it's hard, or that you have to use a special trick – it simply cannot be done.  Period.  End of story.  No rounding tricks will fix it.  It is not possible.  I know this is hard for many engineers to accept, because I have watched dozens and dozens of engineers try to absorb this simple fact when I showed them.  Some of them spent hours trying to prove me wrong.  Unfortunately for them, that's like trying to prove that the earth is flat, or that celebrities make good role models.  Nope.  This is one of those damned awkward stone-cold facts.
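
If you want to see it for yourself, Java will happily show you the exact value a double really holds; the BigDecimal(double) constructor converts the binary value with no rounding at all:

    import java.math.BigDecimal;

    public class ExactDouble {
        public static void main(String[] args) {
            // new BigDecimal(double) preserves the double's exact binary value.
            System.out.println(new BigDecimal(5.10));
            // Prints 5.0999999999999996447286321199499070644378662109375
            // -- the closest binary64 value to 5.10, which is not 5.10.
        }
    }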

To illustrate how much of a problem this is, consider all the numbers between 0.00 and 1.00 in increments of 0.01 (base 10).  Here's the complete list of those that can be exactly represented in binary floating point: 0.00, 0.25, 0.50, 0.75.  Yup, that's it – just four of the one hundred possible numbers.  All the rest are approximations.  And approximations ain't too good for representing monetary values.  If I have $5.42, I want $5.42, not $5.419999997!
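
You don't have to take my word for it; here's a little sketch that checks all one hundred values mechanically, comparing each double's exact binary value against the true decimal value:

    import java.math.BigDecimal;

    public class ExactHundredths {
        public static void main(String[] args) {
            for (int i = 0; i < 100; i++) {
                // The double's exact binary value, converted without rounding:
                BigDecimal exactBinary = new BigDecimal(i / 100.0);
                // The true decimal value, i * 10^-2:
                BigDecimal trueDecimal = BigDecimal.valueOf(i, 2);
                if (exactBinary.compareTo(trueDecimal) == 0) {
                    System.out.println(trueDecimal);
                }
            }
            // Prints exactly four lines: 0.00, 0.25, 0.50, 0.75.
        }
    }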

By the way, I've been assuming in here that monetary values are all expressed in base 10.  That's not actually strictly true.  I know from my days working on foreign exchange software that there is at least one currency out there that is not.  Fortunately, though, it's a minor currency (the Mauritanian ouguiya), and even that one is base 5, so its fractional values can be represented exactly with base 10 fractions.

If you made it this far, perhaps you'll accept that a general-purpose representation of money needs these attributes:
  1. Fractions must be represented in decimal format, not binary.
  2. At least 20 significant decimal digits including the normal number of fractional digits needed for any given currency, and more if possible.
  3. Depending on the currency, up to 10 digits to the right of the decimal point.  
Most programming languages, and certainly all the mainstream languages, do not have an intrinsic data type that can satisfy the above requirements.  However, most mainstream languages do have a library of some kind (such as BigDecimal in Java) that can meet them.  Unfortunately, those libraries often have attributes that make them undesirable for representing monetary values.  For instance, in the case of BigDecimal there are fairly severe performance penalties, the in-memory footprint is large, there is no support for a standard compact encoding (for storage or transmission purposes), databases don't support it directly, and precision is inherently unlimited (which exacerbates all the preceding issues).
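
To be fair, BigDecimal really does satisfy all three requirements.  A quick sketch:

    import java.math.BigDecimal;

    public class BigNumbers {
        public static void main(String[] args) {
            // Requirements 1 and 2: exact decimal fractions, 20+ significant
            // digits.  A government-scale amount, to the penny, held exactly:
            BigDecimal big = new BigDecimal("999999999999999999999.99");
            System.out.println(big.add(new BigDecimal("0.01")));
            // Prints 1000000000000000000000.00 -- no overflow, no rounding.

            // Requirement 3: up to 10 fractional digits is no problem either.
            System.out.println(new BigDecimal("5.0000000001"));
        }
    }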

I was the CTO for a company building a stock trading application where someone prior to me had made the decision to use BigDecimal to represent monetary values.  We saw lots of issues resulting from this choice, but two of them recurred often enough to call them a problem pattern.

One was the performance issue I mentioned earlier.  There were certain places in the application where we did a fair amount of arithmetic.  None of it was fancy or difficult, but even operations like comparing two values, if done enough, could occupy the CPU for a significant amount of time.  I was able to do an interesting experiment there, as the monetary value representation was nicely isolated in a class.  We profiled a production server and discovered that a little over 40% of our CPU consumption was occurring inside BigDecimal.  That was by far the single largest load on the processor.  The only way we could get more throughput from that server was to get rid of BigDecimal (or rewrite it for higher performance).
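
If you want a rough feel for the gap, here's a crude comparison of long arithmetic against BigDecimal (crude because honest Java benchmarking needs a harness like JMH to account for JIT warmup; treat the output as a rough indication only):

    import java.math.BigDecimal;

    public class RoughTiming {
        public static void main(String[] args) {
            final int N = 10_000_000;

            // Sum $5.42 ten million times as long cents...
            long t0 = System.nanoTime();
            long centsSum = 0;
            for (int i = 0; i < N; i++) {
                centsSum += 542;
            }
            long t1 = System.nanoTime();

            // ...and again as BigDecimal.
            BigDecimal price = new BigDecimal("5.42");
            BigDecimal decimalSum = BigDecimal.ZERO;
            for (int i = 0; i < N; i++) {
                decimalSum = decimalSum.add(price);
            }
            long t2 = System.nanoTime();

            System.out.printf("long:       %d ms (sum=%d)%n", (t1 - t0) / 1_000_000, centsSum);
            System.out.printf("BigDecimal: %d ms (sum=%s)%n", (t2 - t1) / 1_000_000, decimalSum);
        }
    }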

The other was a little subtler: the unlimited precision.  We were forever finding places where a multiply or divide operation created results with very large precision because the programmer forgot to properly round.  Once such a number was created, all the arithmetic operations that depended on it (and there might be millions of them) got bogged down using this large number – and then they often created even more giant precision numbers.  Those didn't just create performance problems, either – they consumed so much memory that on several occasions they actually took the server down!  These were surprisingly difficult to track down, too, because our application involved multiple inter-networked servers...
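
Here's a distilled version of the pattern (hypothetical numbers, but the mechanics are exactly what bit us).  BigDecimal.multiply() produces a result whose scale is the sum of its operands' scales, so without explicit rounding the digits pile up fast:

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    public class ScaleGrowth {
        public static void main(String[] args) {
            BigDecimal rate = new BigDecimal("1.0375");  // scale 4

            // Forgetting to round: scale grows by 4 on every multiply.
            BigDecimal balance = new BigDecimal("100.00");
            for (int i = 0; i < 25; i++) {
                balance = balance.multiply(rate);
            }
            System.out.println("scale = " + balance.scale());  // scale = 102

            // Rounding at each step keeps the number a sane size.
            balance = new BigDecimal("100.00");
            for (int i = 0; i < 25; i++) {
                balance = balance.multiply(rate).setScale(2, RoundingMode.HALF_EVEN);
            }
            System.out.println("scale = " + balance.scale());  // scale = 2
        }
    }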

This is the first of what will be a series of posts about the challenges of representing and manipulating monetary values in computer programs.  Most of this is applicable to programs in any computer language, but some will be specific to Java.
