If you’ve taken a course in numerical computing you’ll be familiar with the various sources of numerical error. First there is **representation error**, which arises when you have to represent a number with a finite number of digits. A common example: the decimal expansion of 1/3 = 0.3333… has a never-ending string of 3’s, and at some point you just have to cut it off. The same happens with 1/10 in binary. When you start adding (or subtracting) you may fall victim to **loss of significance**: when the two numbers have drastically different magnitudes, you lose the least significant digits of the smaller one. You can also have **cancellation**: when two numbers have almost equal magnitude but opposite signs, the most significant digits cancel and the result can carry a large relative error. You may also suffer from **overflow** and **underflow** when the magnitude of an intermediate or final result falls outside the representable range.
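Here is a minimal C sketch of those error sources, assuming IEEE 754 double precision. The function names are made up for illustration, and the `volatile` qualifiers are only there to stop the compiler from folding the arithmetic at compile time or keeping it in wider registers.

```c
#include <float.h>

/* Representation error: 0.1 and 0.2 are both inexact in binary,
   so their sum is not the double closest to 0.3. Returns 1 if they differ. */
int sum_misses_three_tenths(void) {
    volatile double a = 0.1, b = 0.2;
    volatile double sum = a + b;
    return sum != 0.3;
}

/* Differing magnitudes: near 1e16 adjacent doubles are 2 apart,
   so adding 1.0 is lost entirely. Returns 0.0 instead of 1.0. */
double absorbed_increment(void) {
    volatile double big = 1e16;
    volatile double sum = big + 1.0;
    return sum - big;
}

/* Cancellation: (1 + 1e-15) - 1 should give back 1e-15, but the rounding
   error made when forming 1 + 1e-15 dominates the small difference.
   Returns the relative error of the recovered value. */
double cancellation_relative_error(void) {
    volatile double small = 1e-15;
    volatile double shifted = 1.0 + small;
    volatile double recovered = shifted - 1.0;
    return (recovered - small) / small;
}

/* Overflow: doubling the largest finite double yields infinity. */
double overflowed_product(void) {
    volatile double big = DBL_MAX;
    volatile double result = big * 2.0;
    return result;
}

/* Underflow: squaring the smallest normal double is far below the
   smallest subnormal, so the result is flushed to zero. */
double underflowed_product(void) {
    volatile double tiny = DBL_MIN;
    volatile double result = tiny * tiny;
    return result;
}
```

Calling each function from `main` and printing the result with `printf("%g\n", ...)` is enough to see the effects. The cancellation case is off by roughly 11%, even though every individual operation was correctly rounded.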

There’s also a surprising source of error that comes from performing intermediate calculations with *higher precision* than you asked for. This can happen when programming in C or C++ on the Intel architecture when the compiler uses the x87 floating point unit, which works internally with 80-bit floating point numbers instead of the 64-bit double precision values we’re used to.

You are probably asking, *how can higher precision make things worse?* Let me explain by calculating 30% of 36500 in C:

    double percentage = 0.3;
    int val = 36500 * percentage; // val should be 10950, but the computer says 10949!

The problem is that `percentage` is slightly less than 3/10 to begin with because of representation error. If the product were rounded to a 64-bit double, it would land exactly on 10950. The x87 floating point unit instead keeps it in an 80-bit register, which is precise enough to preserve the shortfall, so the result of the multiplication stays just slightly less than 10950. Converting to `int` truncates, and the program returns **10949** as the value for `val`. And this of course leads to fun Stack Overflow questions.
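The effect can be reproduced without an x87 build by doing the multiplication in `long double` explicitly. This is only a sketch: it assumes `long double` carries at least the 64 significand bits of the x87 extended format (true for GCC and Clang on x86 Linux, not for every compiler), and the function names are hypothetical.

```c
#include <float.h>

/* Truncate after the product has been rounded to a 64-bit double.
   The volatile store forces that rounding even on an x87 build. */
int truncate_double_product(int amount, double fraction) {
    volatile double product = amount * fraction;
    return (int)product;
}

/* Truncate the product while it is still held in long double,
   standing in for a value kept in an 80-bit x87 register. */
int truncate_extended_product(int amount, double fraction) {
    long double product = (long double)amount * (long double)fraction;
    return (int)product;
}

/* Nonzero when long double has at least the 64 significand bits
   of the x87 extended format; otherwise the two functions agree. */
int long_double_is_extended(void) {
    return LDBL_MANT_DIG >= 64;
}
```

With `amount = 36500` and `fraction = 0.3`, the first function gives 10950. On a platform where `long_double_is_extended()` is nonzero, the second gives 10949. The input is the same in both cases, and only the precision of the intermediate result differs.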

How can this be fixed? Use the `round` function to round the result to the nearest integer. Or if you want to truncate, don’t use floating point numbers at all:

    int val = 36500 * 3/10;