You often see the advice that you shouldn’t compare floating point
numbers with the ==
operator, however it is rarely well explained,
and often the alternative that is suggested to using ==
isn’t
any better (and can be much worse).
The reason that people often say that you shouldn’t use the ==
operator
is that floating point numbers don’t exactly represent the value that you
set them to.
Looking at a decimal example, representing $\frac{1}{3}$ as a decimal floating point number gives $0.333333$. The floating point standard doesn’t have a notion of recurring numbers so the last place is rounded to a set precision. There may be an instance in our code where we want to find out if three times our number is equal to $1$, i.e. we may have code that does this:
1 2 3 4 5 6 7 8 

Naïvely we may expect 3 * a
to be 1.0
, however $0.333333 \cdot 3$ is
$0.999999$ which is not equal to $1$ so the code will actually run into the else
statement. In the case of binary floating point this example will actually work
(i.e. (1.0f / 3) * 3
equals exactly 1.0
). This is because multiplying by
three causes our value to go above the next power of two, giving us more binary
places of precision than a float can represent. This extra precision is used in
the multiplication to round the answer up to 1.0. This isn’t the case for all
examples of this type, for instance in binary (1.0f / 83) * 83
does not equal
exactly 1.0
as the rounding ends up going in the other direction.
A solution to our problem is to choose a small value (often called epsilon) and say that if the difference between our values is less than epsilon then the test is passed. It’s very often the case that the results of floating point calculations are very close to the results that we expect (in fact right next to them on the floating point number line).
Choosing the Wrong Epsilon
As we only want the test to pass if we’re really close, choosing the smallest possible difference as our epsilon might lead to code that looks like this:
DON’T DO THIS IT’S WRONG!!
1 2 3 4 5 

This will work great for values between $0.5$ and $2.0$, however it is broken for all numbers greater than $2$, for instance this will work:
1 2 3 4 5 6 7 

However it won’t work for an example like this:
1 2 3 4 5 6 7 

It fails to recognize 83 * a
as close enough to 84.0. In this case there is no
number between 83 * a
and 84.0
i.e. 83 * a
is the very next number that
floating point numbers can represent, so we’d hope that our test would say
that they are close enough.
The problem here is each time the you go past a power of $2$ the gap between each number that can be represented doubles. To see this imagine that we have a floating point representation that has a precision of 6 places. The number one would have a representation $1$ and the next value representable above one will have the value $1.00001$, so the gap between one and the next value is $0.00001$. Now if we do the same with two and the next number we get $10$ and $10.0001$. A gap of $0.0001$, which is twice as big as the gap between 1 and the next nearest value (which has four zeros before the $1$, not three).
In the coded example we’ve chosen FLT_EPSILON
as our epsilon; this number is
the gap between $1$ and the next nearest value. If the floating point
numbers we are trying to compare are larger than $2$ the gap between any float
and the next nearest float is larger than our FLT_EPSILON
. This means that the
numbers we are comparing are either identical (so we may as well have used
==
) or the gap between them is larger than FLT_EPSILON
so the test will fail
(just as it would have done if we’d used
==
). Here’s a piece of code demonstrating
that this compare method is broken.
If your number is greater than 2 the above comparison function is identical to ==
So what Epsilon Should you Choose?
Due to floating point inaccuracy one of the most common problems is that you
end up calculating a value that is right next to the value that you
wanted. This means that the epsilon you want to choose is the gap between the
larger float and its next nearest value. A value very close to this can be
found easily by multiplying the larger of the two values by
FLT_EPSILON
. This leads to a floating point compare function like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 

Here’s an example of this compare function working where the original compare function failed.
Why is the incorrect compare so prevalent?
The answer to this is that ==
works a lot better than most people think, so
it usually doesn’t matter much that their compare function is almost
always doing exactly the same thing as ==
. For instance due to rounding the
first time that (1.0f / n) * n
doesn’t equal exactly $1$ is when n
is
$41$. There is a large and commonly used subset of floating point
numbers where the representation of the number is exact. Any number of the
form $\frac{a}{2^n}$ (note that $n$ can be zero) will have an exact
representation (provided $a$ is representable within the precision of the
standard being used). With 32 bit floats this includes all integers up to
$2^{24}$. If your floating point number is exactly representable you can add,
subtract and multiply them by other exact floating point numbers and the
answer will remain exact (again provided the answer doesn’t go over your
precision). It is also very common to be able to divide and have the result
remain exact. With this important section of numbers you could use ==
without
causing yourself any problems.
Basically ==
works so often that despite the wrong compare routine having
exactly the same result as using ==
(for the values likely to be used) most
of the time you’ll get exactly the result you expected, so you won’t notice.
What can still go wrong?
Floating point numbers can still end up farther apart than the epsilon that you have chosen. The easiest way to demonstrate this is with subtraction. For instance let’s work with 6 places of precision and we compare $10 \frac{1}{3}  10$ with $\frac{1}{3}$. Working through $10.3333  10 = 0.333300$ comparing this with our compare function gives $0.333333  0.3333 = 0.000033$ this value is much bigger than the epsilon we’d calculate for value ($0.000001$). Really we need to use the epsilon that we would get for comparing the number $10.3333$, as when we do the subtraction we cut off the first few digits, and can’t just invent new digits to go at the end, so it is assumed that those digits are $0$. If possible it would be better to write this compare the other way round i.e. compare $10 \frac{1}{3}$ with $\frac{1}{3} + 10$.
You may decide that you want to increase the size of your epsilon so that it catches more cases. To do this it’s a good idea to multiply the epsilon in the function by a value (this could either be a constant or a value you pass in so you can change your epsilon for different situations).
No number is close to 0, this is in general true. The size of a value depends
on your perspective of it. For instance if you’re dealing with numbers around
a million, 1 looks really close to 0. If on the other hand you’re dealing with
numbers close to one millionth 1 looks like a really enormous number. This
means that if you want to compare numbers to 0 and you think the floats your
comparing may not be exact, you need to choose a value which is you’re smallest
nonzero value i.e., in this case you need to have a static epsilon. The
epsilon for your code may be much smaller than FLT_EPSILON
or it could be
much larger, depending on the size of values that you are normally dealing
with. As the floating point standard can’t represent all numbers it does
actually have a value that is next to zero but this would normally be a very
poor choice of epsilon for anything.
There are other times when doing a comparison that you may want to choose a static epsilon, particularly if you’re wanting to behave in a certain way depending on if you’re around a number. This isn’t really the same situation as trying to determine whether you have floating point inaccuracies.
Really Horrible Effects of Using a Constant Epsilon
It’s not great that using FLT_EPSILON
as a constant epsilon reduces
the compare function to ==
for values greater than 2 (I explored
this because normally programmers are dealing with numbers of this
sort of size). It’s even worse if your code has to deal with small
numbers. The smaller your numbers get, the larger FLT_EPSILON
is
compared to these numbers. This means that if you’re doing
calculations with numbers around $10^{8}$ and smaller and you use a
static epsilon of FLT_EPSILON
all of your numbers will compare equal
to each other so any comparison you do will be entirely pointless (it
will always be true).
Conclusion
If you’re trying to solve floating point inaccuracy problems using an epsilon the epsilon has to be relative to the size of the numbers that you are comparing. If it isn’t then there is a good chance that your comparison function is pointless for most floating point numbers and dangerous to use with others.
There can be good reasons to use static epsilons, however you do need to think
carefully about what the epsilon is going to be. The size and range of the
floating point numbers you are using needs to have a strong
bearing on the epsilon you choose. It’s incredibly rare that
FLT_EPSILON
is a good choice of static floating point epsilon. It may be OK
if you’re only comparing numbers that are between $0.5$ and $2.0$ or as a
close to zero number.