Floating Point Arithmetic for Data Analysts


Elizabeth Byerly
Bálint Pető
2016-07-28

Press "s" for presenter's notes

Purpose

Computers don't think in real numbers

Numbers are stored as a finite set of 0s and 1s.
Most real numbers are necessarily stored inaccurately.

Floating point is a standard

It allows us to reason consistently about how our programs will store numbers and perform arithmetic operations.

You will leave with:

  • Sufficient understanding of floating point arithmetic to avoid common errors
  • Jargon and references to allow for subsequent investigation if you are interested

Agenda

  1. Why we're here: a story of floating point error
  2. Representation and arithmetic
  3. Sources of error
  4. Avoiding error

Why we're here

Scenario

We want to convert a percentage into a categorical "bucket" variable.

..., (0.7, 0.8], (0.8, 0.9], ...

We define our cutoff points programmatically:

> x <- 0
> for (i in 1:10) { x <- c(x, x[i] + .1) }
> x
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

When we use those cutoffs on our continuous variable...







0.8 is bucketed into (0.8, 0.9]
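
A minimal sketch of the failure in R, reusing the cutoffs x built above (cut() labels its intervals to three significant digits, so the slightly-low cutoff prints as "0.8"):

> as.character(cut(0.8, breaks = x))
[1] "(0.8,0.9]"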

> 0.7 + 0.1 == 0.8
[1] FALSE
. assert 0.7 + 0.1 == 0.8
assertion is false








> print(0.8, digits = 20)
[1] 0.80000000000000004
> print(0.7 + 0.1, digits = 20)
[1] 0.79999999999999993



It Could Happen to You

Representation and arithmetic

What is a float?

A base-2 normalized number encoded in 32 bits (single precision) with a defined structure:

  • 1 sign bit
  • 8 exponent bits
  • 23 mantissa bits

(64-bit doubles, R's default numeric type, follow the same scheme with 1 sign, 11 exponent, and 52 mantissa bits.)

0100 0011 0100 1100 0001 0000 0000 0000
+1.100110000010 * 10^(10000110-1111111)
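
If you want to poke at these bits yourself, here is a sketch in R. float_bits is our own helper, not a built-in; writeBin with size = 4 rounds a double to single precision. The example value, 204.0625, is the number encoded above:

> # round a double to single precision and show its 32 bits
> float_bits <- function(x) {
+   bytes <- writeBin(x, raw(), size = 4, endian = "big")
+   paste(sapply(bytes, function(b) paste(rev(as.integer(rawToBits(b))), collapse = "")),
+         collapse = " ")
+ }
> float_bits(204.0625)
[1] "01000011 01001100 00010000 00000000"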







If you're confused and afraid, so are we. It gets easier!

Sign

0100 0011 0100 1100 0001 0000 0000 0000
+1.100110000010 * 10^(10000110-1111111)

The sign bit is set to 0 for positive numbers and 1 for negative numbers.

Exponent

0100 0011 0100 1100 0001 0000 0000 0000
+1.100110000010 * 10^(10000110-1111111)

The exponent is stored with a bias: the eight bits encode a value from 0 to 255, and subtracting 127 gives the actual exponent, which ranges from -126 to +127 (codes 0 and 255 are reserved for special values). Here, 10000110 = 134, and 134 - 127 = 7.

Mantissa

0100 0011 0100 1100 0001 0000 0000 0000
+1.100110000010 * 10^(10000110-1111111)

The mantissa stores the 23 bits that follow the leading digit of a normalized base-2 number. Because that leading digit is always 1, it is left implicit, giving 24 significant bits in total.

+/- Zero

*000 0000 0000 0000 0000 0000 0000 0000

+/- Infinity

*111 1111 1000 0000 0000 0000 0000 0000

NaN

1111 1111 1100 0000 0000 0000 0000 0000
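
These special values come up in ordinary arithmetic; in R:

> 1 / 0
[1] Inf
> -1 / 0
[1] -Inf
> 0 / 0
[1] NaN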

How are base-10 inputs converted to floating point?

Initial base-10 representation

+2.25
-1.10

Convert to base-2

+10.01
-1.0001100110011...

Normalize with scientific notation

+1.001 * 10^1
-1.0001100110011.. * 10^0

Identify floating point elements

+1.001 * 10^1
-1.000110011... * 10^0

Account for exponent bias

+1.001 * 10^(10000000-1111111)
-1.000110011... * 10^(1111111-1111111)

Structure as floating point

0100 0000 0001 0000 0000 0000 0000 0000
1011 1111 1000 1100 1100 1100 1100 1100 ...

Round

0100 0000 0001 0000 0000 0000 0000 0000
1011 1111 1000 1100 1100 1100 1100 1101

Resulting base-10 value

+2.25
-1.1000000238418579
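
You can reproduce the rounded value in R by round-tripping through single precision; as_float is our own helper (writeBin with size = 4 stores a 32-bit float, readBin with size = 4 reads it back as a double):

> as_float <- function(x) readBin(writeBin(x, raw(), size = 4), "numeric", size = 4)
> print(as_float(-1.1), digits = 17)
[1] -1.1000000238418579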

How are floats added?

Decimal input

0.1 + 0.7

Floating point equation after conversion

0011 1101 1100 1100 1100 1100 1100 1101
+
0011 1111 0011 0011 0011 0011 0011 0011

Base-2

+1.10011001100110011001101 * 10^-100
+
+1.01100110011001100110011 * 10^-1

Match the exponents

+.00110011001100110011001101 * 10^-1
+
+1.01100110011001100110011 * 10^-1

Result

+1.10011001100110011001100101 * 10^-1

Identify floating point elements

+1.10011001100110011001100101 * 10^-1

Round

+1.10011001100110011001101 * 10^-1

(The dropped bits, 101, are more than half a unit in the last place, so the result rounds up.)

Resulting floating point

0011 1111 0100 1100 1100 1100 1100 1101
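
A sketch to check the result, emulating a 32-bit add by rounding both operands and their sum to single precision (the exact sum of two floats fits in a double, so this matches a true float add); as_float is the helper from earlier:

> as_float <- function(x) readBin(writeBin(x, raw(), size = 4), "numeric", size = 4)
> print(as_float(as_float(0.1) + as_float(0.7)), digits = 17)
[1] 0.80000001192092896

In single precision, 0.1 + 0.7 happens to round to exactly the same float as 0.8; the doubles in the opening example were not so lucky.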

So?

Sources of error

Also known as:
Common Stumbling Points




False precision

> print(mean(c(1.1, 2.74, 3, 4.9, 5.2)), digits = 20)
[1] 3.3880000000000003

Do you believe your model is measuring to the 16th significant digit?

Exact comparisons

. assert float(1.1) == 1.1
assertion is false
. assert 0.1 + 0.2 > 0.3
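
Where a tolerance is acceptable, compare with one instead of ==; in R, all.equal uses a default tolerance of about 1.5e-8:

> 0.7 + 0.1 == 0.8
[1] FALSE
> isTRUE(all.equal(0.7 + 0.1, 0.8))
[1] TRUE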

Overflow

> 10^309
[1] Inf

Underflow

> 10^-324
[1] 0
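
R records its double-precision limits in .Machine; values beyond them overflow to Inf or underflow toward 0 (subnormals extend below double.xmin, down to about 5e-324):

> .Machine$double.xmax
[1] 1.797693e+308
> .Machine$double.xmin
[1] 2.225074e-308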

Drift

> x <- 0.7
> for (i in 1:3) {
+   print(x <- x + 0.1, digits = 22)
+ }
[1] 0.79999999999999993
[1] 0.89999999999999991
[1] 0.99999999999999989

Avoiding error

Be aware of context

When is the exact value of a variable relevant to our analysis?

  • Counts
  • Financial data
  • Comparisons

Rational numbers

If you need exact calculations and can access the data before it has been converted to floating point, you can store the numerator and denominator as separate integers: integers are represented exactly in floating point (up to 2^53 in a double), so the fraction is never corrupted. Comparisons can then stay exact too, as sketched after the example.

data.frame(numerator = c(1, 1, 2, 2),
           denominator = c(3, 3, 6, 6))
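
A sketch of an exact comparison on stored fractions, using cross-multiplication: a/b equals c/d exactly when a*d == c*b, with no division and no rounding.

> # 1/3 + 1/3 versus 2/6 + 2/6, compared without any division
> (1 + 1) * 6 == (2 + 2) * 3
[1] TRUE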

Avoid unnecessary operations

> print((0.7 + 0.1 + 0.1 + 0.1), digits = 22)
[1] 0.99999999999999989
> print((0.7 + 0.1 * 3), digits = 22)
[1] 1
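
The cutoffs from the opening scenario can be built the same way: seq() computes from + (0:n) * by, one multiplication per element instead of accumulated additions, so the 0.8 cutoff comes out exact:

> x <- seq(0, 1, by = 0.1)
> x[9] == 0.8
[1] TRUE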

IEEE 754

History

First released in 1985 and revised in 2008.
The draft standard was developed by Intel.
Decimal storage was originally planned, but it proved too complicated to implement.

Standards defined

  • 16, 32, 64, 128, & 256 bit binary numbers
  • 32, 64, & 128 bit decimal numbers
  • Subnormal numbers
  • Representation of +/- zero, +/- infinity, SNaN, QNaN
  • Minimum precision calculations (add, subtract, multiply, divide, square root)
  • Rounding rules (round to nearest even, plus directed rounding modes)
  • Comparison results for non-obvious values (NaN, +/- infinity)

Resources

Questions?