Floating Point Arithmetic for Data Analysts
Elizabeth Byerly
Bálint Pető
2016-07-28
Press "s" for presenter's notes
Floating point refers to the system our software uses to store real numbers as a series of 0s and 1s. The purpose of this presentation is to draw out valid real number operations that fail with floating point, identify the coding errors that result, and discuss what we can do to anticipate and avoid these situations.
This presentation is language agnostic; it applies to any software that adheres to the IEEE 754 floating point standard (R, Stata, Excel, Python, SAS, ...). Actual code chunks are a mix of R and Stata, but should be readable as pseudo-code.
Purpose
We're going to provide the barest description of our subject, floating point numbers and arithmetic, and then list what you will leave our presentation knowing.
Computers don't think in real numbers
Numbers are stored as a finite set of 0s and 1s.
Most real numbers are necessarily stored inaccurately.
We think in real numbers and base-10. We write our analytical code.
Our computers think in fixed-width numbers and base-2. They run our analytical code.
If we assume our computer is thinking about numbers the same way we do, our code will do something other than what we expect.
Floating point is a standard
It allows us to reason consistently about how our programs will store numbers and perform arithmetic operations.
If computers can't store a "true" real number, what do they store? It used to be that every low-level software engineer provided their own answer to this question.
Defined for software portability: analytical code written on a Dell will work on a Lenovo.
Software portability was NOT standard. A lot of work has gone into ensuring our high level programs can be moved across machines without us having to spend thought and money to refactor them for consistent performance.
You will leave with:
Sufficient understanding of floating point arithmetic to avoid common errors
Jargon and references to allow for subsequent investigation if you are interested
Agenda
Why we're here: a story of floating point error
Representation and arithmetic
Sources of error
Avoiding error
We begin by explaining the QC experience that led to this presentation. It's a great example of bad assumptions, perfectly obvious to us, leading to ridiculously wrong results.
Next, we'll provide a breakneck guide through how numbers are represented internally and how basic arithmetic is performed (addition and multiplication).
Once we've established how floating point works, we'll explain where floating point doesn't work (its most common failure modes).
Finally, we'll provide our current best tips for avoiding error when performing arithmetic on a computer.
At the end, if we have time, we will have sections on the IEEE standard that defines floating point (its history, what it defines) and alternative numerical representation options (rational numbers, unums).
Scenario
We want to convert a percentage into a categorical "bucket" variable.
..., (0.7, 0.8], (0.8, 0.9], ...
We define our cutoff points programmatically:
>> x <- 0
>> for (i in 1:10) { x <- c(x, x[i] + .1) }
>> x
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
When we use those cutoffs on our continuous variable...
Don't be scared by the code. It starts at 0 and increments by 0.1, ten times, using addition.
0.8 is bucketed into (0.8, 0.9] instead of (0.7, 0.8]!
> 0.7 + 0.1 == 0.8
[1] FALSE
. assert 0.7 + 0.1 == 0.8
assertion is false
This is not fixed by "double" and it is not dependent on your preferred software package.
> print(0.8, digits = 20)
[1] 0.80000000000000004
> print(0.7 + 0.1, digits = 20)
[1] 0.79999999999999993
None of these three numbers can be stored exactly in floating point. If we manually enter 0.8, it will resolve to 0.80000000000000004 every time. However, the computer has no way of guessing we *mean* 0.80000000000000004 when we enter 0.7 + 0.1.
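The deck's chunks are R and Stata, but since any IEEE 754 language behaves identically, here is the same check sketched in Python (which also uses 64-bit doubles):

```python
# Python doubles are IEEE 754, so the same comparison fails.
print(0.7 + 0.1 == 0.8)     # False
print(f"{0.8:.17g}")        # 0.80000000000000004
print(f"{0.7 + 0.1:.17g}")  # 0.79999999999999993
```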
We will walk through how this exact calculation is performed in a few minutes.
It Could Happen to You
The moral of our story!
We have to understand floating point arithmetic to avoid making bad assumptions about how our code will actually execute.
Representation and arithmetic
Understanding how a floating point number is stored and, to start, how a few arithmetic operations are performed helps us enormously in developing an intuition about where errors can occur.
A base-2 normalized number encoded in 32 bits with a defined structure:
1 sign bit
8 exponent bits
23 mantissa bits
0 100 0011 0 000 0000 0000 1001 1000 0010
+ 1.00000000000100110000010 * 10^(10000110 - 1111111)
A 32-bit float (128.03714) in all its glory!
We're going to limit our description to singles (known as floats in Stata). Doubles are equivalent to singles except that they are stored in 64 bits (not 32), allowing smaller and larger magnitudes of numbers to be represented with greater precision. They otherwise operate exactly the same.
Floating point errors may be ameliorated by higher precision, but they are not avoided! `set type double` does not cure all wounds.
If you're confused
and afraid, so are
we. It gets easier!
Don't worry, we're breaking it down.
First we're going to describe what we mean by "sign," "exponent," and "mantissa." Then we'll provide examples of converting from base-10 to floating point.
Sign
0 100 0011 0 000 0000 0000 1001 1000 0010
+ 1.00000000000100110000010 * 10^(10000110 - 1111111)
The sign bit is set to 0 for positive numbers and 1 for negative numbers.
Exponent
0 100 0011 0 000 0000 0000 1001 1000 0010
+ 1.00000000000100110000010 * 10^(10000110 - 1111111)
The exponent ranges from -126 to +127 using a bias: the 8 bits encode values from 0 to 255, and the final exponent is determined by subtracting 127.
You'll note two possible encodings of the exponent are omitted (-126 to +127 is 254 values, vs. the 256 we can store in 8 bits!).
If you want to know how many values can be stored in a given number of bits, it's 2^n, so 8 bits store 2^8 = 256 values. This works in any base: three digits in base-10 store 10^3 = 1,000 values.
Mantissa
0 100 0011 0 000 0000 0000 1001 1000 0010
+ 1.00000000000100110000010 * 10^(10000110 - 1111111)
The mantissa stores up to 24 significant digits of a normalized base-2 number.
The mantissa only has 23 bits, but it takes advantage of the fact that every normalized base-2 number (except zero) starts with 1: the leading 1 is implicit and does not need to be stored.
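To make the sign/exponent/mantissa split concrete, here is a sketch in Python (not the deck's R/Stata; the standard `struct` module just gives us access to the raw 32 bits):

```python
import struct

def float_bits(x):
    """Split a number's 32-bit float encoding into sign, biased exponent, and mantissa."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, bias of 127 still included
    mantissa = bits & 0x7FFFFF      # 23 bits, implicit leading 1 not stored
    return sign, exponent, mantissa

# 2.25 = +1.001 * 2^1: sign 0, biased exponent 1 + 127 = 128, mantissa 001000...0
print(float_bits(2.25))  # (0, 128, 1048576)
```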
+/- Zero
* 000 0000 0 000 0000 0000 0000 0000 0000
+/- Infinity
* 111 1111 1 000 0000 0000 0000 0000 0000
NaN
1 111 1111 1 100 0000 0000 0000 0000 0000
Special values have reserved space.
Zero and infinity have both positive and negative versions; the sign bit is not part of their special encoding.
NaN is technically QNaN (or a quiet, propagating NaN), a valid NaN generated by an indeterminate mathematical operation. There is also special notation for SNaN (signalling NaN), but that's an error value that gets handled behind the scenes and wouldn't be presented to the user.
Let's talk through converting a base-10 number into floating point.
How are base-10 inputs converted to floating point?
Initial base-10 representation
+2.25
-1.10
This is our comfort zone.
Convert to base-2
+10.01
-1.0001100110011...
There are many resources for learning how to convert between bases! Mississippi College has a readable explanation.
The important thing to note here is that "1.1" cannot be finitely represented in base-2. The "0011-0011" pattern repeats infinitely.
Normalize with scientific notation
+1.001 * 10^1
-1.0001100110011... * 10^0
Identify floating point elements
+ 1.001 * 10^1
- 1.000110011... * 10^0
We can now point to the sign, mantissa, and exponent.
Account for exponent bias
+ 1.001 * 10^(10000000 -1111111)
- 1.000110011... * 10^(1111111 -1111111)
Remember that floating point subtracts 127 from the stored exponent, so the exponents 1 and 0 are stored as 1 + 127 = 128 (10000000) and 0 + 127 = 127 (1111111) respectively.
Structure as floating point
0 100 0000 0 001 0000 0000 0000 0000 0000
1 011 1111 1 000 1100 1100 1100 1100 11...
The first value converts neatly to floating point, appending trailing zeroes to the right.
We still have that awkward infinite mantissa.
Round
0 100 0000 0 001 0000 0000 0000 0000 0000
1 011 1111 1 000 1100 1100 1100 1100 1101
The IEEE standard recommends "round to nearest, ties to even" rounding (see the Wikipedia article). If you were raised on the split rule, 01234|56789, it will throw you off!
Resulting base-10 value
+2.25
-1.1000000238418579
Converting back to base ten, we see that the second value (which cannot be exactly represented by floating point) has been slightly corrupted.
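We can check this result by round-tripping 1.1 through a 32-bit float, sketched here in Python with the standard `struct` module:

```python
import struct

def as_single(x):
    """Return the nearest 32-bit float value of x, widened back to a Python double."""
    (y,) = struct.unpack(">f", struct.pack(">f", x))
    return y

print(f"{as_single(1.1):.17g}")  # 1.1000000238418579, matching the slide
```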
Decimal input
0.1 + 0.7
Our old nemesis...
Let's understand exactly what went wrong.
Floating point equation after conversion
0 011 1101 1 100 1100 1100 1100 1100 1101
+
0 011 1111 0 011 0011 0011 0011 0011 0011
Neither of these numbers is finitely representable in base-2. Note that 0.7's binary representation is rounded down (the stored value is slightly too small) and 0.1's is rounded up (slightly too large).
Critically, the rounding occurs at a different binary place in each number, so the errors differ in both size and direction: 0.7's absolute error is larger than 0.1's.
Base-2
+ 1.10011001100110011001101 * 10^-100
+
+ 1.01100110011001100110011 * 10^-1
We've skipped accounting for exponent bias.
Match the exponents
+ .00110011001100110011001101 * 10^-1
+
+ 1.01100110011001100110011 * 10^-1
Result
+1.10011001100110011001100101 * 10^-1
Our result!
0.7999999895691872
Identify floating point elements
+ 1.10011001100110011001100 101 * 10^-1
We have three bits that don't fit into the 23-bit mantissa.
Round
+ 1.10011001100110011001101 * 10^-1
The value is rounded. The three leftover bits (101) are worth more than half a unit in the last place, so the mantissa rounds up.
There are multiple methods of rounding. The recommended method is "round to nearest, ties to even", different from what is taught in schools, where .5 always goes up; breaking ties toward the even digit prevents accumulating bias.
Resulting floating point
0 011 1111 0 100 1100 1100 1100 1100 1101
Finally, it is stored as a floating-point value. Base-10 before/after rounding: 0.7999999895691872 → 0.80000001192092896.
In single precision, then, the rounded sum is exactly the stored value of 0.8, so float(0.7) + float(0.1) == float(0.8) here. In double precision the same calculation rounds the other way: 0.7 + 0.1 resolves to 0.79999999999999993 while 0.8 entered directly resolves to 0.80000000000000004, which is why the comparison failed earlier.
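Python's built-in round() happens to use the same "ties to even" rule for halfway cases, which makes the behavior easy to see:

```python
# Halfway cases go to the nearest even integer, not always up.
print(round(0.5))  # 0
print(round(1.5))  # 2
print(round(2.5))  # 2
```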
Also known as: Common Stumbling Points
False precision
print(mean(c(1.1, 2.74, 3, 4.9, 5.2)), digits = 20)
[1] 3.3880000000000003
Do you believe your model is measuring to the 16th significant digit?
"Little in this world is measured to a relative accuracy of ±2-24, the accuracy provided by float precision." - Gould
"Epsilon" refers to the smallest value such that 1 + epsilon != 1. This concept exists because a float's resolution is limited.
Exact comparisons
. assert float(1.1) == 1.1
assertion is false
. assert 0.1 + 0.2 > 0.3
Our second example is in double. Again, precision is not the cure!
This issue only meaningfully arises when we deal with "nice" decimal numbers (within the range of realistic precision). We need to think about our assertions, though, and be aware of when it is relevant.
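Instead of exact comparisons, compare within a tolerance. R has all.equal() and Stata has reldif(); the Python equivalent, shown here as an illustration, is math.isclose():

```python
import math

print(0.7 + 0.1 == 0.8)              # False: exact comparison
print(math.isclose(0.7 + 0.1, 0.8))  # True: comparison within a relative tolerance
```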
Overflow
> 10^309
[1] Inf
Overflow occurs when the system is requested to store a number that is larger than the maximum value that fits into floating point.
Single bound: 10^38.53
Double bound: 10^308.3
Underflow
> 10^-324
[1] 0
Single bound: 10^−44.85
Double bound: 10^−323.3
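The same bounds can be probed from Python, where doubles overflow to inf and underflow to 0:

```python
import math

print(float("1e309"))   # inf: beyond the double maximum, about 1.8 * 10^308
print(float("1e-324"))  # 0.0: below the smallest subnormal, about 4.9 * 10^-324
print(math.isinf(float("1e309")))  # True
```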
Drift
> x <- 0.7
> for (i in 1:3) {
+ print(x <- x + 0.1, digits = 22)
+ }
[1] 0.79999999999999993
[1] 0.89999999999999991
[1] 0.99999999999999989
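The same drift appears in Python, since the error of each +0.1 accumulates across iterations:

```python
x = 0.7
for _ in range(3):
    x += 0.1
    print(f"{x:.17g}")
# Final x is 0.99999999999999989, not 1.0
```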
Be aware of context
When is the exact value of a variable relevant to our analysis?
Counts
Financial data
Comparisons
Finance is the classic example of when "every penny counts."
Rational numbers
If you need exact calculations and have access to the data before it has been converted to floating point, you can store the numerator and denominator as integers for uncorrupted fractions.
data.frame(numerator = c(1, 1, 2, 2),
denominator = c(3, 3, 6, 6))
Two limitations: overflow is still relevant, and the vast majority of your software's built-in mathematical operations are no longer available to you. You will be doing more manual coding.
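Python ships a rational type in its standard library, fractions.Fraction, which illustrates the trade-off: exact results, at the cost of leaving most numeric tooling behind:

```python
from fractions import Fraction

# Exact: one tenth, three times, is exactly three tenths.
print(Fraction(1, 10) * 3 == Fraction(3, 10))  # True
# Floating point: the same sum misses.
print(0.1 + 0.1 + 0.1 == 0.3)                  # False
```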
Avoid unnecessary operations
> print((0.7 + 0.1 + 0.1 + 0.1), digits = 22)
[1] 0.99999999999999989
> print((0.7 + 0.1 * 3), digits = 22)
[1] 1
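The same contrast holds in Python: one rounding step (0.1 * 3) happens to land the sum exactly on 1.0, while three separate additions accumulate error:

```python
print(0.7 + 0.1 + 0.1 + 0.1 == 1.0)  # False: three additions, three roundings
print(0.7 + 0.1 * 3 == 1.0)          # True: the errors happen to cancel exactly
```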
IEEE 754
Institute of Electrical and Electronics Engineers
IEEE Standards Association
History
Originally released in 1985, updated in 2008.
Draft developed by Intel.
Originally planned decimal storage, but too complicated to implement.
Standards defined
16, 32, 64, 128, & 256 bit binary numbers
32, 64, & 128 bit decimal numbers
Subnormal numbers
Representation of +/- zero, +/- infinity, SNaN, QNaN
Minimum precision calculations (add, subtract, multiply, divide, square root)
Method of rounding to nearest integer
Comparison results for non-obvious values (NaN, +/- infinity)