## Floating Point Arithmetic for Data Analysts

Elizabeth Byerly
Bálint Pető
2016-07-28

Press "s" for presenter's notes

## Purpose

### Computers don't think in real numbers

Numbers are stored as a finite set of 0s and 1s.
Most real numbers are necessarily stored innacurately.

### Floating point is a standard

It allows us to reason consistently about how our programs will store numbers and perform arithmetic operations.

You will leave with:

• Sufficient understanding of floating point arithmetic to avoid common errors
• Jargon and references to allow for subsequent investigation if you are interested

## Agenda

1. Why we're here: a story of floating point error
2. Representation and arithmetic
3. Sources of error
4. Avoiding error

## Why we're here

### Scenario

We want to convert a percentage into a categorical "bucket" variable.

..., (0.7, 0.8], (0.8, 0.9], ...

We define our cutoff points programatically:

>> x <- 0
>> for (i in 1:10) { x <- c(x, x[i] + .1) }
>> x
 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

When we use those cutoffs on our continuous variable...

### 0.8 is bucketed into (0.8, 0.9]

> 0.7 + 0.1 == 0.8
 FALSE
. assert 0.7 + 0.1 == 0.8
assertion is false


> print(0.8, digits = 20)
 0.80000000000000004
> print(0.7 + 0.1, digits = 20)
 0.79999999999999993

# It Could Happen to You

## Representation and arithmetic

### What is a float?

A base-2 normalized number encoded in 32 bits with a defined structure:

• 1 sign bit
• 8 exponent bits
• 23 mantissa bits

0100 0011 0000 0000 0000 1001 1000 0010
+1.100110000010 * 10^(1000011-1111111)

### Sign

0100 0011 0000 0000 0000 1001 1000 0010
+1.100110000010 * 10^(1000011-1111111)

The sign bit is set to 0 for positive numbers and 1 for negative numbers.

### Exponent

0100 0011 0000 0000 0000 1001 1000 0010
+1.100110000010 * 10^(1000011-1111111)

The exponent ranges from -126 to +127 using bias: the bits encode numbers from +0 to +255 and the final value is determined by subtracting 127.

### Mantissa

0100 0011 0000 0000 0000 1001 1000 0010
+1.100110000010 * 10^(1000011-1111111)

The mantissa stores up to 24 significant digits of a normalized base-2 number.

+/- Zero

*000 0000 0000 0000 0000 0000 0000 0000

+/- Infinity

*111 1111 1000 0000 0000 0000 0000 0000

NaN

1111 1111 1100 0000 0000 0000 0000 0000

### How are base-10 inputs converted to floating point?

+2.25
-1.10

#### Convert to base-2

+10.01
-1.0001100110011...

#### Normalize with scientific notation

+1.001 * 10^1
-1.0001100110011.. * 10^0

#### Identify floating point elements

+1.001 * 10^1
-1.000110011... * 10^0

#### Account for exponent bias

+1.001 * 10^(10000000-1111111)
-1.000110011... * 10^(1111111-1111111)

#### Structure as floating point

0100 0000 0001 0000 0000 0000 0000
1011 1111 1000 1100 1100 1100 11...

#### Round

0100 0000 0001 0000 0000 0000 0000
1011 1111 1000 1100 1100 1100 1101

#### Resulting base-10 value

+2.25
-1.1000000238418579

0.1 + 0.7

#### Floating point equation after conversion

0011 1101 1100 1100 1100 1100 1100 1101
+
0011 1111 0011 0011 0011 0011 0011 0011

#### Base-2

+1.10011001100110011001101 * 10^-100
+
+1.01100110011001100110011 * 10^-1

#### Match the exponents

+.00110011001100110011001101 * 10^-1
+
+1.01100110011001100110011 * 10^-1

#### Result

+1.10011001100110011001100101 * 10^-1

#### Identify floating point elements

+1.10011001100110011001100101 * 10^-1

#### Round

+1.10011001100110011001100 * 10^-1

#### Resulting floating point

0011 1111 0100 1100 1100 1100 1100 1100

# So?

## Also known as:Common Stumbling Points

### False precision

 print(mean(c(1.1, 2.74, 3, 4.9, 5.2)), digits = 20)
 3.3880000000000003

Do you believe your model is measuring to the 16th significant digit?

### Exact comparisons

. assert float(1.1) == 1.1
assertion is false
. assert 0.1 + 0.2 > 0.3

### Overflow

> 10^309
 Inf

### Underflow

> 10^-324
 0

### Drift

> x <- 0.7
> for (i in 1:3) {
+   print(x <- x + 0.1, digits = 22)
+ }
 0.79999999999999993
 0.89999999999999991
 0.99999999999999989

## Avoiding error

### Be aware of context

When is the exact value of a variable relevant to our analysis?

• Counts
• Financial data
• Comparisons

### Rational numbers

If you need exact calculations and have access to the data before it has been converted to floating point, you can store the numerator and denominator as integers for uncorrupted fractions.

data.frame(numerator = c(1, 1, 2, 2),
denominator = c(3, 3, 6, 6))

### Avoid unnecessary operations

> print((0.7 + 0.1 + 0.1 + 0.1), digits = 22)
 0.99999999999999989
> print((0.7 + 0.1 * 3), digits = 22)
 1

## IEEE 754

### History

Originally released in 1985, updated in 2008.
Draft developed by Intel.
Originally planned decimal storage, but too complicated to implement.

### Standards defined

• 16, 32, 64, 128, & 256 bit binary numbers
• 32, 64, & 128 bit decimal numbers
• Subnormal numbers
• Representation of +/- zero, +/- infinity, SNaN, QNaN
• Minimum precision calculations (add, subtract, multiply, divide, square root)
• Method of rounding to nearest integer
• Comparison results for non-obvious values (NaN, +/- inifinity)