Elizabeth Byerly

Bálint Pető

2016-07-28

Press "s" for presenter's notes

Numbers are stored as a **finite set of 0s and 1s**.

Most real numbers are necessarily stored inaccurately.

The floating point standard allows us to reason consistently about how our programs will store numbers and perform arithmetic operations.

You will leave with:

- Sufficient understanding of floating point arithmetic to avoid common errors
- Jargon and references to allow for subsequent investigation if you are interested

- Why we're here: a story of floating point error
- Representation and arithmetic
- Sources of error
- Avoiding error

We want to convert a percentage into a categorical "bucket" variable.

`..., (0.7, 0.8], (0.8, 0.9], ...`

We define our cutoff points programmatically:

```
> x <- 0
> for (i in 1:10) { x <- c(x, x[i] + .1) }
> x
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
```

When we use those cutoffs on our continuous variable...

0.8 is bucketed into (0.8, 0.9]

```
> 0.7 + 0.1 == 0.8
[1] FALSE
```

```
. assert 0.7 + 0.1 == 0.8
assertion is false
```

```
> print(0.8, digits = 20)
[1] 0.80000000000000004
> print(0.7 + 0.1, digits = 20)
[1] 0.79999999999999993
```
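The same mismatch appears in any language that uses IEEE 754 doubles, not just R. A quick Python sketch of the identical comparison:

```python
# 0.7 and 0.1 are both stored inexactly, and their sum rounds to a
# double strictly less than the double nearest to 0.8.
print(0.7 + 0.1 == 0.8)       # False
print(f"{0.8:.17f}")          # 0.80000000000000004
print(f"{0.7 + 0.1:.17f}")    # 0.79999999999999993
```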

It Could Happen to You

A base-2 normalized number encoded in 32 bits with a defined structure:

- 1 sign bit
- 8 exponent bits
- 23 mantissa bits

0100 0011 0000 0000 0000 1001 1000 0010

+1.00000000000100110000010 * 10^(10000110-1111111)

If you're confused

and afraid, so are

we. It gets easier!

0100 0011 0000 0000 0000 1001 1000 0010

+1.00000000000100110000010 * 10^(10000110-1111111)

The sign bit is set to 0 for positive numbers and 1 for negative numbers.

0100 0011 0000 0000 0000 1001 1000 0010

+1.00000000000100110000010 * 10^(10000110-1111111)

The exponent ranges from -126 to +127 using *bias*: the bits encode numbers from +0 to +255 and the final value is determined by subtracting 127.

0100 0011 0000 0000 0000 1001 1000 0010

+1.00000000000100110000010 * 10^(10000110-1111111)

The mantissa stores up to 24 significant binary digits of a normalized base-2 number: 23 explicit bits plus the implicit leading 1.
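As a sketch of the three fields, here is a Python snippet (using the standard `struct` module; the function name `fields_f32` is our own) that unpacks a number's single-precision encoding into sign, unbiased exponent, and mantissa bits:

```python
import struct

def fields_f32(x):
    """Split a number's single-precision encoding into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32 bits
    sign     = bits >> 31             # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8 exponent bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 stored mantissa bits
    return sign, exponent - 127, mantissa  # unbias by subtracting 127

print(fields_f32(2.25))   # (0, 1, 1048576): +1.001 * 2^1
```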

- +/- Zero: `*000 0000 0000 0000 0000 0000 0000 0000`
- +/- Infinity: `*111 1111 1000 0000 0000 0000 0000 0000`
- NaN: `*111 1111 1100 0000 0000 0000 0000 0000`

(`*` stands for either value of the sign bit.)
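These special patterns can be checked by reinterpreting the raw bits directly; a Python sketch (bit values taken from the patterns above, helper name `bits_to_f32` is our own):

```python
import math
import struct

def bits_to_f32(bits):
    """Reinterpret a 32-bit integer as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(bits_to_f32(0x00000000))              # 0.0 (all bits clear)
print(bits_to_f32(0x7F800000))              # inf (all exponent bits set)
print(math.isnan(bits_to_f32(0x7FC00000)))  # True (exponent set + mantissa bit)
```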

+2.25

-1.10

+10.01

-1.0001100110011...

+1.001 * 10^1

-1.0001100110011... * 10^0

+1.001 * 10^1

-1.0001100110011... * 10^0

+1.001 * 10^(10000000-1111111)

-1.0001100110011... * 10^(1111111-1111111)

0100 0000 0001 0000 0000 0000 0000 0000

1011 1111 1000 1100 1100 1100 11...

0100 0000 0001 0000 0000 0000 0000 0000

1011 1111 1000 1100 1100 1100 1100 1101

+2.25

-1.1000000238418579
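The rounded value for -1.10 can be confirmed by round-tripping the number through single precision; a Python sketch using the standard `struct` module:

```python
import struct

# Nearest single-precision neighbor of -1.10 (decimal).
(stored,) = struct.unpack(">f", struct.pack(">f", -1.10))
(bits,)   = struct.unpack(">I", struct.pack(">f", -1.10))
print(format(bits, "032b"))   # 10111111100011001100110011001101
print(f"{stored:.16f}")       # -1.1000000238418579
```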

0.1 + 0.7

0011 1101 1100 1100 1100 1100 1100 1101

+

0011 1111 0011 0011 0011 0011 0011 0011

+1.10011001100110011001101 * 10^-100

+

+1.01100110011001100110011 * 10^-1

+.00110011001100110011001101 * 10^-1

+

+1.01100110011001100110011 * 10^-1

+1.10011001100110011001100101 * 10^-1

+1.10011001100110011001100101 * 10^-1

+1.10011001100110011001100 * 10^-1

0011 1111 0100 1100 1100 1100 1100 1100

Common Stumbling Points

```
> print(mean(c(1.1, 2.74, 3, 4.9, 5.2)), digits = 20)
[1] 3.3880000000000003
```

Do you believe your model is measuring to the 16th significant digit?

```
. assert float(1.1) == 1.1
assertion is false
```

`. assert 0.1 + 0.2 > 0.3`
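This is not a Stata quirk: 0.1 and 0.2 both round up slightly as doubles, so their sum overshoots 0.3 in any IEEE 754 arithmetic. In Python, for instance:

```python
# The sum of the stored 0.1 and 0.2 exceeds the stored 0.3.
print(0.1 + 0.2 > 0.3)         # True
print(f"{0.1 + 0.2:.17f}")     # 0.30000000000000004
print(f"{0.3:.17f}")           # 0.29999999999999999
```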

```
> 10^309
[1] Inf
```

```
> 10^-324
[1] 0
```
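Doubles overflow to infinity above roughly 1.8 * 10^308 and underflow to zero below the smallest subnormal, roughly 5 * 10^-324. A Python sketch (note `float("1e309")` rather than `10.0 ** 309`, since the latter raises `OverflowError` in Python):

```python
import math

print(float("1e309"))     # inf: beyond the largest finite double
print(float("1e-324"))    # 0.0: below the smallest positive subnormal
print(5e-324 > 0)         # True: the smallest positive subnormal double
```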

```
> x <- 0.7
> for (i in 1:3) {
+ print(x <- x + 0.1, digits = 22)
+ }
[1] 0.79999999999999993
[1] 0.89999999999999991
[1] 0.99999999999999989
```

When is the exact value of a variable relevant to our analysis?

- Counts
- Financial data
- Comparisons

If you need exact calculations and have access to the data before it has been converted to floating point, you can store the numerator and denominator as integers for uncorrupted fractions.

```
data.frame(numerator   = c(1, 1, 2, 2),
           denominator = c(3, 3, 6, 6))
```
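Python ships the same idea in its standard library as `fractions.Fraction`, which stores an exact integer numerator and denominator:

```python
from fractions import Fraction

third = Fraction(1, 3)                   # stored exactly as 1/3
print(third + third + third == 1)        # True: exact rational arithmetic
print(0.1 + 0.1 + 0.1 == 0.3)            # False: floats accumulate error
print(Fraction(2, 6) == Fraction(1, 3))  # True: fractions reduce automatically
```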

```
> print((0.7 + 0.1 + 0.1 + 0.1), digits = 22)
[1] 0.99999999999999989
> print((0.7 + 0.1 * 3), digits = 22)
[1] 1
```
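Fewer rounded operations means less accumulated error. The same reordering sketched in Python:

```python
# Three separate additions: each one rounds, and the errors accumulate.
x = 0.7
for _ in range(3):
    x += 0.1
print(x == 1.0)              # False

# One multiply and one add: fewer roundings, exact result here.
print(0.7 + 0.1 * 3 == 1.0)  # True
```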

Originally released in 1985, updated in 2008.

Draft developed by Intel.

Originally planned decimal storage, but too complicated to implement.

- 16, 32, 64, 128, & 256 bit binary numbers
- 32, 64, & 128 bit decimal numbers
- Subnormal numbers
- Representation of +/- zero, +/- infinity, SNaN, QNaN
- Minimum precision calculations (add, subtract, multiply, divide, square root)
- Method of rounding to nearest integer
- Comparison results for non-obvious values (NaN, +/- infinity)
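The last bullet in practice: the standard specifies that NaN compares unequal to everything, including itself, and that infinity exceeds every finite value. A Python sketch:

```python
nan = float("nan")
print(nan == nan)            # False: NaN is unordered against everything
print(nan < 0, nan > 0)      # False False
print(float("inf") > 1e308)  # True: infinity exceeds every finite double
```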