Floating Point

References

Outline

Fractional Binary Numbers

Fractional Binary Number Examples

Value    Representation
23/4     101.11   = 4 + 1 + 1/2 + 1/4
23/8     10.111   = 2 + 1/2 + 1/4 + 1/8
23/16    1.0111   = 1 + 1/4 + 1/8 + 1/16
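
The expansions above can be checked mechanically. The following is a minimal sketch, not part of the original slides: `frac_bin_value` is a hypothetical helper name, and returning -1.0 for malformed input is an assumption made only to keep the example short. Bits to the left of the binary point are accumulated by doubling, and each bit to the right contributes a weight that halves at every position.

```c
#include <stddef.h>

/* Evaluate a fractional binary string such as "101.11".
 * Bits left of the point weigh ..., 4, 2, 1; bits right of it weigh
 * 1/2, 1/4, 1/8, ...
 * Hypothetical helper: returns -1.0 if the string contains anything
 * other than '0', '1', or a single '.'. */
double frac_bin_value(const char *s)
{
    double value = 0.0;
    double weight = 1.0;   /* halved before each fractional bit is added */
    int seen_point = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        if (c == '.') {
            if (seen_point)
                return -1.0;           /* second binary point: malformed */
            seen_point = 1;
        } else if (c == '0' || c == '1') {
            int bit = c - '0';
            if (!seen_point) {
                value = value * 2.0 + bit;   /* integer part: shift in the bit */
            } else {
                weight /= 2.0;               /* 1/2, then 1/4, then 1/8, ... */
                value += bit * weight;
            }
        } else {
            return -1.0;               /* unexpected character */
        }
    }
    return value;
}
```

With this sketch, `frac_bin_value("101.11")` evaluates to 5.75 (that is, 23/4), `frac_bin_value("10.111")` to 2.875 (23/8), and `frac_bin_value("1.0111")` to 1.4375 (23/16). All three values are exactly representable in a `double`, so the comparisons are exact.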

Representable Numbers

IEEE Floating Point

Floating Point Representation

Precision options

Floating Point Numbers

Normalized Values

Normalized Encoding Example

Denormalized Values

Special Values

C float Decoding Example 1

C float Decoding Example 2

Tiny Floating Point Example

Dynamic Range (\(s = 0\))

                      s  exp   frac  E    value
                      0  0000  000   -6   0
closest to zero       0  0000  001   -6   1/512
largest denorm        0  0000  111   -6   7/512
smallest norm         0  0001  000   -6   8/512
closest to 1 below    0  0110  111   -1   15/16
                      0  0111  000    0   1
closest to 1 above    0  0111  001    0   9/8
largest norm          0  1110  111    7   240
                      0  1111  000    -   inf
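
The rows of this table can be reproduced with a short decoder. This is a sketch under stated assumptions: the function names `tiny_float_value` and `pow2` are hypothetical, and the layout (1 sign bit, 4 exponent bits, 3 fraction bits, bias 7) is inferred from the column widths and the E column rather than stated on this slide. Denormalized encodings (exp = 0000) use E = 1 - bias with no implied leading 1; normalized encodings use E = exp - bias with an implied leading 1.

```c
#include <math.h>     /* INFINITY and NAN macros only */
#include <stdint.h>

/* Compute 2^e for small integer e without calling into libm. */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Decode an 8-bit pattern laid out as s | exp(4) | frac(3).
 * Assumption: bias = 2^(4-1) - 1 = 7, inferred from the table. */
double tiny_float_value(uint8_t bits)
{
    const int bias = 7;
    int sign = (bits >> 7) & 0x1;
    int exp  = (bits >> 3) & 0xF;
    int frac = bits & 0x7;
    double magnitude;

    if (exp == 0xF) {
        /* special values: infinity when frac is zero, NaN otherwise */
        magnitude = (frac == 0) ? INFINITY : NAN;
    } else if (exp == 0) {
        /* denormalized: significand 0.frac, E = 1 - bias = -6 */
        magnitude = (frac / 8.0) * pow2(1 - bias);
    } else {
        /* normalized: significand 1.frac, E = exp - bias */
        magnitude = (1.0 + frac / 8.0) * pow2(exp - bias);
    }
    return sign ? -magnitude : magnitude;
}
```

For example, the pattern `0 0110 111` is `0x37`, and `tiny_float_value(0x37)` gives 15/16; `0 1110 111` is `0x77` and decodes to 240. Note that the largest denormalized value (7/512) and the smallest normalized value (8/512) share E = -6, which is why the transition between the two ranges has no gap.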

Special Properties of the IEEE Encoding

Floating Point Operations: Basic Idea

Rounding

Closer Look at Round-To-Even

Rounding Binary Numbers

Rounding

Rounding Example

Floating Point Multiplication

Floating Point Addition

Properties of Floating Point Addition

Properties of Floating Point Multiplication

Floating Point in C

Summary