IEEE Standard 754

Representing real numbers
1. Fixed point
2. Floating point
The IEEE 754 Standard
References

Representing real numbers

While representing integer quantities with binary numbers is straightforward, representing real numbers and performing arithmetic on them requires special care; see Section 3.11 in [1].

Fixed point

One approach is to use a fixed-point representation, in which a certain number of bits encode the integer part and the remaining bits encode the fractional part.

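For instance, if our binary numbers are 8 bits wide, with the four most significant bits used for the integer part and the four least significant bits for the fractional part, then the number 12.5625 would be encoded as $\underbrace{1_7 1_6 0_5 0_4}_{\text{integer part}}\, \overbrace{1_3 0_2 0_1 1_0}^{\text{fractional part}}$. The integer bits $1100$ give $8 + 4 = 12$, and the fractional bits $1001$ carry the weights $2^{-1}$ to $2^{-4}$, giving $1/2 + 1/16 = 0.5625$. A value such as 12.09 has no exact encoding in this format and must be rounded to the nearest multiple of $2^{-4}$.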
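The 8-bit layout described above (four integer bits, four fractional bits) can be sketched in a few lines of Python. The function names `encode_q4_4` and `decode_q4_4` are hypothetical, and the sketch assumes unsigned values and round-to-nearest, neither of which is specified in the text.

```python
def encode_q4_4(value):
    """Encode a non-negative value as an unsigned 8-bit pattern with
    4 integer bits and 4 fractional bits (assumed: round to nearest)."""
    scaled = round(value * 16)  # 2**4 fractional steps per unit
    if not 0 <= scaled <= 0xFF:
        raise ValueError("value out of range for an unsigned 4.4 format")
    return scaled


def decode_q4_4(bits):
    """Recover the real value represented by an 8-bit 4.4 pattern."""
    return bits / 16


bits = encode_q4_4(12.5625)
print(format(bits, "08b"))                # 11001001
print(decode_q4_4(bits))                  # 12.5625
print(decode_q4_4(encode_q4_4(12.09)))    # 12.0625 (nearest representable value)
```

The last line illustrates the main limitation of fixed point: with only four fractional bits, the resolution is $2^{-4} = 0.0625$, so most decimal fractions are stored only approximately.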

Floating point

Another approach is the floating-point method. It represents a number using a significand (or mantissa) $s$ and an exponent $e$ with respect to a fixed base $b$, so that the number is expressed as $s\times b^e$.

One part of the bit pattern stores the significand (including a sign bit) and the other part stores the exponent (also with a sign bit).
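As an illustration of the decomposition $s\times b^e$ with $b = 2$, the following sketch uses Python's standard `math.frexp`, which returns a significand and exponent such that the input equals `s * 2**e`. The normalization it uses ($0.5 \le |s| < 1$) is just one possible convention and is an assumption of this example, not something the text prescribes.

```python
import math

x = 12.5625

# Split x into a significand s and an exponent e such that x == s * 2**e,
# with 0.5 <= |s| < 1 (the convention used by math.frexp).
s, e = math.frexp(x)
print(s, e)  # 0.78515625 4

# Recombining significand and exponent gives back the original number.
print(math.ldexp(s, e) == x)  # True
```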

The IEEE 754 Standard

There are multiple ways of implementing floating-point numbers and arithmetic. However, since 1985 the IEEE Standard for Floating-Point Arithmetic (IEEE 754) has addressed many of the resulting issues and provided a reliable and portable definition.

The latest version of the standard was published in 2019 [2], and the latest ISO version (identical to IEEE Std 754) was published in 2020 [3].
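To give a concrete feel for what the standard defines, the sketch below unpacks a 64-bit floating-point value into its three fields. It relies on details not stated in the text above: the IEEE 754 binary64 format uses 1 sign bit, 11 exponent bits stored with a bias of 1023 (rather than a separate exponent sign bit), and 52 fraction bits. The helper name `decompose_binary64` is hypothetical.

```python
import struct


def decompose_binary64(x):
    """Return (sign, biased_exponent, fraction) of x viewed as IEEE 754 binary64."""
    # Pack as a big-endian double, then reinterpret the same 8 bytes as an integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)   # 52 bits, implicit leading 1 not stored
    return sign, exponent, fraction


sign, exponent, fraction = decompose_binary64(12.5625)
print(sign)              # 0
print(exponent - 1023)   # 3, since 12.5625 = 1.5703125 * 2**3
print(hex(fraction))     # 0x9200000000000
```

Here the fraction field holds the bits after the leading 1 of $1.1001001_2$, which is why the same value that needed rounding considerations in an 8-bit fixed-point format is stored exactly.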

References

[1] G. Donzellini, L. Oneto, D. Ponta, and D. Anguita, Introduction to Digital Systems Design. Springer International Publishing, 2018 [Online]. Available at: https://books.google.cl/books?id=va1qDwAAQBAJ
[2] “IEEE Standard for Floating-Point Arithmetic,” IEEE Std 754-2019 (Revision of IEEE 754-2008), pp. 1–84, 2019, doi: 10.1109/IEEESTD.2019.8766229.
[3] ISO Central Secretary, “Information technology – Microprocessor Systems – Floating-Point arithmetic,” International Organization for Standardization, Geneva, CH, Standard ISO/IEC TR 60559:2020, 2020 [Online]. Available at: https://www.iso.org/standard/80985.html