Bits, Bytes and Integers

References

Slides adapted from CMU

Outline

Representing information as bits
Bit-level manipulations
Integers
- Representation: unsigned and signed
- Conversion, casting
- Expanding, truncating
- Addition, negation, multiplication, shifting
- Summary
Representations in memory, pointers, strings

Everything is bits

Each bit is 0 or 1
By encoding/interpreting sets of bits in various ways
- Computers determine what to do (instructions)
- … and represent and manipulate numbers, sets, strings, etc.
Why bits? Electronic implementation
- Easy to store with bistable elements
- Reliably transmitted on noisy and inaccurate wires

Example: Counting in Binary

Base 2 number representation
- Represent \(15213_{10}\) as \(11101101101101_{2}\)
- Represent \(1.20_{10}\) as \(1.0011001100110011[0011] \ldots_{2}\)
- Represent \(1.5213 \times 10^4\) as \(1.1101101101101_{2} \times 2^{13}\)

Encoding Byte Values

Byte = 8 bits
- Binary: \(00000000_{2}\) to \(11111111_{2}\)
- Decimal: \(0_{10}\) to \(255_{10}\)
- Hexadecimal: \(00_{16}\) to \(FF_{16}\)
  - Base 16 number representation
  - Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’
  - Typically written in most programming languages with the prefix 0x

Encoding Byte Values

Hex	Decimal	Binary
0	0	0000
1	1	0001
2	2	0010
3	3	0011
4	4	0100
5	5	0101
6	6	0110
7	7	0111
8	8	1000
9	9	1001
A	10	1010
B	11	1011
C	12	1100
D	13	1101
E	14	1110
F	15	1111
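
As a small illustration of the table above, the following sketch prints one byte in hexadecimal, decimal, and binary. The function names `byte_to_binary` and `print_byte` are made up for this example and are not part of the slides.

```c
#include <stdio.h>

/* Write the 8-bit binary representation of b into out (room for 9 chars). */
void byte_to_binary(unsigned char b, char out[9]) {
    for (int i = 0; i < 8; i++) {
        /* Bit 7 is written first, bit 0 last. */
        out[i] = ((b >> (7 - i)) & 1) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Print one byte as hex, decimal, and binary, separated by tabs. */
void print_byte(unsigned char b) {
    char bits[9];
    byte_to_binary(b, bits);
    printf("0x%02X\t%u\t%s\n", (unsigned)b, (unsigned)b, bits);
}
```

Each hex digit corresponds to exactly four of the binary digits, which is why hexadecimal is a convenient shorthand for bit patterns.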

Example Data Representations

C data type (size in bytes)	Typical 32-bit	Typical 64-bit	x86-64
`char`	1	1	1
`short`	2	2	2
`int`	4	4	4
`long`	4	8	8
`float`	4	4	4
`double`	8	8	8
pointer	4	8	8

Boolean Algebra

Algebraic representation of logic
- Encode “true” as 1 and “false” as 0
- Developed by George Boole in the 19th Century
Operations
- and (&): a & b = 1 when both a = 1 and b = 1
- or (|): a | b = 1 when either a = 1 or b = 1
- not (~): ~a = 1 when a = 0
- xor (^): a ^ b = 1 when either a = 1 or b = 1, but not both

General Boolean Algebras

Operate on Bit Vectors
- operations applied bitwise
Example: \[\begin{align*} & 01101001\\ \texttt{&} \; & 01010101\\ \hline & 01000001\\ \end{align*}\]
All of the properties of Boolean algebra apply

Example: Representing and Manipulating Sets

Representation
- Width \(w\) bit vector represents subsets of \(\{0, \ldots, w-1\}\)
- \(a_j = 1\) if \(j \in A\)
Operations
- &: intersection
- |: union
- ^: symmetric difference
- ~: complement

Example: Representing and Manipulating Sets

Examples with \(w = 8\)
- \(x = 01101001 = \{0, 3, 5, 6\}\)
- \(y = 01010101 = \{0, 2, 4, 6\}\)
- \(x \; \texttt{&} \; y = 01000001 = \{0, 6\}\)
- \(x \; \texttt{|} \; y = 01111101 = \{0, 2, 3, 4, 5, 6\}\)
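
The set operations can be sketched in C on an 8-bit vector. The type name `bitset8` and the function names are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

typedef uint8_t bitset8;  /* hypothetical: a subset of {0, ..., 7} */

/* Membership test: bit j is 1 exactly when j is in the set. */
int set_contains(bitset8 s, int j) { return (s >> j) & 1; }

bitset8 set_intersection(bitset8 a, bitset8 b) { return a & b; }
bitset8 set_union(bitset8 a, bitset8 b)        { return a | b; }
bitset8 set_symdiff(bitset8 a, bitset8 b)      { return a ^ b; }

/* ~a is computed as an int, so cast back down to 8 bits. */
bitset8 set_complement(bitset8 a)              { return (bitset8)~a; }
```

With x = 0x69 and y = 0x55 these reproduce the examples above; the symmetric difference is 00111100 = {2, 3, 4, 5} and the complement of x is 10010110 = {1, 2, 4, 7}.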

Bit-Level Operations in C

The operations &, |, ~, and ^ are available in C
- apply to any “integral” data type: long, int, short, char, unsigned
- arguments are viewed as bit vectors
- operations are applied bitwise
Examples with char type
- ~0x41 \(\rightarrow\) 0xBE
- ~0x00 \(\rightarrow\) 0xFF
- 0x69 & 0x55 \(\rightarrow\) 0x41

Contrast: Logical Operations in C

The logical operations in C are &&, ||, and !
- zero is viewed as “false”
- any non-zero value is viewed as “true”
- always return 0 or 1
- short-circuit evaluation
Examples with char data type
- !0x41 \(\rightarrow\) 0x00
- !0x00 \(\rightarrow\) 0x01
- 0x42 && 0x55 \(\rightarrow\) 0x01
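
A minimal sketch contrasting the bitwise and logical operators on char-sized values. The function names are illustrative only.

```c
#include <stddef.h>

/* Bitwise NOT: flips every bit of the byte. */
unsigned char bitwise_not(unsigned char x) {
    return (unsigned char)~x;
}

/* Logical NOT: the result is always 0 or 1. */
int logical_not(unsigned char x) {
    return !x;
}

/* Short-circuit evaluation: *p is only read when p is non-null. */
int points_to_nonzero(const int *p) {
    return p && *p;
}
```

The last function relies on short-circuit evaluation: with a null pointer the right operand of `&&` is never evaluated, so no invalid dereference occurs.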

Shift Operations

Left shift: x << y
- shift bit vector x left y positions
- fill with zeros on the right
Right shift: x >> y
- shift bit vector x right y positions
- logical shift: fill with zeros on the left
- arithmetic shift: replicate most significant bit on the left
Undefined behavior: shift amount less than zero or greater than or equal to the bit vector length

Shift Examples

x = 01100010
- x << 3 = 00010000
- logical: x >> 2 = 00011000
- arithmetic: x >> 2 = 00011000
x = 10100010
- x << 3 = 00010000
- logical: x >> 2 = 00101000
- arithmetic: x >> 2 = 11101000
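
The shift examples can be reproduced with 8-bit values. This is a sketch with made-up function names; the arithmetic shift is built by hand from a logical shift because C leaves right shifts of negative signed values implementation-defined. It assumes 0 <= k <= 7.

```c
#include <stdint.h>

/* Left shift, keeping only the low 8 bits. */
uint8_t shl8(uint8_t x, unsigned k) {
    return (uint8_t)(x << k);
}

/* Logical right shift: zeros enter from the left. */
uint8_t shr_logical8(uint8_t x, unsigned k) {
    return (uint8_t)(x >> k);
}

/* Arithmetic right shift: copies of the most significant bit enter. */
uint8_t shr_arith8(uint8_t x, unsigned k) {
    uint8_t shifted = (uint8_t)(x >> k);
    if (x & 0x80) {
        /* Set the top k bits to 1. */
        shifted |= (uint8_t)~(0xFFu >> k);
    }
    return shifted;
}
```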

Encoding Integers

Unsigned

\[B2U(x) = \sum_{i=0}^{w-1} x_i \cdot 2^i\]

where \(x\) is the bit vector and \(w\) is the length of the bit vector
Signed: two’s complement

\[B2T(x) = -x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i\]

where \(x\) is the bit vector, \(w\) is the length of the bit vector, and \(x_{w-1}\) is the sign bit, which carries weight \(-2^{w-1}\)
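
Both definitions can be evaluated directly, one bit at a time. The sketch below takes the bit vector as the low \(w\) bits of a 32-bit word; the names `b2u` and `b2t` mirror the formulas but are otherwise an assumption of this example, and \(1 \le w \le 32\) is assumed.

```c
#include <stdint.h>

/* Unsigned interpretation: every bit i has weight +2^i. */
int64_t b2u(uint32_t bits, int w) {
    int64_t value = 0;
    for (int i = 0; i < w; i++) {
        int64_t xi = (bits >> i) & 1;
        value += xi * ((int64_t)1 << i);
    }
    return value;
}

/* Two's complement: same as unsigned, except bit w-1 has weight -2^(w-1). */
int64_t b2t(uint32_t bits, int w) {
    int64_t value = 0;
    for (int i = 0; i < w - 1; i++) {
        int64_t xi = (bits >> i) & 1;
        value += xi * ((int64_t)1 << i);
    }
    int64_t sign = (bits >> (w - 1)) & 1;
    return value - sign * ((int64_t)1 << (w - 1));
}
```

For the 3-bit pattern `101`, `b2u` gives 4 + 0 + 1 = 5 and `b2t` gives -4 + 0 + 1 = -3.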

Example 3 Bit Integer Encodings

bits	unsigned	two’s complement
`000`	`(0+0+0) = 0`	`(0+0+0) = 0`
`001`	`(0+0+1) = 1`	`(0+0+1) = 1`
`010`	`(0+2+0) = 2`	`(0+2+0) = 2`
`011`	`(0+2+1) = 3`	`(0+2+1) = 3`
`100`	`(4+0+0) = 4`	`(-4+0+0) = -4`
`101`	`(4+0+1) = 5`	`(-4+0+1) = -3`
`110`	`(4+2+0) = 6`	`(-4+2+0) = -2`
`111`	`(4+2+1) = 7`	`(-4+2+1) = -1`

Numeric Ranges

Unsigned values
- min = 0
- max = \(2^{w} - 1\)
Two’s complement values
- min = \(-2^{w-1}\)
- max = \(2^{w-1} - 1\)

Example Numeric Ranges

Values where \(w = 16\)

	decimal	hex	binary
unsigned max	65535	FF FF	11111111 11111111
signed max	32767	7F FF	01111111 11111111
signed min	-32768	80 00	10000000 00000000
-1	-1	FF FF	11111111 11111111
0	0	00 00	00000000 00000000

Unsigned and Signed Numeric Values

Equivalence
- Same encodings for non-negative values
Uniqueness
- Every bit pattern represents a unique integer value
- Each representable integer has a unique bit encoding
Can invert mappings
- unsigned bit pattern = \(U2B(x) = B2U^{-1}(x)\)
- two’s complement bit pattern = \(T2B(x) = B2T^{-1}(x)\)

Mapping Between Signed and Unsigned

Mappings between unsigned and two’s complement numbers: keep the bit representation and reinterpret.
Two’s complement to unsigned: \(T2U = B2U \circ T2B\), that is, \(T2U(x) = B2U(T2B(x))\)
Unsigned to two’s complement: \(U2T = B2T \circ U2B\), that is, \(U2T(x) = B2T(U2B(x))\)
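
Because the bit pattern is unchanged and only the weight of the top bit differs (\(+2^{w-1}\) versus \(-2^{w-1}\)), the two mappings can also be written in closed form. This is the standard result for \(w\)-bit two's complement, stated here as a supplement to the slides:

\[T2U_w(x) = \begin{cases} x + 2^w, & x < 0 \\ x, & x \ge 0 \end{cases}\]

\[U2T_w(u) = \begin{cases} u, & u < 2^{w-1} \\ u - 2^w, & u \ge 2^{w-1} \end{cases}\]

For example, with \(w = 4\): \(T2U_4(-1) = -1 + 16 = 15\) and \(U2T_4(8) = 8 - 16 = -8\).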

Signed to Unsigned

Unsigned to Signed

Signed vs. Unsigned in C

Constants
- By default are considered to be signed integers
- Unsigned if the suffix is “U”, for example 42U
Casting
- Explicit casting between signed and unsigned same as \(U2T\) and \(T2U\)
- Implicit casting also occurs via assignments and procedure calls

Casting Surprises

Expression evaluation
- If there is a mix of unsigned and signed integers in a single expression, then signed values are implicitly cast to unsigned values.
- Including comparison operations: <, >, ==, <=, >=
Examples

Operand 1	Operand 2	Relation	Evaluation
0	0U	==	unsigned
-1	0	<	signed
-1	0U	>	unsigned
-1	-2	>	signed
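
The rows of the table can be checked with a few one-line functions. The function names are made up for this sketch; a compiler may warn about the mixed comparisons, which is exactly the behavior being illustrated.

```c
/* Both operands signed: ordinary signed comparison. */
int signed_less(int a, int b) {
    return a < b;
}

/* Mixed operands: a is implicitly converted to unsigned before comparing. */
int mixed_greater(int a, unsigned b) {
    return a > b;
}

/* Mixed operands: the comparison is again done on unsigned values. */
int mixed_equal(int a, unsigned b) {
    return a == b;
}
```

Calling `mixed_greater(-1, 0U)` returns 1 because -1 converts to the largest unsigned value, while `signed_less(-1, 0)` returns 1 as expected for signed operands.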

Unsigned vs. Signed in C

Easy to make mistakes

Example 1

unsigned i;
/* Bug: i is unsigned, so i >= 0 is always true and i-- wraps around to UMax. */
for (i = cnt-2; i >= 0; i--)
    a[i] += a[i+1];

Example 2

#define DELTA sizeof(int)
int i;
/* Bug: sizeof yields an unsigned value, so i-DELTA is unsigned and always >= 0. */
for (i = CNT; i-DELTA >= 0; i -= DELTA)
    ...
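
One possible way to repair both loops is to keep all of the loop arithmetic signed. The sketch below is only one option; the wrapper functions `suffix_sums` and `count_steps` are hypothetical, since the slides do not show the surrounding code.

```c
/* Example 1 with a signed index, so i >= 0 can actually become false. */
void suffix_sums(int *a, int cnt) {
    int i;
    for (i = cnt - 2; i >= 0; i--)
        a[i] += a[i + 1];
}

/* Example 2 with DELTA cast to int, so i - DELTA stays signed. */
#define DELTA ((int)sizeof(int))

/* Counts how many times the loop body would run (stand-in for "..."). */
int count_steps(int cnt) {
    int steps = 0;
    for (int i = cnt; i - DELTA >= 0; i -= DELTA)
        steps++;
    return steps;
}
```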

Summary: Casting Rules

Bit pattern is maintained, but reinterpreted
Can have unexpected effects: adding or subtracting \(2^w\)
An expression containing signed and unsigned ints implicitly casts the signed ints to unsigned ints

Sign Extension

Task
- Given \(w\)-bit signed integer \(x\)
- Convert it to \(w+k\) bit integer \(x'\) with the same value
Rule
- Make \(k\) copies of the sign bit:
- \(x' = x_{w-1}, \ldots, x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_{0}\)
C automatically performs sign extension when a signed integer is converted to a wider integer type

Sign Extension Example

Example of sign extensions from \(w=3\) to \(w=4\)
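
The rule can be sketched for arbitrary small widths by operating on bit patterns stored in a 32-bit word. The function `sign_extend` is a hypothetical helper for this illustration and assumes \(w \ge 1\) and \(w + k \le 32\).

```c
#include <stdint.h>

/* Extend the w-bit pattern in x to w+k bits by copying the sign bit. */
uint32_t sign_extend(uint32_t x, int w, int k) {
    uint32_t sign = (x >> (w - 1)) & 1u;          /* bit w-1 of the input */
    uint32_t result = x & ((1u << w) - 1u);       /* keep the low w bits */
    for (int i = 0; i < k; i++) {
        result |= sign << (w + i);                /* copy sign into bit w+i */
    }
    return result;
}

/* The same effect through an ordinary C widening conversion. */
int16_t widen(int8_t x) {
    return x;
}
```

Going from \(w = 3\) to \(w = 4\): `101` (-3) becomes `1101` (still -3), while `011` (3) becomes `0011` (still 3).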

Sign Extension

Truncation

Task:
- Given \(k+w\)-bit signed or unsigned integer \(x\)
- Convert it to \(w\)-bit integer \(x'\) with the same value for “small enough” \(x\)
Rule:
- Drop top \(k\) bits:
- \(x' = x_{w-1}, x_{w-2}, \ldots, x_0\)
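
A short sketch of truncation on bit patterns, using made-up helper names. `truncate_bits` assumes 0 < w < 32.

```c
#include <stdint.h>

/* Keep the low w bits of x and drop everything above them. */
uint32_t truncate_bits(uint32_t x, int w) {
    return x & ((UINT32_C(1) << w) - 1);
}

/* The same idea through an ordinary C narrowing conversion. */
uint16_t truncate_to_16(uint32_t x) {
    return (uint16_t)x;
}
```

Truncating the 4-bit pattern `1101` (-3) to 3 bits gives `101`, which is still -3, but truncating `1010` (-6) gives `010`, which is 2: the value survives only when it is small enough in magnitude.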

Summary: Expanding and Truncating Rules

Expanding (e.g. short to int)
- Unsigned: zeros added
- Signed: sign extension
- Both yield expected result
Truncating (e.g. int to short)
- Unsigned/signed: bits are truncated
- Result is reinterpreted
- Unsigned: modulus operation
- Signed: similar to modulus
- For small (in magnitude) numbers yields expected behavior

Unsigned Addition

\(UAdd_{w}(u, v)\)
- Operands: \(w\) bits
- True sum: \(w+1\) bits
- Discard carry: \(w\) bits
Standard addition function ignores carry output
Implements modular arithmetic

\[s = UAdd_w(u, v) = u + v \; \texttt{mod} \; 2^w\]
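
In C, addition on `unsigned` operands already behaves this way. The sketch below, with illustrative function names, also shows a common way to detect that the carry was discarded: the wrapped sum is smaller than either operand.

```c
#include <limits.h>

/* Unsigned addition: the result is (u + v) mod 2^w by definition in C. */
unsigned uadd(unsigned u, unsigned v) {
    return u + v;
}

/* Overflow happened exactly when the wrapped sum is less than u. */
int uadd_overflows(unsigned u, unsigned v) {
    return u + v < u;
}
```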

\(UAdd\) Overflow

Implements modular arithmetic \(s = UAdd_w(u, v) = u + v \; \texttt{mod} \; 2^w\)

Unsigned Integer Overflow

Visualizing Mathematical Integer Addition

\(Add_4(u, v)\)

Integer Addition

Visualizing Unsigned Integer Addition

\(UAdd_4(u, v)\)

Unsigned Integer Addition

Two’s Complement Addition

\(TAdd_{w}(u, v)\)
- Operands: \(w\) bits
- True sum: \(w+1\) bits
- Discard carry: \(w\) bits
\(TAdd\) and \(UAdd\) have identical bit level behavior

\(TAdd\) Overflow

True add requires \(w+1\) bits; drop off the most significant bit and interpret as 2’s complement integer
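
Signed overflow is undefined behavior in C, so a check has to happen before the addition. The following is a sketch with hypothetical function names; `tadd` goes through unsigned arithmetic to obtain the wrapped result, and the final conversion back to `int` is assumed to wrap as on typical two's complement machines.

```c
#include <limits.h>

/* Returns 1 if u + v does not fit in an int, without performing the add. */
int tadd_overflows(int u, int v) {
    if (v > 0) return u > INT_MAX - v;   /* positive overflow */
    if (v < 0) return u < INT_MIN - v;   /* negative overflow */
    return 0;
}

/* Wrapped two's complement sum, computed on the unsigned bit patterns. */
int tadd(int u, int v) {
    return (int)((unsigned)u + (unsigned)v);
}
```

Under that assumption, `tadd(INT_MAX, 1)` yields `INT_MIN`: the true sum needs \(w+1\) bits, and dropping the top bit flips the sign.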

Signed Integer Overflow

Visualizing Two’s Complement Addition

\(TAdd_{4}(u, v)\)

Signed Integer Addition

Integer Multiplication

Problem: the exact product of \(w\)-bit numbers \(u, v\) might have a result that exceeds \(w\) bits.
- Unsigned: up to \(2w\) bits
- Two’s complement min (negative): up to \(2w-1\) bits
- Two’s complement max (positive): up to \(2w\) bits
Maintaining exact results
- would need to keep expanding word size with each product computed
- is done in software if needed

Unsigned Multiplication in C

\(UMul_{w}(u, v)\)
- Operands: \(w\) bits
- True product: \(2w\) bits
- Discard \(w\) bits: \(w\) bits
Implements modular arithmetic

\[s = UMul_w(u, v) = u \cdot v \; \texttt{mod} \; 2^w\]

Signed Multiplication in C

\(TMul_{w}(u, v)\)
- Operands: \(w\) bits
- True product: \(2w\) bits
- Discard \(w\) bits: \(w\) bits
Ignores high order \(w\) bits, some of which are different for signed vs. unsigned multiplication
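
With \(w = 8\) the truncation is easy to observe, because the true product fits comfortably in an `int`. The function names below are illustrative only; the signed version returns the low 8 bits as a bit pattern so it can be compared with the unsigned result.

```c
#include <stdint.h>

/* Unsigned 8-bit multiply: keep the low 8 bits of the 16-bit true product. */
uint8_t umul8(uint8_t u, uint8_t v) {
    unsigned true_product = (unsigned)u * (unsigned)v;
    return (uint8_t)true_product;
}

/* Signed 8-bit multiply: low 8 bits of the true product, as a bit pattern. */
uint8_t tmul8_bits(int8_t u, int8_t v) {
    int true_product = (int)u * (int)v;
    return (uint8_t)true_product;   /* conversion to unsigned is modulo 2^8 */
}
```

For instance, 200 * 3 = 600 truncates to 88, and -3 * 5 = -15 has the bit pattern 0xF1, the same low byte as the unsigned product 253 * 5 = 1265.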

Power-of-2 Multiply with Shift

Operation u << k
- Gives \(u \cdot 2^k\) for both signed and unsigned
- Operands: \(w\) bits
- True product \(w+k\) bits
- Discard \(k\) bits: \(w\) bits

Unsigned Power-of-2 Divide with Shift

Operation u >> k
- Gives
  
  \[\bigg\lfloor \frac{u}{2^k} \bigg\rfloor\]
- Uses logical shift

Signed Power-of-2 Divide with Shift

Operation u >> k
- Gives
  
  \[\bigg\lfloor \frac{u}{2^k} \bigg\rfloor\]
- Uses arithmetic shift
- Rounds in the wrong direction when \(u < 0\): toward \(-\infty\) instead of toward 0

Correct Signed Power-of-2 Divide with Shift

Quotient of negative number by power of 2
- Want
  
  \[\bigg\lceil \frac{u}{2^k} \bigg\rceil\]
- Compute as
  
  \[\bigg\lfloor \frac{u+2^k-1}{2^k} \bigg\rfloor\]
  - In C: (u + (1<<k) - 1) >> k
  - Biases dividend toward 0
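
A sketch of the biased shift as a function. The name `div_pow2` is hypothetical, and the code assumes that `>>` on a negative `int` is an arithmetic shift, which C leaves implementation-defined but which holds on common compilers. It also assumes 0 <= k <= 30.

```c
/* Divide u by 2^k, rounding toward zero like C's / operator. */
int div_pow2(int u, int k) {
    /* Only negative dividends need the bias of 2^k - 1. */
    int bias = (u < 0) ? (1 << k) - 1 : 0;
    return (u + bias) >> k;
}
```

For u = -15213 and k = 4, the biased dividend is -15198 and the shift gives -950, matching `-15213 / 16`; without the bias the shift would give -951.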

Negation: Complement and Increment

Negate through complement and increment
```
~x + 1 = -x
```
Examples

Value	`x`	`~x`	`~x+1`	Result
15213	`3B6D`	`C492`	`C493`	-15213
0	`0000`	`FFFF`	`0000`	0
TMin	`8000`	`7FFF`	`8000`	TMin
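
The rows of the table can be reproduced on 16-bit patterns. `negate16` is a name invented for this sketch.

```c
#include <stdint.h>

/* Two's complement negation of a 16-bit pattern: complement, then add 1. */
uint16_t negate16(uint16_t x) {
    /* ~x is computed at int width; the cast keeps only the low 16 bits. */
    return (uint16_t)(~x + 1u);
}
```

Note that both 0 and TMin (0x8000) map to themselves, as in the table.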

Arithmetic: Basic Rules

Addition
- Unsigned/signed: normal addition followed by truncate
- Unsigned: addition mod \(2^w\)
- Signed: modified addition mod \(2^w\) (result in proper range)
Multiplication
- Unsigned/signed: normal multiplication followed by truncate
- Unsigned: multiplication mod \(2^w\)
- Signed: modified multiplication mod \(2^w\) (result in proper range)

Byte-Oriented Memory Organization

Programs refer to data by address
- Conceptually envision it as a very large array of bytes
- An address is like an index into that array, and a pointer variable stores an address
Note: system provides private address space to each “process”
- Think of a process as a program being executed
- So, a program can clobber its own data, but not that of others

Machine Words

Any given computer has a “word size”
- Nominal size of integer-valued data
Until recently, most machines used 32 bits (4 bytes) as a word size
Increasingly, machines have 64 bit word size
Machines still support multiple data formats
- Fractions or multiples of word size
- Always integral number of bytes

Word-Oriented Memory Organization

Addresses specify byte locations
- Address of first byte in word
- Addresses of successive words differ by 4 (32 bit) or 8 (64 bit)

Byte Ordering

How are the bytes within a multi-byte word ordered in memory?
Conventions
- Big endian: least significant byte has highest address
- Little endian: least significant byte has lowest address
Example: 4-byte value of 0x01234567
- Big endian: 01 23 45 67
- Little endian: 67 45 23 01
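
The two conventions can be made explicit by writing the bytes out by hand, independent of the machine the code runs on. The function names are assumptions of this sketch.

```c
#include <stdint.h>

/* Store value with the most significant byte at the lowest address. */
void store_big_endian(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Store value with the least significant byte at the lowest address. */
void store_little_endian(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)value;
    out[1] = (unsigned char)(value >> 8);
    out[2] = (unsigned char)(value >> 16);
    out[3] = (unsigned char)(value >> 24);
}
```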

Examining Data Representations

Code to print byte representation of data

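#include <stdio.h>

typedef unsigned char *pointer;

/* Print the address and value of each of the len bytes starting at start. */
void show_bytes(pointer start, size_t len) {
  size_t i;
  for (i = 0; i < len; i++) {
      /* %p expects a void *, so cast the byte pointer explicitly. */
      printf("%p\t0x%.2x\n", (void *)(start+i), start[i]);
  }
  printf("\n");
}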

Representing Strings

Strings in C
- Represented by an array of characters
- Each character is encoded in ASCII format
- Strings should be null terminated (final character = 0)
Compatibility
- Byte ordering is not an issue
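
To see why byte ordering does not matter here, the sketch below formats the bytes of a string in address order, including the terminating 0. `string_to_hex` is a hypothetical helper, and the expected values assume an ASCII execution character set.

```c
#include <stdio.h>
#include <string.h>

/* Format every byte of s, including the terminator, as two hex digits.
   out must have room for at least 3 * (strlen(s) + 1) characters. */
void string_to_hex(const char *s, char *out) {
    size_t n = strlen(s) + 1;   /* include the null terminator */
    for (size_t i = 0; i < n; i++) {
        /* Bytes are emitted in address order; nothing is swapped. */
        sprintf(out + 3 * i, "%02x%s",
                (unsigned)(unsigned char)s[i], i + 1 < n ? " " : "");
    }
}
```

For the string "15213" this produces `31 35 32 31 33 00` on any machine using ASCII, because each character occupies a single byte.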

Reading Byte-Reversed Listings

Disassembly
- Text representation of binary machine code
- Generated by program that reads the machine code

Example Fragment

Address     Instruction code        Assembly Rendition
8048365:    5b                      pop
8048366:    81 c3 ab 12 00 00       add     $0x12ab,%ebx
804836c:    83 bb 28 00 00 00 00    cmpl    $0x0,0x28(%ebx)

Summary

Representing information as bits
Bit-level manipulations
Integers
- Representation: unsigned and signed
- Conversion, casting
- Expanding, truncating
- Addition, negation, multiplication, shifting
- Summary
Representations in memory, pointers, strings