Machine Programming Basics
References
- Slides adapted from CMU
Outline
History of Intel processors and architectures
Assembly basics: registers, operands, move
Arithmetic and logical operations
C, assembly and machine code
Intel x86 Processors
Dominate laptop/desktop/server market
Evolutionary design
Backwards compatible all the way back to the 8086, introduced in 1978
Added more features as time goes on
Complex instruction set computer (CISC)
Many different instructions with many different formats
Difficult to match performance of Reduced Instruction Set Computers (RISC)
But, Intel has done just that in terms of speed, less so for low power
Intel x86 Evolution: Milestones
Name | Date | Transistors | MHz | Notes |
---|---|---|---|---|
8086 | 1978 | 29K | 5-10 | 16-bit |
386 | 1985 | 275K | 16-33 | 32-bit |
Pentium 4E | 2004 | 125M | 2800-3800 | 64-bit |
Core 2 | 2006 | 291M | 1060-3333 | multi-core |
Core i7 | 2008 | 731M | 1600-4400 | four cores |
x86 Clones: Advanced Micro Devices (AMD)
Historically
AMD has followed just behind Intel
A little bit slower, a lot cheaper
Then
Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
Built Opteron: tough competitor to Pentium 4
Developed x86-64, their own extension to 64 bits
Recent years
Intel leads the world in semiconductor technology
AMD has fallen behind
Intel’s 64 bit History
2001: Intel attempts radical shift from IA32 to IA64
Totally different architecture (Itanium)
Performance disappointing
2003: AMD steps in with an evolutionary solution
- x86-64 (now called “AMD64”)
2004: Intel announces EM64T extension to IA32
Extended Memory 64 bit Technology
Almost identical to x86-64
All but low-end x86 processors support x86-64
- but, lots of code still runs in 32 bit mode
Definitions
Architecture: the parts of a processor design that one needs to understand for writing correct machine/assembly code
Machine code: the byte level programs that a processor executes
Assembly code: a text representation of machine code
Microarchitecture: implementation of the architecture
Example Instruction Set Architectures (ISA)
Intel: x86, IA32, Itanium, x86-64
ARM: Used in almost all mobile phones
RISC-V: new open-source ISA
Assembly/Machine Code View
Programmer Visible State
PC: Program counter
- Address of next instruction
Register file
Condition codes
- store status information about most recent arithmetic or logical operation
Memory
Byte addressable array
Code and user data
Stack to support procedures
Assembly Characteristics
“Integer” data of 1, 2, 4, or 8 bytes
- data values
- addresses (untyped pointers)
Floating point data of 4, 8, or 10 bytes
SIMD vector data types of 8, 16, 32, or 64 bytes
Code: byte sequences encoding series of instructions
No aggregate types such as arrays or structures
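As a rough illustration of these sizes, the sketch below checks the widths of the corresponding C types. It assumes an LP64 platform such as x86-64 Linux; the function name `check_data_sizes` is made up for this example, and on other platforms (for instance 64-bit Windows, where `long` is 4 bytes) the `long` check would fail.

```c
#include <assert.h>

/* Hypothetical self-check; assumes an LP64 target such as x86-64 Linux. */
void check_data_sizes(void) {
    assert(sizeof(char) == 1);     /* 1-byte integer  (suffix b) */
    assert(sizeof(short) == 2);    /* 2-byte integer  (suffix w) */
    assert(sizeof(int) == 4);      /* 4-byte integer  (suffix l) */
    assert(sizeof(long) == 8);     /* 8-byte integer  (suffix q) */
    assert(sizeof(void *) == 8);   /* addresses are plain 8-byte values */
    assert(sizeof(float) == 4);    /* 4-byte floating point */
    assert(sizeof(double) == 8);   /* 8-byte floating point */
}
```

Calling `check_data_sizes()` from `main` returns silently when every assumption holds on the target.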
x86-64 Integer Registers
8-byte register | bytes 0-3 | bytes 0-1 | byte 0 |
---|---|---|---|
%rax | %eax | %ax | %al |
%rcx | %ecx | %cx | %cl |
%rdx | %edx | %dx | %dl |
%rbx | %ebx | %bx | %bl |
%rsi | %esi | %si | %sil |
%rdi | %edi | %di | %dil |
%rsp | %esp | %sp | %spl |
%rbp | %ebp | %bp | %bpl |
x86-64 Integer Registers (continued)
8-byte register | bytes 0-3 | bytes 0-1 | byte 0 |
---|---|---|---|
%r8 | %r8d | %r8w | %r8b |
%r9 | %r9d | %r9w | %r9b |
%r10 | %r10d | %r10w | %r10b |
%r11 | %r11d | %r11w | %r11b |
%r12 | %r12d | %r12w | %r12b |
%r13 | %r13d | %r13w | %r13b |
%r14 | %r14d | %r14w | %r14b |
%r15 | %r15d | %r15w | %r15b |
x86-64 Integer Registers (continued)
Some assembly instructions include a suffix that indicates what portion of the register is accessed:
q: “quadword” 8 bytes
l: “double word” lower 4 bytes
w: “word” lower 2 bytes
b: “byte” lowest byte
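One way to picture the sub-register names is as truncating views of the same 64-bit value. The sketch below models that in C with narrowing casts; the helper names `low32`, `low16`, and `low8` are invented for illustration and only model the values read, not how writes to a sub-register behave.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each returns the bits that the matching
   sub-register name would expose when read. */
static uint32_t low32(uint64_t r) { return (uint32_t)r; }  /* like %eax within %rax */
static uint16_t low16(uint64_t r) { return (uint16_t)r; }  /* like %ax  within %rax */
static uint8_t  low8(uint64_t r)  { return (uint8_t)r; }   /* like %al  within %rax */

void check_subregisters(void) {
    uint64_t rax = 0x1122334455667788ULL;  /* example value, chosen arbitrarily */
    assert(low32(rax) == 0x55667788u);
    assert(low16(rax) == 0x7788u);
    assert(low8(rax)  == 0x88u);
}
```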
Assembly Characteristics: Operations
Transfer data between memory and register
Load data from memory into register
Store register data into memory
Perform arithmetic function on register or memory data
Transfer control
Unconditional jumps to/from procedures
Conditional branches
Indirect branches
Moving Data
Instruction: movq Src, Dest
Src is the source operand, Dest is the destination operand
Operand types
Immediate (Imm): constant integer data
Register (Reg): one of 16 integer registers
Memory (Mem): 8 consecutive bytes of memory at address given by register
movq Operand Combinations
Source | Destination | Example | C Analog |
---|---|---|---|
Imm | Reg | movq $0x4, %rax | temp = 0x04; |
Imm | Mem | movq $-147, (%rax) | *p = -147; |
Reg | Reg | movq %rax, %rdx | temp2 = temp1; |
Reg | Mem | movq %rax, (%rdx) | *p = temp; |
Mem | Reg | movq (%rax), %rdx | temp = *p; |
Memory Addressing Modes
Immediate
$val
val: constant integer value
example:
movq $7, %rax
Normal
(R) Mem[Reg[R]]
R: register R specifies memory address
example:
movq (%rcx), %rax
Memory Addressing Modes (continued)
Displacement
D(R) Mem[Reg[R] + D]
R: register specifies start of memory region
D: constant displacement D specifies offset
example:
movq 8(%rdi), %rdx
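Displacement addressing is what field access through a pointer usually turns into. The sketch below assumes x86-64 Linux, where `long` is 8 bytes; the type `struct pair` and the function `get_second` are hypothetical names for this example. A compiler on that platform would typically load the second field with something like `movq 8(%rdi), %rax`, although the exact output depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-field record: `second` sits one long past the start. */
struct pair {
    long first;
    long second;
};

/* Reads Mem[p + offsetof(struct pair, second)], i.e. D(R) with D = 8 on LP64. */
long get_second(const struct pair *p) {
    return p->second;
}

void check_displacement(void) {
    struct pair p = { 1, 2 };
    assert(offsetof(struct pair, second) == sizeof(long));
    assert(get_second(&p) == 2);
}
```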
Memory Addressing Modes (continued)
Indexed
D(Rb, Ri, S) Mem[Reg[Rb] + S*Reg[Ri]+D]
D: constant displacement 1, 2, or 4 bytes
Rb: base register
Ri: index register: any except %rsp
S: scale: 1, 2, 4, or 8
example:
movq 0x100(%rcx, %rax, 4), %rdx
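Indexed addressing lines up closely with array indexing in C. The following sketch, with the made-up names `get_elem` and `get_elem_manual`, spells out the base + scale * index arithmetic by hand. It assumes `long` is 8 bytes as on x86-64 Linux, in which case a compiler would typically emit something like `movq (%rdi,%rsi,8), %rax` for `a[i]`.

```c
#include <assert.h>

/* Ordinary C indexing: the compiler picks the addressing mode. */
long get_elem(const long *a, long i) {
    return a[i];
}

/* The same access with the address computed explicitly:
   address = base + scale * index, where scale = sizeof(long). */
long get_elem_manual(const long *a, long i) {
    const char *base = (const char *)a;
    const char *addr = base + (long)sizeof(long) * i;
    return *(const long *)addr;
}

void check_indexed(void) {
    long a[] = { 10, 20, 30 };
    assert(get_elem(a, 2) == 30);
    assert(get_elem_manual(a, 2) == 30);
}
```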
Addressing Modes Example
Example C code
    void swap (long *xp, long *yp)
    {
        long t0 = *xp;
        long t1 = *yp;
        *xp = t1;
        *yp = t0;
    }
Addressing Modes Example
x86 assembly version
    # %rdi = xp
    # %rsi = yp
    swap:
        movq (%rdi), %rax    # t0 = *xp
        movq (%rsi), %rdx    # t1 = *yp
        movq %rdx, (%rdi)    # *xp = t1
        movq %rax, (%rsi)    # *yp = t0
        ret
Address Computation Examples
%rdx contains 0xf000
%rcx contains 0x0100
Expression | Address Computation | Address |
---|---|---|
0x8(%rdx) | 0xf000 + 0x8 | 0xf008 |
(%rdx, %rcx) | 0xf000 + 0x100 | 0xf100 |
(%rdx, %rcx, 4) | 0xf000 + 4*0x100 | 0xf400 |
0x80(,%rdx,2) | 2*0xf000 + 0x80 | 0x1e080 |
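The address computations above can be replayed with a small C model of the general form D(Rb, Ri, S). The function `effective_address` is a hypothetical helper written for this note; an omitted base or index register is modelled as 0 and an omitted scale as 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of D(Rb, Ri, S): returns Rb + S*Ri + D. */
uint64_t effective_address(uint64_t d, uint64_t rb, uint64_t ri, uint64_t s) {
    return rb + s * ri + d;
}

/* Re-checks the four worked examples with %rdx = 0xf000, %rcx = 0x0100. */
void check_address_examples(void) {
    const uint64_t rdx = 0xf000, rcx = 0x0100;
    assert(effective_address(0x8,  rdx, 0,   1) == 0xf008);   /* 0x8(%rdx)       */
    assert(effective_address(0,    rdx, rcx, 1) == 0xf100);   /* (%rdx, %rcx)    */
    assert(effective_address(0,    rdx, rcx, 4) == 0xf400);   /* (%rdx, %rcx, 4) */
    assert(effective_address(0x80, 0,   rdx, 2) == 0x1e080);  /* 0x80(,%rdx,2)   */
}
```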
Address Computation Instruction
leaq Src, Dest
- Load effective address of source into destination
Uses
Computing addresses without a memory reference
Computing arithmetic expressions of the form x + k * y, where k = 1, 2, 4, or 8
Example
C code

    long m12(long x)
    {
        return x*12;
    }

Assembly code

    leaq (%rdi, %rdi, 2), %rax    # t = x+2*x
    salq $2, %rax                 # return t<<2
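To see why the two instructions multiply by 12, the sketch below replays them in C. The name `m12_steps` is invented for this example, and the arithmetic is done in `unsigned long` so that wraparound and shifting of negative inputs stay well defined in C; converting the result back to `long` relies on the two's-complement behaviour of gcc and clang.

```c
#include <assert.h>

/* Hypothetical step-by-step model of the assembly for m12. */
long m12_steps(long x) {
    unsigned long ux = (unsigned long)x;
    unsigned long t = ux + 2 * ux;   /* leaq (%rdi, %rdi, 2), %rax : t = 3*x */
    t <<= 2;                         /* salq $2, %rax : t = 4*(3*x) = 12*x */
    return (long)t;
}

void check_m12(void) {
    assert(m12_steps(5) == 5 * 12);
    assert(m12_steps(-3) == -3 * 12);
}
```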
Some Arithmetic Operations
Binary operators

    addq  Src, Dest    Dest = Dest + Src
    subq  Src, Dest    Dest = Dest - Src
    imulq Src, Dest    Dest = Dest * Src
    salq  Src, Dest    Dest = Dest << Src
    sarq  Src, Dest    Dest = Dest >> Src (arithmetic)
    shrq  Src, Dest    Dest = Dest >> Src (logical)
    xorq  Src, Dest    Dest = Dest ^ Src
    andq  Src, Dest    Dest = Dest & Src
    orq   Src, Dest    Dest = Dest | Src

Be careful of the argument order
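Two details here are easy to get wrong: the operand order of subtraction and the difference between the two right shifts. The C sketch below models them with made-up helper names that take arguments in the assembly order (Src first, Dest second). Note that the C standard leaves `>>` on negative signed values implementation-defined; gcc and clang on x86-64 perform an arithmetic shift, which is what the `sarq_model` check assumes.

```c
#include <assert.h>

/* subq Src, Dest computes Dest - Src, not Src - Dest. */
long subq_model(long src, long dest) { return dest - src; }

/* sarq: arithmetic right shift, the sign bit is copied in from the left.
   Assumes the compiler shifts signed values arithmetically; src must be < 64. */
long sarq_model(unsigned char src, long dest) { return dest >> src; }

/* shrq: logical right shift, zeros are shifted in from the left. */
unsigned long shrq_model(unsigned char src, unsigned long dest) { return dest >> src; }

void check_binary_ops(void) {
    assert(subq_model(3, 10) == 7);
    assert(sarq_model(2, -16) == -4);
    /* The same bit pattern shifted logically gives a large positive value. */
    assert(shrq_model(2, (unsigned long)-16L) == ((unsigned long)-16L) / 4);
}
```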
Some Arithmetic Operations
Unary operators

    incq Dest    Dest = Dest + 1
    decq Dest    Dest = Dest - 1
    negq Dest    Dest = - Dest
    notq Dest    Dest = ~ Dest
Arithmetic Expression Example
C code
    long arith (long x, long y, long z)
    {
        long t1 = x+y;
        long t2 = z+t1;
        long t3 = x+4;
        long t4 = y * 48;
        long t5 = t3 + t4;
        long rval = t2 * t5;
        return rval;
    }
Arithmetic Expression Example
Assembly code
    # %rdi = x
    # %rsi = y
    # %rdx = z
    arith:
        leaq (%rdi, %rsi), %rax       # t1
        addq %rdx, %rax               # t2
        leaq (%rsi, %rsi, 2), %rdx
        salq $4, %rdx                 # t4
        leaq 4(%rdi, %rdx), %rcx      # t5
        imulq %rcx, %rax              # rval
        ret
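The assembly listing can be replayed one instruction at a time in C, using one local variable per register. The function name `arith_asm_steps` is hypothetical. The sketch makes two things visible: `t3` never gets its own instruction because it is folded into the final `leaq`, and the closing `imulq` means the result is the product of `t2` and `t5`.

```c
#include <assert.h>

/* Hypothetical register-by-register model of the arith assembly. */
long arith_asm_steps(long x, long y, long z) {
    long rdi = x, rsi = y, rdx = z;
    long rax, rcx;

    rax = rdi + rsi;        /* leaq (%rdi, %rsi), %rax    : t1 = x + y        */
    rax += rdx;             /* addq %rdx, %rax            : t2 = z + t1       */
    rdx = rsi + 2 * rsi;    /* leaq (%rsi, %rsi, 2), %rdx : 3*y               */
    rdx *= 16;              /* salq $4, %rdx              : t4 = 48*y         */
    rcx = rdi + rdx + 4;    /* leaq 4(%rdi, %rdx), %rcx   : t5 = x + 4 + t4   */
    rax *= rcx;             /* imulq %rcx, %rax           : rval = t2 * t5    */
    return rax;
}

void check_arith(void) {
    /* x=1, y=2, z=3: t2 = 6, t5 = 1 + 4 + 96 = 101, product = 606 */
    assert(arith_asm_steps(1, 2, 3) == 606);
}
```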
Turning C into Object Code
Code in files p1.c and p2.c
Compile with command: gcc -Og p1.c p2.c -o p
- use basic optimizations (-Og)
- put resulting binary in file p
The above gcc command runs the following programs:
- source text \(\rightarrow\) cpp \(\rightarrow\) compiler \(\rightarrow\) assembler \(\rightarrow\) linker
Assembly
Compiling C to assembly:
gcc -Og -S <file>
- produces an assembly file <file>.s
Disassembling Code:
objdump -d <file>
useful tool for examining object code
analyzes bit pattern of series of instructions
produces approximate rendition of assembly code
Summary
History of Intel processors and architectures
C, assembly, machine code
new forms of visible state: program counter, registers, \(\ldots\)
Compiler must transform language constructs into low level instruction sequences
Assembly basics: registers, operands, move
Arithmetic