History of Intel processors and architectures
Assembly basics: registers, operands, move
Arithmetic and logical operations
C, assembly and machine code
Intel x86 processors dominate the laptop/desktop/server market
Evolutionary design
Backwards compatible all the way back to the 8086, introduced in 1978
Adds more features over time
Complex instruction set computer (CISC)
Many different instructions with many different formats
Difficult to match performance of Reduced Instruction Set Computers (RISC)
But, Intel has done just that in terms of speed, less so for low power
Name | Date | Transistors | MHz | Notes |
---|---|---|---|---|
8086 | 1978 | 29K | 5-10 | 16-bit |
386 | 1985 | 275K | 16-33 | 32-bit |
Pentium 4E | 2004 | 125M | 2800-3800 | 64-bit |
Core 2 | 2006 | 291M | 1060-3333 | multi-core |
Core i7 | 2008 | 731M | 1600-4400 | four cores |
Historically
AMD has followed just behind Intel
A little bit slower, a lot cheaper
Then
Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
Built Opteron: tough competitor to Pentium 4
Developed x86-64, their own extension to 64 bits
Recent years
Intel leads the world in semiconductor technology
AMD has fallen behind
2001: Intel attempts radical shift from IA32 to IA64
Totally different architecture (Itanium)
Performance disappointing
2003: AMD steps in with an evolutionary solution
2004: Intel announces EM64T extension to IA32
Extended Memory 64 bit Technology
Almost identical to x86-64
All but low-end x86 processors support x86-64
Architecture: the parts of a processor design that one needs to understand for writing correct machine/assembly code
Machine code: the byte level programs that a processor executes
Assembly code: a text representation of machine code
Microarchitecture: implementation of the architecture
Example Instruction Set Architectures (ISA)
Intel: x86, IA32, Itanium, x86-64
ARM: Used in almost all mobile phones
RISC-V: new open-source ISA
Programmer Visible State
PC: Program counter
Register file
Condition codes
Memory
Byte addressable array
Code and user data
Stack to support procedures
Floating point data of 4, 8, or 10 bytes
SIMD vector data types of 8, 16, 32, or 64 bytes
Code: byte sequences encoding series of instructions
No aggregate types such as arrays or structures
8-byte register | bytes 0-3 | bytes 0-1 | byte 0 |
---|---|---|---|
%rax | %eax | %ax | %al |
%rcx | %ecx | %cx | %cl |
%rdx | %edx | %dx | %dl |
%rbx | %ebx | %bx | %bl |
%rsi | %esi | %si | %sil |
%rdi | %edi | %di | %dil |
%rsp | %esp | %sp | %spl |
%rbp | %ebp | %bp | %bpl |
8-byte register | bytes 0-3 | bytes 0-1 | byte 0 |
---|---|---|---|
%r8 | %r8d | %r8w | %r8b |
%r9 | %r9d | %r9w | %r9b |
%r10 | %r10d | %r10w | %r10b |
%r11 | %r11d | %r11w | %r11b |
%r12 | %r12d | %r12w | %r12b |
%r13 | %r13d | %r13w | %r13b |
%r14 | %r14d | %r14w | %r14b |
%r15 | %r15d | %r15w | %r15b |
Some assembly instructions include a suffix that indicates what portion of the register is accessed:
q: “quadword” 8 bytes
l: “double word” lower 4 bytes
w: “word” lower 2 bytes
b: “byte” lowest byte
Transfer data between memory and register
Load data from memory into register
Store register data into memory
Perform arithmetic function on register or memory data
Transfer control
Unconditional jumps to/from procedures
Conditional branches
Indirect branches
Instruction: movq source (Src), destination (Dest)
Operand types
Immediate (Imm): constant integer data
Register (Reg): one of 16 integer registers
Memory (Mem): 8 consecutive bytes of memory at address given by register
movq operand combinations
Source | Destination | Example | C Analog |
---|---|---|---|
Imm | Reg | movq $0x4, %rax | temp = 0x4; |
Imm | Mem | movq $-147, (%rax) | *p = -147; |
Reg | Reg | movq %rax, %rdx | temp2 = temp1; |
Reg | Mem | movq %rax, (%rdx) | *p = temp; |
Mem | Reg | movq (%rax), %rdx | temp = *p; |
Immediate
$val
val: constant integer value
example: movq $7, %rax
Normal
(R) Mem[Reg[R]]
R: register R specifies memory address
example: movq (%rcx), %rax
Displacement
D(R) Mem[Reg[R] + D]
R: register specifies start of memory region
D: constant displacement D specifies offset
example: movq 8(%rdi), %rdx
Indexed
D(Rb, Ri, S) Mem[Reg[Rb] + S*Reg[Ri]+D]
D: constant displacement 1, 2, or 4 bytes
Rb: base register
Ri: index register: any except %rsp
S: scale: 1, 2, 4, or 8
example: movq 0x100(%rcx, %rax, 4), %rdx
Example C code
void swap (long *xp, long *yp) {
long t0 = *xp;
long t1 = *yp;
*xp = t1;
*yp = t0;
}
x86 assembly version
# %rdi = xp
# %rsi = yp
swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret
%rdx contains 0xf000
%rcx contains 0x0100
Expression | Address Computation | Address |
---|---|---|
0x8(%rdx) | 0xf000 + 0x8 | 0xf008 |
(%rdx, %rcx) | 0xf000 + 0x100 | 0xf100 |
(%rdx, %rcx, 4) | 0xf000 + 4*0x100 | 0xf400 |
0x80(,%rdx,2) | 2*0xf000 + 0x80 | 0x1e080 |
leaq Src, Dest
Src is an address mode expression; Dest is set to the address it denotes
Uses
Computing addresses without a memory reference
Computing arithmetic expressions of the form x + k * y
Example
long m12(long x) {
return x*12;
}
leaq (%rdi, %rdi, 2), %rax # t = x+2*x
salq $2, %rax # return t<<2
Binary operators
Format | Computation |
---|---|
addq Src, Dest | Dest = Dest + Src |
subq Src, Dest | Dest = Dest - Src |
imulq Src, Dest | Dest = Dest * Src |
salq Src, Dest | Dest = Dest << Src |
sarq Src, Dest | Dest = Dest >> Src (arithmetic) |
shrq Src, Dest | Dest = Dest >> Src (logical) |
xorq Src, Dest | Dest = Dest ^ Src |
andq Src, Dest | Dest = Dest & Src |
orq Src, Dest | Dest = Dest \| Src |
Watch the operand order: the source comes first and the destination second
Unary operators
Format | Computation |
---|---|
incq Dest | Dest = Dest + 1 |
decq Dest | Dest = Dest - 1 |
negq Dest | Dest = - Dest |
notq Dest | Dest = ~ Dest |
C code
long arith (long x, long y, long z) {
long t1 = x+y;
long t2 = z+t1;
long t3 = x+4;
long t4 = y * 48;
long t5 = t3 + t4;
long rval = t2 * t5;
return rval;
}
Assembly code
# %rdi = x
# %rsi = y
# %rdx = z
arith:
leaq (%rdi, %rsi), %rax # t1
addq %rdx, %rax # t2
leaq (%rsi, %rsi, 2), %rdx # 3*y
salq $4, %rdx # t4
leaq 4(%rdi, %rdx), %rcx # t5
imulq %rcx, %rax # rval
ret
Code in files p1.c and p2.c
Compile with command: gcc -Og p1.c p2.c -o p
use basic optimizations (-Og)
put resulting binary in file p
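The above gcc command runs the following programs in sequence: the compiler (C to assembly), the assembler (assembly to object code), and the linker (object code to executable)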
Compiling C to assembly: gcc -Og -S <file>
produces the assembly file <file>.s
Disassembling Code: objdump -d <file>
useful tool for examining object code
analyzes bit pattern of series of instructions
produces approximate rendition of assembly code
History of Intel processors and architectures
C, assembly, machine code
new forms of visible state: program counter, registers, ...
Compiler must transform language constructs into low level instruction sequences
Assembly basics: registers, operands, move
Arithmetic