1. Memory & Pointers Fundamentals
Before we can understand stacks or interrupts, we need a rock-solid model of what memory actually is at the hardware level. Everything else builds on this.
Memory as an Array of Bytes
Physical memory (RAM) is conceptually a giant array of bytes. Each byte has an address — a number that identifies its position in this array. On x86-64, addresses are 64-bit numbers, though typically only the lower 48 bits are used.
Address: A number (typically written in hexadecimal) that identifies a specific byte location in memory. On x86-64, addresses are 64 bits wide.
Pointer: A value that holds an address. When we say "a pointer to X," we mean a value that contains the address where X is stored. The pointer itself is just a number.
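To make "the pointer itself is just a number" concrete, here is a minimal Rust sketch. The helper name `address_of` is made up for this illustration, and in an ordinary hosted program the number printed is a virtual address chosen by the OS, not a physical RAM location.

```rust
/// Returns the numeric address that a reference points to.
/// (Hypothetical helper, for illustration only.)
fn address_of(value: &u64) -> usize {
    // A reference coerces to a raw pointer, and a raw pointer casts to an integer.
    value as *const u64 as usize
}

fn main() {
    let x: u64 = 42;
    let p: *const u64 = &x;

    // The pointer is just a number; print it in hexadecimal.
    println!("x lives at {:#x}", address_of(&x));

    // Dereferencing follows the address back to the stored value.
    // SAFETY: `p` was derived from a live, properly aligned `u64`.
    let read_back = unsafe { *p };
    println!("value at that address: {}", read_back);
}
```

The exact address differs from run to run; what matters is that it is an ordinary integer you can print, compare, and do arithmetic on.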
Bytes vs Bits, and Multi-Byte Values
A byte is 8 bits. One byte can represent values 0–255 (unsigned) or -128 to +127 (signed). But most useful values need more than one byte:
| Name | Size | Range (unsigned) |
|---|---|---|
| Byte | 8 bits (1 byte) | 0 to 255 |
| Word | 16 bits (2 bytes) | 0 to 65,535 |
| Double word (dword) | 32 bits (4 bytes) | 0 to ~4 billion |
| Quad word (qword) | 64 bits (8 bytes) | 0 to ~18 quintillion |
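As a quick check on the table above, the ranges follow directly from the bit widths. The sketch below is illustrative only; `unsigned_max` is a name invented for this example.

```rust
/// Largest unsigned value representable in `bits` bits (1..=64).
/// (Hypothetical helper, for illustration only.)
fn unsigned_max(bits: u32) -> u64 {
    assert!(bits >= 1 && bits <= 64);
    if bits == 64 {
        u64::MAX
    } else {
        (1u64 << bits) - 1
    }
}

fn main() {
    // Byte, word, dword, qword in x86 terminology.
    for (name, bits) in [("byte", 8), ("word", 16), ("dword", 32), ("qword", 64)] {
        println!("{:>5}: {:>2} bits, max = {}", name, bits, unsigned_max(bits));
    }
    // Signed byte range from the text.
    println!("i8: {} to {}", i8::MIN, i8::MAX);
}
```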
Little-Endian Byte Order
x86-64 is little-endian: the least significant byte is stored at the lowest address.
Worked Example: Storing a 64-bit Value
Let's store the value 0x123456789ABCDEF0 at address 0x1000:
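One way to see the resulting byte layout is to let Rust do the split. This is a small sketch, not part of any kernel code; `little_endian_layout` is a name invented here, and it simply pairs each byte with the address it would occupy.

```rust
/// Byte layout of a 64-bit value in little-endian memory.
/// Returns (address, byte) pairs starting at `base`.
/// (Hypothetical helper, for illustration only.)
fn little_endian_layout(value: u64, base: u64) -> Vec<(u64, u8)> {
    value
        .to_le_bytes()
        .iter()
        .enumerate()
        .map(|(i, &b)| (base + i as u64, b))
        .collect()
}

fn main() {
    for (addr, byte) in little_endian_layout(0x1234_5678_9ABC_DEF0, 0x1000) {
        println!("{:#06x}: {:#04x}", addr, byte);
    }
}
```

The least significant byte (`0xF0`) lands at `0x1000` and the most significant byte (`0x12`) at `0x1007`. Reading the eight bytes back as one qword still yields `0x123456789ABCDEF0`.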
Registers vs Memory
Registers are tiny storage locations inside the CPU itself. They are not part of RAM. The CPU can access registers in a single clock cycle — hundreds of times faster than RAM.
Common Misconceptions
❌ "Registers are like variables in memory with special names."
✓ Registers are physically separate from RAM. They're built into the CPU silicon and have dedicated circuitry.
❌ "Little-endian means the bytes are stored backwards."
✓ It means the least significant byte is at the lowest address. The value itself isn't "backwards."
Sanity Check
- If I have a 32-bit value `0xDEADBEEF` stored at address `0x2000` in little-endian, what byte is at address `0x2000`? At `0x2003`?
- Can the CPU add two values stored in RAM directly, or must they first be loaded into registers?
- What's the difference between "address `0x1000`" and "the value `0x1000` stored somewhere in memory"?
2. Code Execution Model
Now that we understand memory and registers, let's see how the CPU actually executes your code.
Where Machine Code Lives: The Text Segment
The Instruction Pointer: RIP
A 64-bit register containing the memory address of the next instruction the CPU will fetch and execute. You cannot directly write to RIP with mov — you change it with jump/call/return instructions.
The Fetch-Decode-Execute Cycle
- Fetch: Read the bytes at address RIP from memory.
- Decode: Figure out what instruction this is and what operands it uses.
- Execute: Perform the operation.
- Advance RIP: Move RIP forward by the instruction's length.
- Repeat.
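The loop above can be sketched as a toy interpreter. Everything here is invented for illustration: the opcodes (`0x00` HALT, `0x01` LOAD, `0x02` ADD, `0x03` JMP), the single accumulator, and the names `ToyCpu` and `run`. It is not x86, but it does show variable-length instructions advancing the instruction pointer and a jump overwriting it.

```rust
/// A toy CPU: one accumulator and an instruction pointer.
struct ToyCpu {
    rip: usize,
    acc: u8,
}

/// Runs the program until HALT and returns the final accumulator value.
fn run(memory: &[u8]) -> u8 {
    let mut cpu = ToyCpu { rip: 0, acc: 0 };
    loop {
        // Fetch: read the opcode byte at RIP.
        let opcode = memory[cpu.rip];
        // Decode and execute, then advance RIP by the instruction's length.
        match opcode {
            0x00 => return cpu.acc, // HALT (1 byte)
            0x01 => {
                // LOAD imm8 (2 bytes)
                cpu.acc = memory[cpu.rip + 1];
                cpu.rip += 2;
            }
            0x02 => {
                // ADD imm8 (2 bytes)
                cpu.acc = cpu.acc.wrapping_add(memory[cpu.rip + 1]);
                cpu.rip += 2;
            }
            0x03 => {
                // JMP target (2 bytes): overwrite RIP instead of advancing it
                cpu.rip = memory[cpu.rip + 1] as usize;
            }
            other => panic!("invalid opcode {:#04x}", other),
        }
    }
}

fn main() {
    // LOAD 5; JMP 6; ADD 100 (skipped); ADD 3; HALT
    let program = [0x01, 5, 0x03, 6, 0x02, 100, 0x02, 3, 0x00];
    println!("result = {}", run(&program));
}
```

Because the jump skips the `ADD 100` at offset 4, the program finishes with 8 in the accumulator rather than 108.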
CALL vs JMP: What CALL Does Extra
| Instruction | What it does |
|---|---|
| `jmp target` | RIP ← target address. That's it. |
| `call target` | 1. Push the return address onto the stack. 2. RIP ← target address. |
| `ret` | Pop a value from the stack into RIP. |
Worked Example: CALL and RET Mechanics
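The mechanics can be traced with a small simulation. This is a sketch under stated assumptions: the `Machine` type, its sparse `memory` map, and the concrete addresses are all invented for illustration, and the 5-byte instruction length corresponds to a near `call` with a 32-bit relative target.

```rust
use std::collections::HashMap;

/// Minimal model: two registers plus sparse memory of 8-byte slots.
struct Machine {
    rip: u64,
    rsp: u64,
    memory: HashMap<u64, u64>,
}

impl Machine {
    fn new(rip: u64, rsp: u64) -> Self {
        Machine { rip, rsp, memory: HashMap::new() }
    }

    /// jmp: only RIP changes.
    fn jmp(&mut self, target: u64) {
        self.rip = target;
    }

    /// call: push the address of the *next* instruction, then jump.
    fn call(&mut self, target: u64, instr_len: u64) {
        let return_address = self.rip + instr_len;
        self.rsp -= 8;
        self.memory.insert(self.rsp, return_address);
        self.rip = target;
    }

    /// ret: pop the top of the stack into RIP.
    fn ret(&mut self) {
        self.rip = self.memory[&self.rsp];
        self.rsp += 8;
    }
}

fn main() {
    let mut m = Machine::new(0x40_0000, 0x7FFF_0000);
    m.jmp(0x40_1000); // stack untouched
    m.call(0x40_2000, 5);
    println!("after call: rip={:#x} rsp={:#x} [rsp]={:#x}", m.rip, m.rsp, m.memory[&m.rsp]);
    m.ret();
    println!("after ret:  rip={:#x} rsp={:#x}", m.rip, m.rsp);
}
```

After the `call`, the return address `0x401005` sits at `0x7FFEFFF8`. After the `ret`, RIP is `0x401005` and RSP is back at `0x7FFF0000`.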
Sanity Check
- If RIP is `0x500000` and the current instruction is 4 bytes long (with no jumps), what will RIP be after?
- What's the key difference between `jmp` and `call`?
- After `call foo` executes, where is the return address stored?
3. Stack Basics
The stack is just a region of memory. There's nothing magical about it — it's bytes like everything else. What makes it special is how we use it.
RSP: The Stack Pointer
A register containing the address of the most recently pushed value on the stack. On x86-64, RSP points to the current top of stack (the last item pushed), not the next free slot.
Why the Stack "Grows Down"
On x86-64, the stack grows toward lower addresses. When you push, RSP decreases. When you pop, RSP increases.
PUSH and POP: The Exact Operations
Worked Example: Push and Pop with Concrete Addresses
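Here is a minimal simulation of 64-bit `push` and `pop` over a byte array. The `Stack` type and the addresses (a 256-byte region ending at `0x3000`) are invented for this sketch; the point is the order of operations and what happens to the bytes afterward.

```rust
/// A region of "RAM" used as a stack, plus the stack pointer.
struct Stack {
    base: u64,      // lowest address covered by `bytes`
    bytes: Vec<u8>, // backing memory
    rsp: u64,
}

impl Stack {
    /// An empty stack: RSP starts just past the highest address.
    fn new(base: u64, size: usize) -> Self {
        Stack { base, bytes: vec![0; size], rsp: base + size as u64 }
    }

    /// push: decrement RSP by 8, then store at [RSP].
    fn push(&mut self, value: u64) {
        self.rsp -= 8;
        let off = (self.rsp - self.base) as usize;
        self.bytes[off..off + 8].copy_from_slice(&value.to_le_bytes());
    }

    /// pop: load from [RSP], then increment RSP by 8. Nothing is erased.
    fn pop(&mut self) -> u64 {
        let value = self.read_u64(self.rsp);
        self.rsp += 8;
        value
    }

    /// Read 8 little-endian bytes at an arbitrary address in the region.
    fn read_u64(&self, addr: u64) -> u64 {
        let off = (addr - self.base) as usize;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&self.bytes[off..off + 8]);
        u64::from_le_bytes(buf)
    }
}

fn main() {
    let mut s = Stack::new(0x2F00, 0x100); // RSP starts at 0x3000
    s.push(0x1111);
    s.push(0x2222);
    println!("after two pushes: rsp = {:#x}", s.rsp);
    let v = s.pop();
    println!("popped {:#x}, rsp = {:#x}", v, s.rsp);
    // The popped slot is now below RSP, but its bytes are unchanged.
    println!("stale value at 0x2ff0: {:#x}", s.read_u64(0x2FF0));
}
```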
Key Insight
"Freeing" stack memory means moving RSP past it. The bytes remain in RAM with their old values until something else overwrites them.
Common Misconceptions
❌ "The stack is a separate hardware structure."
✓ The stack is just a region of regular memory. RSP is just a register.
❌ "Pop erases data from memory."
✓ Pop copies the value and adjusts RSP. The memory still contains the old value.
Sanity Check
- If RSP is `0x1000` and you execute `push rax`, what is RSP afterward?
- After popping a value, could you theoretically read it back from memory if you knew the address?
- Why does the stack grow downward on x86-64?
4. Stack Frames
Stack frame: The contiguous region of stack memory allocated for one function call. It typically includes the return address (pushed by `call`), the saved base pointer, saved callee-saved registers, and local variables.
Function Prologue and Epilogue
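A common frame-pointer prologue is `push rbp; mov rbp, rsp; sub rsp, N`, and the matching epilogue is `mov rsp, rbp; pop rbp; ret`. The sketch below simulates just the register arithmetic. The `Regs` type, the sparse memory map, and the addresses are assumptions made for illustration, and real compilers often omit the frame pointer entirely.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Regs {
    rsp: u64,
    rbp: u64,
}

/// push rbp; mov rbp, rsp; sub rsp, locals_bytes
fn prologue(mut r: Regs, mem: &mut HashMap<u64, u64>, locals_bytes: u64) -> Regs {
    r.rsp -= 8;
    mem.insert(r.rsp, r.rbp); // save the caller's RBP
    r.rbp = r.rsp;            // RBP now anchors this frame
    r.rsp -= locals_bytes;    // reserve space for locals
    r
}

/// mov rsp, rbp; pop rbp  (a `ret` would follow)
fn epilogue(mut r: Regs, mem: &HashMap<u64, u64>) -> Regs {
    r.rsp = r.rbp;       // discard locals
    r.rbp = mem[&r.rsp]; // restore the caller's RBP
    r.rsp += 8;
    r
}

fn main() {
    let mut mem = HashMap::new();
    // State on entry: RSP points at the return address pushed by `call`.
    let entry = Regs { rsp: 0x7FFF_0FF8, rbp: 0x7FFF_1020 };
    let inside = prologue(entry, &mut mem, 16);
    println!("in body:  rsp={:#x} rbp={:#x}", inside.rsp, inside.rbp);
    println!("return address slot: {:#x}", inside.rbp + 8);
    let exit = epilogue(inside, &mem);
    println!("before ret: rsp={:#x} rbp={:#x}", exit.rsp, exit.rbp);
}
```

After the epilogue, the registers match their values on entry, so the following `ret` finds the return address at the top of the stack.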
System V AMD64 ABI: Argument Passing
| Argument # | Register |
|---|---|
| 1st integer/pointer | RDI |
| 2nd | RSI |
| 3rd | RDX |
| 4th | RCX |
| 5th | R8 |
| 6th | R9 |
| 7th+ | Pushed on stack |
Return values go in RAX.
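The table can be expressed as a small lookup. This sketch covers only integer and pointer arguments (floating-point arguments use a separate set of registers); the names `ArgLocation` and `integer_arg_location` are invented for this example.

```rust
#[derive(Debug, PartialEq)]
enum ArgLocation {
    Register(&'static str),
    Stack,
}

/// Where the integer/pointer argument at 0-based position `index` is passed
/// under the System V AMD64 ABI. (Hypothetical helper, for illustration only.)
fn integer_arg_location(index: usize) -> ArgLocation {
    const REGS: [&str; 6] = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"];
    match REGS.get(index) {
        Some(reg) => ArgLocation::Register(*reg),
        None => ArgLocation::Stack,
    }
}

fn main() {
    for i in 0..8 {
        println!("argument {} -> {:?}", i + 1, integer_arg_location(i));
    }
}
```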
Stack Frame Layout Diagram
Sanity Check
- After `push rbp; mov rbp, rsp`, what is the relationship between RBP and the return address?
- If a function has 3 local 64-bit variables, how many bytes does `sub rsp, N` subtract?
- In System V ABI, if I call `foo(10, 20, 30)`, which registers hold 10, 20, and 30?
5. Calling Conventions & Register Saving
Caller-Saved vs Callee-Saved Registers
Stack Alignment: The 16-Byte Rule
The System V ABI requires RSP to be 16-byte aligned at the point of a call instruction.
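Checking the rule is a matter of looking at the low four bits of RSP. A minimal sketch, with helper names (`is_call_aligned`, `align_down`) invented for illustration:

```rust
const STACK_ALIGN: u64 = 16;

/// True if `rsp` satisfies the 16-byte rule at the point of a `call`.
fn is_call_aligned(rsp: u64) -> bool {
    rsp % STACK_ALIGN == 0
}

/// Round `rsp` down to the nearest 16-byte boundary.
/// Down, not up: the stack grows toward lower addresses, so lowering RSP
/// can only reserve extra space, never expose data already in use.
fn align_down(rsp: u64) -> u64 {
    rsp & !(STACK_ALIGN - 1)
}

fn main() {
    for rsp in [0x7FFF_0F00u64, 0x7FFF_0F08, 0x7FFF_0F1C] {
        println!(
            "{:#x}: aligned = {}, aligned down = {:#x}",
            rsp,
            is_call_aligned(rsp),
            align_down(rsp)
        );
    }
}
```

Note that `call` itself pushes 8 bytes, so on entry to the callee RSP is 8 bytes off a 16-byte boundary. The callee's prologue usually restores alignment.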
Why Alignment Matters
Aligned SSE/AVX load and store instructions require their memory operands to be aligned. If the stack is misaligned, an instruction like `movaps`, which needs a 16-byte-aligned address, will fault with a #GP (General Protection) exception.
Sanity Check
- If a function uses R12, what must it do before returning?
- Why is RDI caller-saved rather than callee-saved?
- If RSP = `0x7FFF0108` before a `call`, is this properly aligned?
6. Interrupts & CPU Exceptions
Interrupt: An asynchronous event from hardware (keyboard, disk, timer) that signals the CPU to stop what it's doing and run a handler.
Exception: A synchronous event caused by the currently executing instruction: division by zero, page fault, invalid opcode, etc.
The Interrupt Descriptor Table (IDT)
What the CPU Pushes on Entry
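In 64-bit mode the CPU pushes SS, RSP, RFLAGS, CS, and RIP, in that order, and for some exceptions an error code on top. Read from the lowest address upward, that gives the layout sketched below. The struct name `InterruptFrame` and the helper `pushed_bytes` are invented for this illustration; the `x86_64` crate provides its own `InterruptStackFrame` type for real handlers.

```rust
use std::mem::size_of;

/// The frame the CPU pushes in 64-bit mode, from lowest address to highest.
/// Each slot is 8 bytes wide, including the segment selectors.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
struct InterruptFrame {
    rip: u64,    // where the interrupted code resumes
    cs: u64,     // code segment selector, padded to 8 bytes
    rflags: u64, // flags at the time of the interrupt
    rsp: u64,    // interrupted code's stack pointer
    ss: u64,     // stack segment selector, padded to 8 bytes
}

/// Total bytes pushed by the CPU on handler entry.
fn pushed_bytes(has_error_code: bool) -> usize {
    size_of::<InterruptFrame>() + if has_error_code { 8 } else { 0 }
}

fn main() {
    println!("without error code: {} bytes", pushed_bytes(false));
    println!("with error code:    {} bytes", pushed_bytes(true));
    println!("{:?}", InterruptFrame::default());
}
```

When an error code is present it sits below RIP, so the handler must remove it from the stack before executing `iretq`.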
Critical Kernel Rule
The CPU does NOT save general-purpose registers. Your interrupt handler must save any registers it uses before modifying them, and restore them before returning.
IRETQ vs RET
Common Misconceptions
❌ "The CPU saves all registers on interrupt."
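✓ The CPU saves only RIP, CS, RFLAGS, RSP, and SS (in 64-bit mode, RSP and SS are pushed on every interrupt, not just on a privilege change), plus an error code for some exceptions. General-purpose registers are the handler's responsibility.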
❌ "I can return from an interrupt handler with RET."
✓ You must use IRETQ. RET only pops RIP, leaving CS, RFLAGS, etc. on the stack.
Sanity Check
- What's the difference between an interrupt and an exception?
- If a page fault handler modifies RAX without saving it first, what happens when the handler returns?
- Does the CPU push an error code for all exceptions?
7. Stack Switching on Interrupts
Privilege Rings: Ring 3 vs Ring 0
| Ring | Name | Who runs here |
|---|---|---|
| Ring 0 | Kernel mode | OS kernel, drivers |
| Ring 3 | User mode | Applications |
The TSS: Task State Segment
A CPU structure containing stack pointers for privilege level transitions. The key field is RSP0 — the stack pointer to use when entering ring 0.
IST: Interrupt Stack Table
The IST provides up to 7 dedicated stacks for critical exceptions. Each IDT entry can specify an IST index (1-7). If non-zero, the CPU loads RSP from that IST entry, regardless of current privilege level.
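The stack-selection rule can be written down as a small decision function. This is a simplified model, not real kernel code: the `Tss` struct here is a stand-in with only the fields discussed, `select_handler_stack` is an invented name, and details such as the CPU aligning the chosen RSP before pushing are left out.

```rust
/// Stand-in for the relevant TSS fields (not the real hardware layout).
struct Tss {
    rsp0: u64,
    ist: [u64; 7], // IST1..IST7 stored at indices 0..6
}

/// Which stack pointer the CPU starts from when entering a handler.
/// `ist_index` is the value from the IDT entry: 0 means "no IST".
fn select_handler_stack(tss: &Tss, ist_index: u8, from_ring3: bool, current_rsp: u64) -> u64 {
    match ist_index {
        // No IST: switch to RSP0 only on a ring 3 -> ring 0 transition.
        0 if from_ring3 => tss.rsp0,
        // No IST, already in ring 0: keep using the current stack.
        0 => current_rsp,
        // IST set: always switch, regardless of privilege level.
        1..=7 => tss.ist[(ist_index - 1) as usize],
        _ => panic!("IST index must be in 0..=7"),
    }
}

fn main() {
    let mut tss = Tss { rsp0: 0xFFFF_8000_0010_0000, ist: [0; 7] };
    tss.ist[0] = 0xFFFF_8000_0020_0000; // IST1, e.g. for double fault

    let user_rsp = 0x7FFF_0000;
    let kernel_rsp = 0xFFFF_8000_000F_F000;
    println!("{:#x}", select_handler_stack(&tss, 0, true, user_rsp));
    println!("{:#x}", select_handler_stack(&tss, 0, false, kernel_rsp));
    println!("{:#x}", select_handler_stack(&tss, 1, false, kernel_rsp));
}
```

The third case is the important one for double faults: even if the current kernel stack is unusable, the handler starts on a known-good stack.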
Double Fault and IST
If an exception occurs while trying to invoke an exception handler, a Double Fault (#DF) fires. If the double fault handler also fails, the CPU triple-faults and resets. Always put your double fault handler on a dedicated IST stack!
Sanity Check
- Why can't the kernel use the user's stack for interrupt handling?
- What field in the TSS provides the kernel stack pointer?
- Why does double fault need an IST entry?
8. Context Switches
Context: All the state required to resume a thread's execution: GPRs, RSP, RIP, RFLAGS, and (for processes) the address space (page tables).
Thread Switch vs Process Switch
| Thread Switch | Process Switch |
|---|---|
| Same address space | Different address space |
| Save/restore registers + RSP | All that + switch CR3 |
| Relatively fast | Slow (TLB flush) |
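The difference in the table can be modeled in a few lines. This is a conceptual sketch only: a real switch is written in assembly, and the `Context`, `Cpu`, and `switch_to` names are invented here. The counter stands in for the cost of reloading CR3.

```rust
/// Saved execution state of one thread.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Context {
    gprs: [u64; 15], // general-purpose registers other than RSP
    rsp: u64,
    rip: u64,
    rflags: u64,
    cr3: u64, // physical address of the top-level page table
}

/// The state currently loaded on the (simulated) CPU.
struct Cpu {
    live: Context,
    cr3_reloads: u32,
}

/// Save the running context into `save_slot`, then load `next`.
/// CR3 is reloaded only when the address space actually changes.
fn switch_to(cpu: &mut Cpu, save_slot: &mut Context, next: &Context) {
    *save_slot = cpu.live;
    if next.cr3 != cpu.live.cr3 {
        cpu.cr3_reloads += 1; // process switch: new page tables
    }
    cpu.live = *next;
}

fn main() {
    // `a` and `b` are threads of one process; `c` belongs to another.
    let a = Context { rip: 0x40_1000, rsp: 0x7000_0000, cr3: 0x1000, ..Default::default() };
    let b = Context { rip: 0x40_2000, rsp: 0x7100_0000, cr3: 0x1000, ..Default::default() };
    let c = Context { rip: 0x40_3000, rsp: 0x7200_0000, cr3: 0x2000, ..Default::default() };

    let mut cpu = Cpu { live: a, cr3_reloads: 0 };
    let (mut saved_a, mut saved_b) = (Context::default(), Context::default());

    switch_to(&mut cpu, &mut saved_a, &b);
    println!("a -> b (thread switch):  CR3 reloads = {}", cpu.cr3_reloads);
    switch_to(&mut cpu, &mut saved_b, &c);
    println!("b -> c (process switch): CR3 reloads = {}", cpu.cr3_reloads);
}
```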
CR3 and Page Tables
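A control register containing the physical address of the top-level page table (PML4 on x86-64). Changing CR3 switches the entire address space mapping and flushes the TLB (entries for global pages survive, and with PCID enabled the CPU can keep entries tagged by address space).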
Why Context Switches Are "Slow"
- TLB flush: After switching CR3, all cached address translations are invalid.
- Cache pollution: The new thread's working set isn't in L1/L2/L3 cache.
- Branch predictor pollution: The CPU's branch predictor has learned the old thread's patterns.
Common Misconceptions
❌ "Context switches are slow because saving registers is slow."
✓ Saving ~15 registers is fast. The slowdown is TLB flush, cache misses, and branch predictor retraining.
❌ "Threads in the same process share everything."
✓ They share address space (code, heap) but each has its own stack and register state.
Sanity Check
- What register holds the page table base address?
- Why is a process switch slower than a thread switch?
- What is the TLB and why does flushing it hurt performance?
9. Kernel-in-Rust Practical Guidance
The x86-interrupt Calling Convention
Rust's extern "x86-interrupt" is special. Unlike extern "C", it:
- Expects the interrupt stack frame to already be on the stack
- Automatically saves/restores all scratch registers
- Uses `iretq` to return, not `ret`
- Handles the error code for you (if present)
```rust
// Requires nightly Rust with `#![feature(abi_x86_interrupt)]` at the crate root.
use x86_64::structures::idt::InterruptStackFrame;

extern "x86-interrupt" fn page_fault_handler(
    stack_frame: InterruptStackFrame,
    error_code: u64,
) {
    // Handle page fault...
    // Compiler generates iretq, not ret
}
```
Common Pitfalls
Pitfall 1: Misaligned Stack
If RSP isn't 16-byte aligned when you call a function, alignment-sensitive SSE instructions such as `movaps` can fault.
Pitfall 2: Forgetting to Save Registers
If you write a handler in pure assembly or use extern "C", you must save/restore all registers yourself.
Pitfall 3: Wrong IDT Entry Type
Interrupt gates automatically clear IF (interrupt flag). Trap gates don't. Use interrupt gates for most handlers.
Pitfall 4: No IST for Double Fault
If your double fault handler uses the regular kernel stack, a kernel stack overflow causes triple fault → reboot.
Setting Up TSS and IST in Rust
```rust
use lazy_static::lazy_static;
use x86_64::structures::tss::TaskStateSegment;
use x86_64::VirtAddr;

// 0-based index into the crate's IST array (the hardware's IST1).
pub const DOUBLE_FAULT_IST_INDEX: u16 = 0;

lazy_static! {
    static ref TSS: TaskStateSegment = {
        let mut tss = TaskStateSegment::new();
        tss.interrupt_stack_table[DOUBLE_FAULT_IST_INDEX as usize] = {
            const STACK_SIZE: usize = 4096 * 5;
            static mut STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];
            // Take a raw pointer (Rust 1.82+) rather than a reference to a `static mut`.
            let stack_start = VirtAddr::from_ptr(&raw const STACK);
            stack_start + STACK_SIZE as u64 // Stack end (grows down!)
        };
        tss
    };
}
```
Sanity Check
- What does `extern "x86-interrupt"` do differently from `extern "C"`?
- Why might a kernel use assembly stubs instead of pure `extern "x86-interrupt"`?
- What happens if your double fault handler doesn't use an IST entry and the kernel stack overflows?
Summary
If you can explain these 10 bullets, you understand the whole picture
- Memory is an array of bytes, each with an address. Registers are separate, fast storage inside the CPU. Little-endian means least significant byte at lowest address.
- RIP holds the address of the next instruction. The fetch-decode-execute loop runs continuously. `call` differs from `jmp` by pushing a return address.
- The stack is regular memory used in a LIFO pattern. RSP points to the top. `push` = decrement RSP, store. `pop` = load, increment RSP.
- A stack frame contains return address, saved RBP, saved callee-saved registers, and local variables.
- Calling conventions (ABI) specify argument registers, return register, and which registers must be preserved. 16-byte stack alignment at `call`.
- Interrupts and exceptions invoke handlers via the IDT. The CPU pushes RIP, CS, RFLAGS, RSP, and SS (plus an error code for some exceptions). Handlers must save GPRs themselves.
- The TSS provides RSP0 for kernel stack on privilege transitions. The IST provides dedicated stacks for critical exceptions.
- IRETQ returns from interrupts, popping everything the CPU pushed. Using `ret` instead would crash.
- Context switches save/restore entire execution state. Process switches also change CR3, flushing the TLB.
- In Rust kernels, `extern "x86-interrupt"` handles the weird calling convention. Use IST for double fault. Ensure stack alignment.
You now have a complete mental model. Go write that kernel! 🦀