Traps and System Calls

Three kinds of event force a CPU to suspend ordinary instruction execution and transfer control to specialized handler code: system calls initiated by the ecall instruction, exceptions triggered by illegal operations (such as executing an illegal instruction or accessing an invalid virtual address), and device interrupts signaling that hardware needs attention. These events, collectively referred to as traps, must be handled transparently so the interrupted code can resume without noticing. All traps are handled in the kernel and none are delivered to user code, which preserves isolation between processes. Trap handling proceeds in four stages: hardware actions taken by the RISC-V CPU, assembly instructions that save state, a C function that determines the trap’s cause, and the specific service routine.

To process these events without breaking the execution state, the architecture relies on a specialized set of hardware control registers.

RISC-V Trap Machinery

The RISC-V hardware dictates trap behavior through supervisor-mode control registers, which are inaccessible to user mode:

stvec: Stores the memory address of the kernel’s trap handler.
sepc: Captures the program counter at the exact moment the trap occurs. The sret instruction later copies this value back to the program counter to resume execution.
scause: Stores a numeric code indicating the reason for the trap.
sscratch: Provides temporary storage crucial for the very first instructions of the trap handler.
sstatus: Contains the SIE bit, which controls whether device interrupts are enabled (while it is clear, the hardware defers them until the kernel sets it again), and the SPP bit, which records whether the trap originated in user or supervisor mode.

When forcing a trap (excluding timer interrupts), the hardware executes a strict sequence of operations:

Does nothing further if the trap is a device interrupt and the SIE bit is clear; the interrupt stays pending.
Disables further interrupts by clearing SIE.
Copies the current program counter to sepc.
Saves the current execution mode into the SPP bit.
Writes the trap cause into scause.
Elevates the execution mode to supervisor mode.
Copies the handler address from stvec to the program counter.
Resumes execution at the new instruction address.
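
The sequence above can be sketched as a small software model. This is only an illustration: struct hart, take_trap, and the mode enum are hypothetical names invented here, and the model follows just the steps listed above (real hardware does more, for example it also records the previous SIE value). The SIE and SPP bit positions are the ones given in the RISC-V privileged specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in sstatus (RISC-V privileged specification). */
#define SSTATUS_SIE (1ULL << 1) /* supervisor interrupt enable */
#define SSTATUS_SPP (1ULL << 8) /* previous mode: 1 = supervisor, 0 = user */

enum mode { MODE_USER = 0, MODE_SUPERVISOR = 1 };

/* Hypothetical model of one hart; real hardware has no such struct. */
struct hart {
  uint64_t pc, stvec, sepc, scause, sstatus;
  enum mode mode;
};

/* Returns true if the trap was taken, false if it was deferred. */
bool take_trap(struct hart *h, uint64_t cause, bool is_device_interrupt) {
  if (is_device_interrupt && !(h->sstatus & SSTATUS_SIE))
    return false;                 /* interrupts disabled: do nothing */
  h->sstatus &= ~SSTATUS_SIE;     /* disable further interrupts */
  h->sepc = h->pc;                /* remember where to resume */
  if (h->mode == MODE_SUPERVISOR) /* record the previous mode in SPP */
    h->sstatus |= SSTATUS_SPP;
  else
    h->sstatus &= ~SSTATUS_SPP;
  h->scause = cause;              /* record why the trap happened */
  h->mode = MODE_SUPERVISOR;      /* enter supervisor mode */
  h->pc = h->stvec;               /* continue at the handler */
  return true;
}
```

Calling take_trap on a hart in user mode with SIE set leaves pc equal to stvec, sepc equal to the old pc, and both SIE and SPP clear. Note that nothing in the model touches the page table, the stack pointer, or the general-purpose registers.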

The CPU intentionally does little in hardware: it does not switch page tables, switch to a kernel stack, or save general-purpose registers. Leaving that work to kernel software preserves flexibility. The steps the hardware does perform are the ones that cannot safely be left out; in particular, it always loads the program counter from stvec, which only supervisor code can set, so a malicious application cannot choose where the kernel starts executing.

Because the hardware does not perform context switching automatically, software assembly routines are required to safely bridge the gap between user execution and the kernel environment.

Traps from User Space

When a trap occurs in user space, the active page table is still the user page table, meaning the stvec trap handler address must have a valid mapping in that user page table. The system satisfies this constraint using a trampoline page, mapped at the virtual address TRAMPOLINE (at the very top of the address space) in both the user and kernel page tables. This page contains the uservec assembly handler and is mapped without PTE_U in the user page table, so user code cannot read or modify it, yet the CPU can execute it as soon as the trap has put it in supervisor mode. Because the same page sits at the same address in the kernel page table, the handler keeps running after it switches page tables.
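
As a rough illustration of where these pages sit, the following sketch computes the addresses under the assumption of a 39-bit (Sv39) address space of which only the lower half is used, with 4096-byte pages. The macro names mirror those used in the text; the exact definitions are an assumption here, and in_trampoline is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define PGSIZE 4096ULL
/* Assumed: one bit less than the full Sv39 range, to avoid sign extension. */
#define MAXVA (1ULL << (9 + 9 + 9 + 12 - 1))
#define TRAMPOLINE (MAXVA - PGSIZE)     /* highest page in the address space */
#define TRAPFRAME (TRAMPOLINE - PGSIZE) /* the page directly below it */

/* Hypothetical helper: does va fall inside the trampoline page? */
bool in_trampoline(uint64_t va) {
  return va >= TRAMPOLINE && va < MAXVA;
}
```

Under these assumptions TRAMPOLINE evaluates to 0x3ffffff000 and TRAPFRAME to 0x3fffffe000.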

The user space trap sequence flows through four primary stages:

Assembly Entry (uservec):
- Because all 32 general-purpose registers belong to the interrupted user code, uservec starts by executing csrrw to swap a0 with sscratch.
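- After the swap, a0 holds a pointer to the process’s trapframe (mapped directly below TRAMPOLINE; the kernel stored that pointer in sscratch before returning to user space) and sscratch holds the user’s original a0, so uservec has one register to work with.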
- uservec saves all 32 user registers into the trapframe memory structure.
- It extracts the kernel stack pointer, hartid, usertrap function address, and kernel page table address from the trapframe.
- It updates satp to the kernel page table and jumps to the usertrap C function.
C Handler (usertrap):
- Updates stvec to point to kernelvec, ensuring that any traps occurring during kernel execution are routed correctly.
- Saves sepc into the trapframe to prevent it from being overwritten if the process yields the CPU.
- Identifies the trap cause and routes it: invokes syscall for system calls, devintr for device interrupts, or kills the process for illegal exceptions.
- If handling a system call, it increments the saved sepc by 4, ensuring the process resumes at the instruction immediately following the ecall.
C Return Preparation (usertrapret):
- Prepares the control registers for a future user trap by pointing stvec back to uservec.
- Populates the trapframe fields required by uservec and restores sepc to the saved user program counter.
- Calls userret on the trampoline page, passing the TRAPFRAME address and the user page table pointer.
Assembly Exit (userret):
- Switches satp back to the user page table.
- Restores the 32 user registers from the trapframe, performs a final swap of a0 and sscratch to restore the user’s a0, and executes sret to re-enter user mode.
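
The routing decision made in the C handler stage can be sketched as follows. This is a simplified model, not the kernel's code: route_user_trap, struct trapframe_model, and the action enum are hypothetical, and every interrupt is sent to the device-interrupt path without checking which device raised it. The scause values (8 for an environment call from user mode, top bit set for interrupts) come from the RISC-V privileged specification.

```c
#include <stdint.h>

#define SCAUSE_ECALL_FROM_U 8ULL      /* environment call from user mode */
#define SCAUSE_INTERRUPT (1ULL << 63) /* top bit set: interrupt, not exception */

enum action { DO_SYSCALL, DO_DEVINTR, DO_KILL };

/* Hypothetical cut-down trapframe holding only the saved user pc. */
struct trapframe_model { uint64_t epc; };

enum action route_user_trap(uint64_t scause, uint64_t sepc,
                            struct trapframe_model *tf) {
  tf->epc = sepc;              /* save the user pc before anything can clobber sepc */
  if (scause == SCAUSE_ECALL_FROM_U) {
    tf->epc += 4;              /* sepc points at the ecall; resume just after it */
    return DO_SYSCALL;
  }
  if (scause & SCAUSE_INTERRUPT)
    return DO_DEVINTR;         /* simplification: treat every interrupt as a device interrupt */
  return DO_KILL;              /* any other exception is an error in the user program */
}
```

The increment is 4 because ecall is a 4-byte instruction; without it the process would execute the same ecall again on return.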

The most common deliberate trap from user space is a system call, which uses the trapframe to carry the call number and arguments into the kernel and the result back out.

System Call Mechanisms and Arguments

User programs initiate system calls by placing arguments into specific registers (e.g., a0, a1), loading the system call number into a7, and executing ecall. Once the trap mechanism hands control to the syscall function, the kernel uses the saved a7 value to index into the syscalls array, which acts as a dispatch table mapping numbers to implementation functions.

Upon completion, the system call’s return value is written to p->trapframe->a0, overwriting the first argument so the user code receives the result. By convention, negative numbers indicate errors, while zero or positive numbers indicate success.
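
A minimal sketch of this dispatch, assuming a cut-down trapframe with only a0, a1, and a7: the two system calls (sys_add, sys_fail) and their numbers are invented for the example and do not correspond to real calls.

```c
#include <stdint.h>

/* Hypothetical cut-down trapframe: just the registers this example needs. */
struct trapframe_model { uint64_t a0, a1, a7; };

typedef uint64_t (*syscall_fn)(struct trapframe_model *);

/* Invented system calls, used only for illustration. */
static uint64_t sys_add(struct trapframe_model *tf) { return tf->a0 + tf->a1; }
static uint64_t sys_fail(struct trapframe_model *tf) { (void)tf; return (uint64_t)-1; }

enum { SYS_add = 1, SYS_fail = 2 };

/* Dispatch table: system call number -> implementation. Slot 0 stays NULL. */
static syscall_fn syscalls[] = {
  [SYS_add] = sys_add,
  [SYS_fail] = sys_fail,
};

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

void dispatch_syscall(struct trapframe_model *tf) {
  uint64_t num = tf->a7;                 /* number the user placed in a7 */
  if (num > 0 && num < NELEM(syscalls) && syscalls[num])
    tf->a0 = syscalls[num](tf);          /* result overwrites the first argument */
  else
    tf->a0 = (uint64_t)-1;               /* unknown call number: report an error */
}
```

With a0 = 2, a1 = 3, and a7 = SYS_add, the saved a0 becomes 5 after dispatch. An out-of-range number leaves -1 in a0, so the user code sees the conventional error value instead of the kernel indexing past the table.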

System calls must frequently access arguments and memory provided by the user process:

The functions argint, argaddr, and argfd extract integers, pointers, and file descriptors from the saved registers in the trapframe.
Pointers passed from user space are inherently untrusted and point to user virtual addresses, which do not correspond to kernel memory mappings.
The kernel uses functions like fetchstr and copyinstr to safely read string data from user space.
These functions rely on walkaddr to manually translate user virtual addresses to physical addresses by walking the user page table, strictly checking permissions to prevent unauthorized access to kernel memory.
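
The idea of translating in software and checking flags before every page can be shown with a toy model. Everything here is an assumption made for illustration: the page table is a flat four-entry array, not a real multi-level table, and toy_walkaddr and toy_copyinstr are hypothetical stand-ins that only imitate the behavior described above. The PTE_V and PTE_U bit positions match the RISC-V PTE format.

```c
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u
#define NPAGES 4
#define PTE_V (1u << 0) /* entry is valid */
#define PTE_U (1u << 4) /* page is accessible to user mode */

/* Toy single-level page table: one entry per virtual page. */
struct toy_pte { unsigned flags; char *pa; };
struct toy_pagetable { struct toy_pte pte[NPAGES]; };

/* Return the memory backing the page that contains va, or NULL if the
   page is unmapped or not a user page. */
char *toy_walkaddr(struct toy_pagetable *pt, uint64_t va) {
  uint64_t vpn = va / PGSIZE;
  if (vpn >= NPAGES)
    return NULL;
  struct toy_pte *p = &pt->pte[vpn];
  if (!(p->flags & PTE_V) || !(p->flags & PTE_U))
    return NULL;
  return p->pa;
}

/* Copy a NUL-terminated string from user address srcva into dst, reading
   at most max bytes. Returns 0 on success, -1 on a bad address or if no
   terminator is found within max bytes. */
int toy_copyinstr(struct toy_pagetable *pt, char *dst, uint64_t srcva, size_t max) {
  while (max > 0) {
    char *page = toy_walkaddr(pt, srcva); /* re-translate at every page boundary */
    if (page == NULL)
      return -1;
    size_t off = srcva % PGSIZE;
    for (; off < PGSIZE && max > 0; off++, srcva++, max--) {
      *dst = page[off];
      if (*dst++ == '\0')
        return 0;
    }
  }
  return -1;
}
```

The translation is repeated for each page because consecutive virtual pages need not be backed by consecutive memory, and a string that starts in a valid page may run into one that is not mapped for the user.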

While the complex trampoline mechanism safely handles transitions from user space, traps that occur while already executing inside the kernel require a much simpler control flow.

Traps from Kernel Space

When the CPU is executing kernel code, stvec points directly to the kernelvec assembly code. Because the trap originates in supervisor mode, the satp register is already pointing to the kernel page table, and the stack pointer is already set to a valid kernel stack.

kernelvec pushes all 32 registers directly onto the current kernel stack, safely preserving the state of the interrupted kernel thread.
Execution jumps to the kerneltrap C function.
kerneltrap handles device interrupts (devintr) or triggers a kernel panic if an exception occurs, as kernel exceptions are always fatal errors.
If the trap is a timer interrupt and a process thread is active, kerneltrap invokes yield to allow other threads CPU time.
Because yield may switch to another thread whose own traps overwrite sepc and sstatus, kerneltrap saves both registers in local variables on entry and restores them before returning.
Control returns to kernelvec, which pops the registers off the stack and executes sret to resume the interrupted kernel code.
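
The save-and-restore step can be illustrated with a small model in which the two control registers are plain struct fields. The names struct csrs, toy_yield, and toy_kerneltrap are hypothetical, and toy_yield simply overwrites the registers to imitate what another thread's trap would do.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the sepc and sstatus control registers. */
struct csrs { uint64_t sepc, sstatus; };

/* Stand-in for yield(): another thread runs, takes its own trap, and
   leaves different values in the shared hardware registers. */
static void toy_yield(struct csrs *c) {
  c->sepc = 0xdeadbeef;
  c->sstatus = 0;
}

void toy_kerneltrap(struct csrs *c, int is_timer_interrupt) {
  uint64_t sepc = c->sepc;       /* locals live on this thread's kernel stack */
  uint64_t sstatus = c->sstatus;

  if (is_timer_interrupt)
    toy_yield(c);                /* may clobber both registers */

  c->sepc = sepc;                /* restore so sret resumes the right code */
  c->sstatus = sstatus;
}
```

Local variables work here because each kernel thread has its own stack, so the saved values survive while other threads run.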

Beyond routing execution and managing hardware interrupts, the exception mechanism, and page faults in particular, provides the foundation for advanced, dynamic memory management techniques.

Page-Fault Exceptions

A page-fault exception is raised when a memory access uses a virtual address that has no mapping, whose page table entry has the PTE_V (valid) flag clear, or whose permission bits, such as PTE_W (write) or PTE_R (read), forbid the access. On a page fault, the hardware sets scause to the fault type (load, store, or instruction) and stval to the virtual address that could not be translated. By manipulating page table permissions and responding dynamically to these faults, the operating system can implement sophisticated memory optimizations:

Copy-on-write (COW) fork: fork initially maps the parent’s physical pages into the child’s address space and marks them read-only in both parent and child. When either process attempts to write, a store page fault is raised. The trap handler allocates a new physical page, copies the original data into it, updates the faulting page table entry to point to the copy and allow writes, and returns so that the faulting instruction re-executes. This avoids expensive memory copying when a fork is immediately followed by an exec.
Lazy allocation: When a process requests memory via sbrk, the kernel increases the recorded size but defers physical allocation and page table updates. The first attempt to use the unallocated memory triggers a page fault, prompting the kernel to allocate and map the page precisely when needed, avoiding waste for over-provisioned memory requests.
Demand paging: Executable loading eagerly builds page tables but marks the entries as invalid without pulling data from disk. As the application executes, instruction and data page faults dynamically load the required pages from storage, vastly reducing application startup latency.
Paging to disk: When physical RAM is exhausted, the kernel evicts some resident pages to a paging area on disk and marks their page table entries invalid. A later access to an evicted page triggers a page fault; the kernel then allocates RAM (evicting another page if necessary), reads the data back from disk, and restores the mapping.
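
To make the copy-on-write case concrete, here is a toy model of the fork-side sharing and the store-fault handler. Several things are assumptions made for this sketch, not details from the text above: the per-page reference count, the use of a software-reserved PTE bit (called PTE_COW here) to tell COW pages from genuinely read-only ones, and the names toy_cow_share and toy_cow_fault. The scause value 15 (store page fault) and the PTE_V and PTE_W bit positions follow the RISC-V privileged specification.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096
#define PTE_V (1u << 0)
#define PTE_W (1u << 2)
#define PTE_COW (1u << 8)          /* assumption: one of the software-reserved bits */
#define SCAUSE_STORE_PAGE_FAULT 15

struct toy_page { int refcount; unsigned char data[PGSIZE]; };
struct toy_pte { unsigned flags; struct toy_page *page; };

/* fork side: share the parent's page instead of copying it. A writable
   page becomes read-only and is marked COW in both parent and child. */
void toy_cow_share(struct toy_pte *parent, struct toy_pte *child) {
  if (parent->flags & PTE_W)
    parent->flags = (parent->flags & ~PTE_W) | PTE_COW;
  *child = *parent;
  parent->page->refcount++;
}

/* Fault side: returns 0 if the fault was a COW write and has been fixed,
   -1 if it is a genuine error (or memory ran out). */
int toy_cow_fault(uint64_t scause, struct toy_pte *pte) {
  if (scause != SCAUSE_STORE_PAGE_FAULT)
    return -1;
  if (!(pte->flags & PTE_V) || !(pte->flags & PTE_COW))
    return -1;
  if (pte->page->refcount > 1) {   /* still shared: give the writer a private copy */
    struct toy_page *copy = malloc(sizeof *copy);
    if (copy == NULL)
      return -1;
    memcpy(copy->data, pte->page->data, PGSIZE);
    copy->refcount = 1;
    pte->page->refcount--;
    pte->page = copy;
  }
  pte->flags = (pte->flags & ~PTE_COW) | PTE_W; /* writes are allowed from now on */
  return 0;
}
```

In this model the first writer after a fork gets a fresh copy, while the last remaining owner of a page simply regains write permission without copying. Lazy allocation and demand paging follow the same pattern: the handler inspects the faulting address, installs a valid mapping, and returns so the instruction re-executes.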
