Page Tables and Virtual Memory

Paging Hardware and Address Translation

  • RISC-V instructions manipulate virtual addresses, while the machine’s RAM uses physical addresses.
  • The Sv39 RISC-V architecture utilizes only the bottom 39 bits of a 64-bit virtual address, ignoring the top 25 bits.
  • The page table structure physically maps these addresses:
    • Logically acts as an array of Page Table Entries (PTEs).
    • Each PTE translates a virtual address to a physical address at the granularity of a 4096-byte ( bytes) page.
    • A PTE contains a 44-bit Physical Page Number (PPN) and hardware control flags.
    • The CPU constructs a 56-bit physical address by combining the 44-bit PPN from the PTE with the bottom 12 bits of the original virtual address.
  • Three-level tree implementation:
    • A page table is stored in physical memory as a three-level tree of 4096-byte pages.
    • The root page contains 512 PTEs pointing to intermediate pages, which point to bottom-level pages containing the final physical mappings.
    • The 27-bit virtual page number is split into three 9-bit sections to index into each of the three levels.
    • This tree structure saves physical memory by omitting entirely unmapped intermediate and bottom-level page directories.
  • Hardware integration:
    • The Translation Look-aside Buffer (TLB) caches PTEs inside the CPU to eliminate the performance cost of loading PTEs from memory during every address translation.
    • The satp register holds the physical address of the root page-table page, telling the CPU which page table tree to use for the currently executing thread.
  • PTE flags control access permissions:
    • PTE_V: Indicates the PTE is present and valid.
    • PTE_R, PTE_W, PTE_X: Control read, write, and execute permissions, respectively.
    • PTE_U: Allows access by instructions executing in user mode.

To manage this hardware translation, the operating system constructs dedicated page tables to map distinct regions of memory, starting with its own address space.

Kernel Address Space

  • The kernel maintains a single, globally shared page table to access physical memory and memory-mapped hardware registers.
  • Direct mapping architecture:
    • Most physical memory and device registers are mapped at virtual addresses exactly equal to their physical addresses.
    • The kernel binary is located at KERNBASE (0x80000000) in both virtual and physical memory spaces.
    • RAM extends linearly from KERNBASE up to a defined limit called PHYSTOP.
  • Exceptions to direct mapping:
    • Trampoline page: Mapped at the very top of the virtual address space, containing code to transition the CPU in and out of the kernel. This physical page is mapped twice: once direct-mapped in the kernel and once at the highest virtual address.
    • Kernel stacks: Each process has a private kernel stack mapped high in memory, immediately above an unmapped guard page.
    • The guard page has an invalid PTE (PTE_V is cleared), triggering a hardware exception if a stack overflow occurs, which prevents the corruption of adjacent kernel memory.

Building these kernel and user mappings requires specific software routines to walk the hardware-defined page table tree and update the entries.

Page Table Management Code

  • The central data structure for software page table manipulation is pagetable_t, a C pointer to a RISC-V root page-table page.
  • Core virtual memory lookup functions:
    • walk: Mimics the hardware’s 3-level traversal, using 9 bits at a time to descend the tree and return the address of the lowest-level PTE. It can dynamically allocate intermediate pages if requested during the traversal.
    • mappages: Installs PTEs for a virtual-to-physical address range by calling walk for each page interval and configuring the PPN and permission flags.
  • Kernel initialization routines:
    • kvminit creates the kernel page table during early boot, mapping the kernel instructions, data, physical memory up to PHYSTOP, and device memory.
    • kvminithart writes the root page table physical address into the CPU’s satp register to enable hardware address translation.
    • The sfence.vma instruction is executed immediately after satp is modified to flush the CPU’s TLB, preventing stale cached mappings from causing invalid memory accesses.

Backing these virtual mappings requires an allocator to supply and reclaim the actual physical memory pages.

Physical Memory Allocation

  • The kernel manages physical memory between the end of the kernel binary and PHYSTOP as a global pool for run-time allocation.
  • Memory is allocated and freed strictly in 4096-byte page increments.
  • Free pages are tracked using a linked list threaded directly through the available memory pages themselves.
  • Allocator implementation:
    • Each free page stores a struct run structure containing a pointer to the next free page.
    • The kfree function fills freed memory with the garbage value 1 to expose dangling references quickly, then prepends the page to the free list.
    • The kalloc function removes and returns the first element from the free list when memory is requested.
    • The free list structure is protected by a spin lock to handle concurrent allocation requests across multiple CPUs.

By combining physical memory allocation with page table construction, the kernel can construct isolated, dynamic memory environments for each executing program.

Process Address Space and Memory Growth

  • Each process possesses an independent page table, dictating a private address space that maps contiguous virtual addresses starting at zero to potentially non-contiguous physical pages.
  • Address space layout:
    • Grows upwards to MAXVA, addressing up to 256 Gigabytes of virtual memory.
    • Ordered sequentially from zero: user instructions, global variables, user stack, and an expandable heap.
    • The trampoline page is mapped at the top of the user address space to facilitate kernel transitions.
    • An inaccessible guard page (PTE_U flag cleared) sits directly below the user stack to catch stack overflows via hardware page-fault exceptions.
  • Dynamic memory allocation (sbrk):
    • The sbrk system call shrinks or grows a process’s memory.
    • growproc invokes uvmalloc to acquire new physical pages via kalloc and maps them using mappages.
    • uvmdealloc removes memory by calling uvmunmap, which utilizes walk to locate PTEs and passes the associated physical addresses back to kfree.
    • The user page table serves as the definitive kernel record of which physical pages are allocated to a process.

The exact layout of a process’s memory is initially populated by parsing compiled binaries from the file system.

Exec and ELF Binary Loading

  • The exec system call replaces an address space’s existing memory image with a new executable stored in the Executable and Linkable Format (ELF).
  • Initialization and parsing steps:
    • Validates the file via a 4-byte magic number (0x7F 'E' 'L' 'F').
    • Allocates a blank page table via proc_pagetable.
    • Parses ELF program section headers (struct proghdr) to determine memory sizing and block alignments.
    • Allocates contiguous virtual memory per segment with uvmalloc and populates the pages directly from the file via loadseg.
  • Stack setup:
    • Allocates a single stack page and a protective inaccessible guard page.
    • Copies command-line argument strings and pointers to the top of the newly allocated stack, preparing argc and argv for the program’s main function.
  • Security and commitment:
    • Verifies that segment virtual addresses and sizes do not mathematically overflow a 64-bit integer, preventing malicious binaries from tricking the kernel into mapping data over kernel space.
    • Retains the old address space until the entire new image is successfully built. If an error occurs during parsing or allocation, the partial new image is freed and exec returns an error, safely preserving the original process state.

Would you like me to generate a set of flashcards to help you review and memorize these virtual memory concepts and RISC-V paging structures?