kvminit is one line:
```c
void kvminit(void) {
  kernel_pagetable = kvmmake();
}
```
It calls kvmmake and stores the result in a global variable. The real work is in kvmmake.
What kvmmake does
It builds the kernel’s page table — the data structure that tells the MMU how to translate virtual addresses to physical addresses whenever the kernel is running. It starts by allocating one page with kalloc for the root page table and zeroing it, then creates the mappings one by one.
The I/O device mappings
First it maps the hardware devices: the UART at UART0 (0x10000000), the virtio disk at VIRTIO0 (0x10001000), and the PLIC at PLIC (0x0C000000). These are identity mappings — virtual address equals physical address. The permissions are read and write only, no execute: you don’t want the CPU accidentally jumping into UART registers and trying to run them as instructions.
The kernel code mapping
Then it maps the kernel’s own code from KERNBASE (0x80000000) up to etext — the symbol from the linker script marking where executable code ends. This is identity-mapped with read and execute permissions, but not write. The kernel’s own code is immutable at runtime. If a bug tries to write to kernel text, the MMU traps it.
The kernel data mapping
Everything from etext to PHYSTOP (0x88000000) gets mapped with read and write, but no execute. This covers .rodata, .data, .bss, and all the free physical memory that kalloc manages. No execute permission means a buffer overflow that tries to jump into kernel data will fault instead of running attacker code.
The trampoline mapping
This is the one non-identity mapping. The trampoline code lives somewhere in the kernel’s .text section physically (at the address trampoline points to), but it gets mapped at TRAMPOLINE — the very top of the virtual address space. This is because every user process’s page table also maps the trampoline at the same virtual address. When a trap happens, the CPU jumps to the trampoline code, and it needs to be at the same virtual address regardless of whether you’re in the kernel’s page table or a user’s page table. That’s how the page table switch during a trap works without crashing.
The kernel stacks
Finally, proc_mapstacks allocates a page for each process’s kernel stack and maps them into the kernel page table. Each kernel stack has a guard page below it — an unmapped page that causes a fault if the stack overflows, instead of silently corrupting whatever is below it.
The key insight
Almost everything is identity-mapped — virtual address equals physical address. This is why the kernel keeps working the moment paging turns on. Before paging, the CPU accesses physical address 0x80001000. After paging, the CPU accesses virtual address 0x80001000, which the page table maps to physical address 0x80001000. Same result. The code doesn’t notice the difference. The only exception is the trampoline, which deliberately lives at a different virtual address than its physical location.
Then kvminithart turns it on
After kvmmake returns the page table, kvminithart writes its physical address into satp and flushes the TLB with sfence_vma. From that moment on, every memory access on that hart goes through translation. But because of the identity mapping, nothing breaks — everything is still at the same address it was before.
kvminithart
It’s three lines:
```c
void kvminithart() {
  sfence_vma();
  w_satp(MAKE_SATP(kernel_pagetable));
  sfence_vma();
}
```
This is the moment paging turns on. Before this function, every memory access goes directly to physical RAM. After it, every memory access goes through the MMU and the page table that kvminit just built.
The first sfence_vma()
This is a fence — it guarantees that all the writes kvmmake did while building the page table are actually visible in memory. The CPU might have those writes sitting in a store buffer or write queue that hasn’t drained yet. If you flip on paging before those writes land, the MMU might walk a half-built page table and read garbage. The fence says “finish all pending memory operations before I proceed.”
w_satp(MAKE_SATP(kernel_pagetable))
This writes to the satp register. MAKE_SATP takes the physical address of the root page table and packs it with the mode field set to Sv39 (mode 8 — three-level, 39-bit virtual addresses). The moment this instruction executes, the MMU is active. The very next instruction fetch goes through address translation.
This doesn’t crash because kvmmake set up identity mappings — the virtual address the CPU is currently executing at maps to the same physical address it was already using. The program counter doesn’t need to change. The instruction stream continues seamlessly.
The second sfence_vma()
This flushes the TLB. The TLB might have cached stale translations from before paging was enabled (or from a previous satp value if this isn’t the first time). After writing satp, you want the MMU to use the new page table for everything, not serve old cached translations. The fence clears the TLB so every subsequent access does a fresh page table walk (which then gets cached for future accesses).
Why it’s called “inithart”
Because it’s per-hart. The page table itself (kernel_pagetable) is shared — all harts use the same one, built once by hart 0 in kvminit. But satp is a per-hart register. Each hart has its own satp, its own TLB, its own MMU. So every hart needs to run kvminithart individually to point its own hardware at the shared page table. That’s why in main(), hart 0 calls it after building the table, and the other harts call it after waking up — same function, same page table, but each hart is configuring its own hardware.