An operating system manages and abstracts low-level hardware, shares physical resources among multiple programs, and provides controlled ways for programs to interact.

  • Kernel: A special privileged program that provides core services to running programs.
  • Process: A running program consisting of memory (instructions, data, and a stack) and private state managed by the kernel.
  • System Call: A defined entry point in the operating system’s interface that transitions execution from user space to kernel space to perform privileged operations.
  • Hardware Protection: The kernel utilizes CPU mechanisms to ensure processes access only their own memory and execute without hardware privileges.

System call | Description
fork | Create a process, return child’s PID.
exit | Terminate the current process; status is reported to wait(). No return.
wait | Wait for a child to exit; exit status in *status; returns child PID.
kill | Terminate process PID. Returns 0, or -1 for error.
getpid | Return the current process’s PID.
pause | Pause for n clock ticks.
uptime | Return how many clock ticks have occurred since boot.
exec | Load a file and execute it with arguments; only returns on error.
sbrk | Grow process memory by n bytes. Returns start of new memory.
open | Open a file; flags indicate read/write; returns a file descriptor.
write | Write n bytes from buf to file descriptor fd; returns n.
read | Read n bytes into buf; returns number read, or 0 at end of file.
close | Release open file descriptor fd.
dup | Return a new file descriptor referring to the same file as fd.
pipe | Create a pipe, placing read/write file descriptors in p[0] and p[1].
chdir | Change the current directory.
mkdir | Create a new directory.
mknod | Create a device file.
fstat | Place info about an open file into *st.
link | Create another name (file2) for the file file1.
unlink | Remove a file.

Processes

The operating system time-shares hardware by transparently switching available CPUs among waiting processes, saving and restoring CPU registers during transitions.

  • Process Identifier (PID): A unique integer the kernel associates with each process.
  • Process Creation:
    • fork() creates a new child process by exactly duplicating the parent’s memory contents.
    • fork() returns 0 in the child process and the child’s PID in the parent process.
    • The parent and child execute independently with different memory spaces and registers; changes in one do not affect the other.
  • Process Execution:
    • exec(file, argv) replaces the calling process’s memory with a new memory image loaded from a file (structured in the ELF format).
    • exec() takes an executable filename and an array of string arguments, starting execution at the binary’s declared entry point without returning to the calling program.
  • Process Termination and Synchronization:
    • exit(status) stops the calling process and releases resources like memory and open files. A status of 0 conventionally indicates success, while 1 indicates failure.
    • wait(*status) pauses the calling process until a child exits, returning the child’s PID and copying its exit status into the provided address.
  • Memory Management:
    • Most user-space memory is allocated implicitly during fork() and exec().
    • sbrk(n) grows a process’s data memory by n bytes dynamically at run-time and returns the location of the new memory.
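
The fork/exit/wait pattern above can be sketched on a POSIX host as follows. This is a minimal sketch rather than xv6 code: run_child is a hypothetical helper name, and POSIX packs the exit status into an encoded word (decoded here with WEXITSTATUS), whereas the notes describe xv6 copying the status directly into *status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the status the parent collects.
 * Returns -1 if fork() or waitpid() fails. (run_child is a hypothetical name.) */
int run_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed: no child was created */
    if (pid == 0)
        _exit(code);            /* child: fork() returned 0 */

    /* parent: fork() returned the child's PID */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child(0) models a child that reports success; any non-zero argument models a failure status that the parent observes after wait returns.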

File descriptors

A file descriptor is a small integer acting as an index into a per-process table, representing a kernel-managed object such as a file, directory, device, or pipe.

  • Standard Conventions: By default, processes read from file descriptor 0 (standard input), write output to file descriptor 1 (standard output), and write errors to file descriptor 2 (standard error).
  • Core I/O Operations:
    • read(fd, buf, n) reads up to n bytes from fd into buf, advancing the file offset by the number of bytes read. It returns 0 to indicate the end of the file.
    • write(fd, buf, n) writes n bytes from buf to fd, advancing the file offset by the number of bytes written.
    • close(fd) releases a file descriptor for future reuse. Newly allocated file descriptors always use the lowest-numbered unused integer for the current process.
  • I/O Redirection:
    • fork() copies the parent’s file descriptor table to the child, granting the child the exact same open files.
    • exec() replaces the process memory but completely preserves the file table.
    • A shell redirects I/O by forking a child, closing standard file descriptors, opening specific files to claim those low-numbered descriptors, and then calling exec() to run the new program.
  • Offset Sharing:
    • Underlying file offsets are shared between file descriptors only if they were derived from the same original descriptor via fork() or dup().
    • dup(fd) duplicates an existing descriptor, returning a new one that refers to the same underlying I/O object and shares its offset.
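
The redirection steps above (fork, close a standard descriptor, open a file that claims the freed number) can be sketched on a POSIX host like this. It is an illustrative sketch under stated assumptions: redirect_stdout_to is a hypothetical helper, the flag names are POSIX rather than xv6's, and the child writes directly instead of calling exec() so the example stays self-contained.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child whose standard output is redirected to `path`, then read the
 * file back into `out` (at most cap-1 bytes, NUL-terminated).
 * Returns the number of bytes read back, or -1 on failure. */
int redirect_stdout_to(const char *path, const char *msg, char *out, size_t cap)
{
    if (cap == 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(1);                                /* free descriptor 1 */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd != 1)                             /* lowest unused number should be 1 */
            _exit(2);
        /* A shell would call exec() here; the new program would inherit fd 1. */
        size_t len = strlen(msg);
        ssize_t w = write(1, msg, len);
        _exit(w == (ssize_t)len ? 0 : 3);
    }

    /* Parent: its own descriptor 1 is untouched; wait for the child. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, out, cap - 1);
    close(fd);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

Because the close and open happen only in the child, the parent's standard output is unaffected, which is why the shell forks before redirecting.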

Pipes

A pipe is a small kernel buffer exposed to processes as a pair of file descriptors: one for reading and one for writing.

  • Creation: pipe(p) creates the buffer and records the read descriptor in p[0] and the write descriptor in p[1].
  • Communication Flow:
    • Writing data to the write end makes it available for reading at the read end.
    • If no data is available, a read operation blocks until data is written or until all file descriptors referring to the write end are closed.
    • If all write ends are closed, read() returns 0, signalling end-of-file. Processes must therefore close every write descriptor they do not use; otherwise readers wait indefinitely.
  • Advantages Over Temporary Files:
    • Pipes automatically clean themselves up, whereas temporary files require explicit deletion.
    • Pipes can pass arbitrarily long streams of data without being constrained by disk space.
    • Pipes allow parallel execution of pipeline stages, unlike files which require the first program to finish before the second starts.
    • Blocking reads and writes in pipes are significantly more efficient than non-blocking file semantics for inter-process communication.
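
A minimal POSIX sketch of the communication flow above: the child writes into the pipe, and the parent reads until read() returns 0. The helper name pipe_from_child is hypothetical, and the sketch assumes a short message that fits in the caller's buffer.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes `msg` into a pipe; parent reads until end-of-file.
 * Returns the number of bytes received (NUL-terminated in `out`), or -1. */
int pipe_from_child(const char *msg, char *out, size_t cap)
{
    int p[2];
    if (cap == 0 || pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);                       /* child only writes */
        size_t len = strlen(msg);
        ssize_t w = write(p[1], msg, len);
        close(p[1]);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    close(p[1]);   /* parent drops its write end, or read() would never see EOF */
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(p[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    close(p[0]);
    out[total] = '\0';

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (int)total;
}
```

If the parent kept p[1] open, the loop would block forever after the child's data was consumed, which is the failure mode the notes warn about.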

File system

The file system provides data files (uninterpreted byte arrays) and directories (named references to files and other directories), structured as a tree originating from a root directory.

  • Path Resolution:
    • Paths beginning with / are evaluated from the root directory.
    • Paths not beginning with / are evaluated relative to the calling process’s current directory, which can be modified using chdir(dir).
  • Inodes and Links:
    • Inode: The underlying physical file object that holds file metadata, including type (file, directory, or device), length, disk location, and the number of links.
    • Link: An entry in a directory containing a filename and a reference to an inode.
    • A single inode can have multiple links (names) pointing to it.
  • File System Operations:
    • mkdir(dir) creates a new directory.
    • open(file, O_CREATE) creates a new data file.
    • mknod(file, major, minor) creates a special device file that diverts I/O system calls directly to a kernel device implementation identified by major and minor numbers.
    • link(file1, file2) creates a new name (file2) referring to the exact same inode as an existing file (file1).
    • unlink(file) removes a name from the file system. The underlying inode and disk space are only freed when the file’s link count drops to 0 and no active file descriptors refer to it.
    • fstat(fd, *st) and stat(file, *st) retrieve inode information into a struct stat object.
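
The relationship between names and inodes can be observed through the link count reported by fstat(). The following POSIX sketch assumes a writable /tmp that supports hard links; link_count_demo and the file names "a" and "b" are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a file in a fresh temporary directory, give it a second name, then
 * remove the first name. Link counts seen through fstat() are stored in
 * after_link and after_unlink. Returns 0 on success, -1 on failure. */
int link_count_demo(int *after_link, int *after_unlink)
{
    char dir[] = "/tmp/linkdemoXXXXXX";
    char a[64], b[64];
    struct stat st;
    int rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    int fd = open(a, O_CREAT | O_RDWR, 0600);
    if (fd >= 0) {
        if (link(a, b) == 0 && fstat(fd, &st) == 0) {
            *after_link = (int)st.st_nlink;        /* two names, one inode */
            if (unlink(a) == 0 && fstat(fd, &st) == 0) {
                *after_unlink = (int)st.st_nlink;  /* "b" still names the inode */
                rc = 0;
            }
        }
        close(fd);
        unlink(a);   /* harmless if already removed */
        unlink(b);
    }
    rmdir(dir);
    return rc;
}
```

On a typical file system the counts come out as 2 after link() and 1 after the first unlink(), matching the rule that the inode survives until its last name and last descriptor are gone.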

User mode, supervisor mode and system calls

  • CPUs provide hardware execution modes to establish a hard boundary between application code and the operating system.
  • Machine Mode: The mode in which the CPU starts at boot. It executes with full hardware privilege and is intended mainly for low-level configuration of the computer.
  • Supervisor Mode: Allows execution of privileged instructions necessary for OS operations, such as enabling interrupts or writing to page table registers. Software running in this mode is the kernel, executing in kernel space.
  • User Mode: Restricts execution to unprivileged instructions. Applications execute in this mode within user space.
  • If a user-mode application attempts a privileged instruction, the CPU suppresses the instruction and forcefully switches to supervisor mode so the kernel can terminate the application.
  • Applications invoke kernel services via system calls using specialized instructions (e.g., the RISC-V ecall instruction).
  • System calls switch the CPU to supervisor mode at a strictly kernel-defined entry point, preventing malicious applications from bypassing argument validation or access control checks.
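
The idea of a single kernel-defined entry point that validates a request before acting on it can be illustrated with a small dispatch table. This is a host-side simulation only: the syscall numbers, handler names, and return values are hypothetical and are not xv6's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical syscall numbers for this sketch (not xv6's actual values). */
enum { SYS_GETPID_SKETCH = 1, SYS_UPTIME_SKETCH = 2, NSYSCALL_SKETCH = 3 };

/* Stand-in handlers returning fixed values. */
static int64_t sketch_getpid(void) { return 7; }
static int64_t sketch_uptime(void) { return 1234; }

/* Table indexed by syscall number; unused slots stay NULL. */
static int64_t (*const sketch_table[NSYSCALL_SKETCH])(void) = {
    [SYS_GETPID_SKETCH] = sketch_getpid,
    [SYS_UPTIME_SKETCH] = sketch_uptime,
};

/* The single entry point: validate the number before calling anything.
 * Returns the handler's result, or -1 for an unknown syscall number. */
int64_t sketch_dispatch(uint64_t num)
{
    if (num < NSYSCALL_SKETCH && sketch_table[num] != NULL)
        return sketch_table[num]();
    return -1;
}
```

Because user code can only supply a number, never a target address, an out-of-range or unassigned number is rejected instead of jumping into arbitrary kernel code.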

Virtualization

  • A process is the fundamental unit of isolation, shielding an application’s memory, CPU state, and file descriptors from interference by other processes.
  • A process bundles two foundational architectural illusions:
    • Private Address Space: Simulates private physical memory using hardware page tables.
      • RISC-V page tables translate virtual addresses utilized by instructions into physical addresses on the RAM chip.
      • The layout begins at virtual address zero with instructions, followed by global variables, then the stack, and finally the heap.
      • The address space is bounded by hardware translation limits; xv6 uses 38 bits of addressable space, establishing a maximum virtual address of 2^38 - 1 = 0x3fffffffff (MAXVA is defined as one past it, 1 << 38).
      • The top pages of the address space are reserved for a trampoline page (managing user/kernel transitions) and a trapframe page (saving user state).
    • Private CPU (Thread): Simulates dedicated processor execution.
      • Each process contains a thread of execution that tracks local variables and return addresses on stacks.
      • A process actively alternates between two stacks: a user stack for user-space computation, and a kernel stack used exclusively during system calls and interrupts.
      • The kernel stack is protected from user-space access to ensure the kernel can execute safely even if the user stack is compromised.
  • Kernel state for each process is centralized in a proc structure, containing references to the process’s page table (p->pagetable), kernel stack (p->kstack), and run state (p->state).
  • During a system call, the hardware raises the privilege level and sets the program counter to the kernel-defined entry point. The kernel then runs on the process’s kernel stack and, once the call completes, executes the sret instruction to lower the privilege level and resume the user thread.
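
To make the address-space limits concrete, here is a small sketch of how a virtual address splits into page-table indices under a three-level scheme with 9 index bits per level and 4 KiB pages. The macro and function names carry a _SKETCH/_sketch suffix to mark them as illustrative, not the kernel's own definitions.

```c
#include <stdint.h>

#define PGSHIFT_SKETCH 12                 /* 4 KiB pages: low 12 bits are the offset */
#define PXMASK_SKETCH  0x1FFULL           /* 9 bits of index per level */
#define MAXVA_SKETCH   (1ULL << (9 + 9 + 9 + 12 - 1))   /* 2^38 */

/* Index into the page table at `level` (2 = root, 0 = leaf) for address va. */
static inline uint64_t px_sketch(int level, uint64_t va)
{
    return (va >> (PGSHIFT_SKETCH + 9 * level)) & PXMASK_SKETCH;
}

/* Byte offset within the 4 KiB page. */
static inline uint64_t pgoff_sketch(uint64_t va)
{
    return va & ((1ULL << PGSHIFT_SKETCH) - 1);
}
```

For the highest usable address, MAXVA_SKETCH - 1, the root index is 255 rather than 511, which reflects the use of only 38 of the 39 available bits.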

[Figure: layout of a process’s virtual address space]

Codebase

The xv6 codebase is organised into four parts, which are built in order and then booted by QEMU:

Order | Part | Role
1 | Makefile | Coordinates the build and launch flow.
2 | kernel/ | Builds the xv6 OS kernel.
3 | user/ | Builds xv6 user programs.
4 | mkfs/ | Builds fs.img using the xv6 user programs.
5 | QEMU | Boots the kernel with fs.img as the disk.

Kernel Subsystems

The kernel is easier to read by subsystem than as one flat list of files.

1 Headers and Utilities

File | Kind | Purpose
kernel/types.h | Header | Basic integer/type aliases.
kernel/param.h | Header | Kernel-wide size limits.
kernel/memlayout.h | Header | Physical/virtual memory map.
kernel/riscv.h | Header | RISC-V registers, paging, and interrupt helpers.
kernel/defs.h | Header | Cross-file kernel declarations.
kernel/kernel.ld | Linker script | Kernel memory layout at link time.

2 Boot Sequence

File | Kind | Purpose
kernel/entry.S | Assembly | First code after QEMU jumps to the kernel.
kernel/start.c | C | Early CPU setup before main.
kernel/main.c | C | Kernel initialization order.

3 Memory Subsystem

File | Kind | Purpose
kernel/kalloc.c | C | Physical page allocator.
kernel/vm.h | Header | sbrk allocation mode constants.
kernel/vm.c | C | Page tables and virtual memory.
kernel/string.c | C | Basic memory/string helpers.

4 Process Subsystem

File | Kind | Purpose
kernel/proc.h | Header | Process, CPU, trapframe, and context structures.
kernel/proc.c | C | Processes, scheduling, sleep/wakeup, and wait/exit.
kernel/swtch.S | Assembly | Low-level context switch.
kernel/elf.h | Header | ELF executable file format.
kernel/exec.c | C | Load and run user programs.

5 Synchronization Mechanism

File | Kind | Purpose
kernel/spinlock.h | Header | Spinlock structure.
kernel/spinlock.c | C | Short critical-section locking.
kernel/sleeplock.h | Header | Sleeping lock structure.
kernel/sleeplock.c | C | Locks that sleep while waiting.

6 Traps

File | Kind | Purpose
kernel/trampoline.S | Assembly | User/kernel trap transition code.
kernel/kernelvec.S | Assembly | Kernel-mode trap vector.
kernel/trap.c | C | Trap, syscall, timer, and interrupt handling.

7 PLIC

File | Kind | Purpose
kernel/plic.c | C | External interrupt controller setup and interrupt claiming.

8 UART Device Driver

File | Kind | Purpose
kernel/uart.c | C | Low-level serial device driver.
kernel/console.c | C | Console input/output layer on top of UART.
kernel/printf.c | C | Kernel printing and panic output.

9 Virtio Disk Driver

File | Kind | Purpose
kernel/virtio.h | Header | Virtio disk protocol definitions.
kernel/virtio_disk.c | C | Virtual disk driver.

10 Filesystem

File | Kind | Purpose
kernel/buf.h | Header | Disk buffer structure.
kernel/bio.c | C | Buffer cache and LRU block reuse.
kernel/fs.h | Header | On-disk filesystem format.
kernel/log.c | C | Filesystem transaction log.
kernel/fs.c | C | Inodes, directories, path lookup, and inode I/O.

11 File Layer (VFS)

File | Kind | Purpose
kernel/file.h | Header | In-memory file, inode, pipe, and device structs.
kernel/file.c | C | Open-file table and file operations.
kernel/fcntl.h | Header | File open flags.
kernel/stat.h | Header | File metadata structure.
kernel/pipe.c | C | Pipes for process communication.

12 Syscall Connectors and Wrappers

File | Kind | Purpose
kernel/syscall.h | Header | System call number definitions.
kernel/syscall.c | C | System call dispatch and argument fetching.
kernel/sysproc.c | C | Process-related system calls.
kernel/sysfile.c | C | File-related system calls.

User-Space Runtime

Order | File | Kind | Purpose
1 | user/user.h | Header | User-visible syscall and library declarations.
2 | user/usys.pl | Generator | Generates user syscall stubs.
3 | user/usys.S | Generated assembly | User-side syscall wrappers using ecall.
4 | user/ulib.c | User library | Basic user-space helper functions.
5 | user/printf.c | User library | User-space formatted printing.
6 | user/umalloc.c | User library | Simple user-space memory allocator.

User Programs

Order | File | Kind | Purpose
1 | user/init.c | User program | First user process.
2 | user/sh.c | User program | xv6 shell.
3 | user/ls.c | User program | List directory contents.
4 | user/cat.c | User program | Print file contents.
5 | user/echo.c | User program | Print arguments.
6 | user/grep.c | User program | Search text.
7 | user/wc.c | User program | Count lines, words, bytes.
8 | user/mkdir.c | User program | Create directories.
9 | user/rm.c | User program | Remove files.
10 | user/ln.c | User program | Create hard links.
11 | user/kill.c | User program | Kill a process.
12 | user/stressfs.c | Test program | Stress filesystem behavior.
13 | user/forktest.c | Test program | Stress process creation.
14 | user/grind.c | Test program | Stress processes/filesystem/concurrency.
15 | user/usertests.c | Test program | Broad xv6 test suite.

mkfs and Filesystem Image

mkfs is a host-side tool that runs on the build machine before xv6 boots. It packs compiled user binaries into fs.img, the virtual disk QEMU presents to xv6.

Order | File / Artifact | Kind | Purpose
1 | kernel/fs.h | Shared format header | Defines xv6 on-disk filesystem layout.
2 | mkfs/mkfs.c | Host tool | Creates fs.img using xv6 filesystem format.
3 | user/_init etc. | RISC-V binaries | Compiled user programs inserted into fs.img.
4 | fs.img | Disk image | Virtual disk passed to xv6 by QEMU.

Full Build-to-Boot Pipeline

Order | Step | Runs where? | Purpose
1 | Makefile | Host | Coordinates the build.
2 | Build kernel files | Host cross-compiler | Produces kernel/kernel.
3 | Build user support files | Host cross-compiler | Produces user runtime objects.
4 | Build user programs | Host cross-compiler | Produces user/_init, user/_sh, etc.
5 | Build mkfs/mkfs.c | Host compiler | Produces host executable mkfs/mkfs.
6 | Run mkfs/mkfs | Host | Packs user binaries into fs.img.
7 | Start QEMU | Host | Creates virtual RISC-V machine.
8 | Run kernel/kernel | QEMU/RISC-V | Boots xv6 kernel.
9 | Run /init | xv6 user mode | Starts first user process.
10 | Run /sh | xv6 user mode | Starts shell.

Runtime flow:

Order | Part | Role
1 | QEMU | Starts the virtual RISC-V machine.
2 | kernel/entry.S | Sets up the first kernel stack.
3 | kernel/start.c | Switches from machine mode to supervisor mode.
4 | kernel/main.c | Initializes kernel subsystems.
5 | Disk and filesystem init | Makes fs.img available through xv6.
6 | kernel/exec.c loads /init | Loads the first user program.
7 | user/init.c starts /sh | Opens the console and starts the shell.
8 | user/sh.c runs commands | Reads and executes user commands.

Makefile

The Makefile coordinates three separate builds and then launches QEMU.

Build Pipeline

Step | Input | Linker Script | Output
Kernel build | kernel/*.c + kernel/*.S | kernel/kernel.ld | kernel/kernel
User build | user/*.c + usys.S | user/user.ld | user/_init etc.
mkfs (Host) | mkfs/mkfs.c | - | mkfs/mkfs
Filesystem image | mkfs/mkfs + user/_init etc. | - | fs.img
Boot | kernel/kernel + fs.img | - | QEMU launches xv6

Linking all kernel object files with kernel.ld produces three files:

File | Content
kernel/kernel | Linked kernel binary loaded by QEMU
kernel/kernel.asm | Mixed source/disassembly for inspection
kernel/kernel.sym | Address-to-symbol map for debugging

Every user program links against a small runtime library:

Object | Source | Role
ulib.o | ulib.c | String helpers and syscall wrappers
usys.o | generated usys.S | Syscall stubs (ecall wrappers)
printf.o | printf.c | User-space printf
umalloc.o | umalloc.c | User-space malloc/free

Notes:

  • usys.S is generated by running usys.pl through Perl; each generated stub loads the syscall number into a7 and executes ecall.
  • User programs are named with a leading underscore (user/_init) to avoid clashing with host tools. mkfs strips it when packing into fs.img.
  • forktest omits printf.o and umalloc.o to stay small enough to max out the process table.

These are the compiled user programs packed into fs.img by mkfs:

Program | Purpose
_init | First user process started by the kernel
_sh | xv6 shell
_ls | List directory contents
_cat | Print file contents
_echo | Print arguments
_grep | Search text
_wc | Count lines, words, bytes
_mkdir | Create directories
_rm | Remove files
_ln | Create hard links
_kill | Kill a process
_zombie | Demonstrate zombie process behavior
_forktest | Stress process creation
_stressfs | Stress filesystem writes
_usertests | Broad xv6 test suite
_grind | Stress processes, filesystem, and concurrency
_logstress | Stress filesystem logging
_forphan / _dorphan | Test orphaned process behavior

Toolchain

Tool | Role
$(CC) | Compiles C and preprocessed .S assembly
$(LD) | Links object files into binaries
$(OBJDUMP) | Generates .asm and .sym files for inspection
gcc (host) | Compiles mkfs/mkfs.c; runs on the build machine, not RISC-V

Compiler Flags

Flag | Purpose
-Wall -Werror | Enable warnings and treat them as errors
-O | Basic optimization
-ggdb | GDB-friendly debug info
-gdwarf-2 | DWARF v2 debug format
-fno-omit-frame-pointer | Keep frame pointers for stack traces
-march=rv64gc | Target 64-bit RISC-V with standard extensions
-mcmodel=medany | Addressing for code linked at 0x80000000, not near zero
-MD | Emit .d dependency files for incremental builds
-ffreestanding | No hosted C environment assumptions
-nostdlib | Do not link standard library or startup files
-fno-common | Catch duplicate global definitions at link time
-fno-builtin-* | Prevent GCC from replacing xv6’s own memcpy, printf, etc. with its built-in versions
-fno-stack-protector | No stack canary; the kernel has no runtime support for it
-fno-pie -no-pie | Fixed-address binaries; xv6 does not use position-independent code
-I. | Include headers relative to project root

Linker Flags

Flag | Purpose
-z max-page-size=4096 | Align ELF segments to 4 KiB, matching xv6’s page size.

QEMU Launch

Spec | Value
Machine | virt (generic RISC-V virtual board)
CPU | riscv64
Cores | 3
RAM | 128M
Kernel | kernel/kernel
Disk | fs.img via virtio-blk
Display | None (-nographic, serial console only)

The Linker Script

The kernel/kernel.ld tells the linker how to lay out the final kernel binary in memory. It defines where code goes, in what order, and what symbols the rest of the kernel can use to find section boundaries.

Memory Layout

The linker arranges the kernel into sections in this order, starting at 0x80000000:

Section | Contents | Notes
.text | Kernel code | _entry placed first, then all other code
.rodata | Read-only data | Constants, string literals, aligned to 16 bytes
.data | Initialized globals | Non-zero globals, stored in the binary
.bss | Zero-initialized globals | Not stored in binary; kernel zeroes at startup

.text:

  • Executable code only, read and execute permissions.
  • _entry lands at exactly 0x80000000.
  • The trampoline is carved out at the end, aligned to a 4 KiB page boundary. It handles user-kernel transitions and must be mapped at the same virtual address in every page table.
  • trampoline.S declares .section trampsec, which the linker places here.

.rodata:

  • Read-only.
  • Aligned to 16 bytes.
  • Contains string literals and constant arrays that must not be modified at runtime.
  • Separate from .data so the OS can enforce read-only page permissions.

.data:

  • Read-write.
  • Aligned to 16 bytes.
  • Holds initialized globals with non-zero starting values, embedded in the binary and copied into RAM at load time.

.bss:

  • Read-write.
  • Aligned to 16 bytes.
  • Holds zero-initialized globals.
  • Not stored in the binary: the linker records only the size and the kernel zeroes the region at startup.

Note:

The 16-byte alignment across data sections ensures efficient memory access. On a 64-bit RISC-V system, 16 bytes covers two 64-bit registers, which compilers exploit for multi-word loads and stores. Unaligned access can cause hardware exceptions or significant slowdowns on some architectures.
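
The alignment arithmetic behind both the 16-byte section alignment and the 4 KiB trampoline boundary is the usual power-of-two rounding. A minimal sketch, with hypothetical helper names align_up and align_down:

```c
#include <stdint.h>

/* Round x up to the next multiple of align; align must be a power of two. */
static inline uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Round x down to a multiple of align; align must be a power of two. */
static inline uint64_t align_down(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}
```

For example, an address of 0x80001001 rounds up to 0x80001010 for 16-byte alignment, and 0x80007abc rounds up to 0x80008000 for a 4 KiB page boundary.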

Symbols

The linker script exports three symbols the kernel uses at runtime:

Symbol | Meaning
etext | Address of the end of the text section
end | Address of the end of the entire kernel image
_trampoline | Start address of the trampoline page

The .rodata

Size: 2,080 bytes.

Contents:

  • Lock debug names: passed to initlock() and initsleeplock() at startup.
  • Panic and error messages: invariant violation and error detection strings across the kernel.
  • Boot messages: printed during initialization and secondary hart startup.
  • Format strings: used in trap handling and process dumps.
  • Path strings: "/" and "/init" used during first process setup.
  • Compiler-promoted immutable tables: digit lookup, process state names, syscall dispatch table.

Note: No const globals exist in xv6; everything here was placed by the compiler.

Permissions: xv6 maps this range as PTE_R | PTE_W in the kernel page table, so read-only is by convention only.

The .data

Size: 24 bytes, almost nothing.

Contents:

  • nextpid = 1 in proc.c is the only meaningful non-zero global.
  • A static local first in kalloc.c to detect the first kfree call.

Why so small: most kernel state is zero-initialized and lives in .bss; string data lives in .rodata.

The .bss

Size: 103,224 bytes, the bulk of xv6’s kernel state.

Contents: all global structs declared without explicit initializers. Not stored in the binary; the kernel zeroes this region at startup.

Symbol | Size | First used by | Purpose
stack0 | 32 KiB | entry.S / start.c | Boot stack for all CPUs before main
cons | 168 B | consoleinit() | Console input buffer and lock
tx_lock | 24 B | consoleinit() | UART transmit spinlock
tx_chan | 4 B | consoleinit() | UART transmit sleep channel
tx_busy | 4 B | consoleinit() | UART transmit busy flag
pr | 24 B | printfinit() | printf serialization lock
panicked | 4 B | printfinit() | Flag set when kernel has panicked
panicking | 4 B | printfinit() | Flag set while panic is in progress
kmem | 32 B | kinit() | Physical page free list and its lock
kernel_pagetable | 8 B | kvminit() | Pointer to the kernel page table
proc | 23 KiB | procinit() | Process table: 64 process slots
wait_lock | 24 B | procinit() | Condition lock for parent/child wait coordination
pid_lock | 24 B | procinit() | Protects nextpid counter
tickslock | 24 B | trapinit() | Serializes access to the ticks counter
ticks | 4 B | trapinit() | Wall-clock tick counter incremented by timer interrupts
bcache | 34 KiB | binit() | Buffer cache: 30 LRU disk block buffers
itable | 6.7 KiB | iinit() | In-memory inode table
ftable | 4 KiB | fileinit() | Open file table shared across all processes
devsw | 160 B | fileinit() | Device switch table mapping major numbers to read/write handlers
disk | 320 B | virtio_disk_init() | Virtio disk driver state and descriptor ring
initproc | 8 B | userinit() | Pointer to the init process
sb | 32 B | fsinit() | Superblock: filesystem geometry and metadata
log | 168 B | initlog() | Filesystem transaction log state
cpus | 1 KiB | scheduler() | Per-CPU state: current process, scheduler context, lock depth
started | 4 B | After CPU 0 init | Signals secondary CPUs that CPU 0 has finished initialization

The Free Memory

Starts at end (linker symbol at the end of .bss) and extends to PHYSTOP (0x88000000).

  • Not part of the kernel binary — raw physical RAM managed at runtime.
  • kalloc.c tracks it as a free list of 4 KiB pages.
  • kinit() walks from end to PHYSTOP and adds each page to the free list.
  • kalloc() pops a page from the free list; kfree() pushes one back.

Allocation | kalloc() site | kfree() site | Pages | What gets a page
Kernel page table | kvmmake() | Never | 102 | 1 root + 3 L1 + 98 L0 nodes for the Sv39 tree
Kernel stacks | proc_mapstacks() | Never | 64 | One per process slot, called inside kvmmake()
Virtio rings | virtio_disk_init() | Never | 3 | Descriptor ring, available ring, used ring for disk DMA
Trapframe | allocproc() | freeproc() | 1 per process | Per-process trap register save area
User page table | proc_pagetable() | proc_freepagetable() | 1 per process | Root page table, called inside allocproc()
User memory | uvmalloc() | uvmdealloc() / uvmfree() | varies | User process address space pages during exec and sbrk
Intermediate page table nodes | walk() with alloc=1 | freewalk() | varies | L1 and L0 pages as user page tables are built
Fork copy | uvmcopy() | uvmfree() on child exit | 1 per mapped page | Physical page copies for child during fork
Page fault | vmfault() | uvmfree() on process exit | 1 per fault | On-demand page on fault
Pipe buffer | pipealloc() | pipeclose() | 1 per pipe | Pipe kernel buffer
exec argv | sys_exec() | After copy / on error | varies | Temporary argument string pages during exec
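
The free-list scheme described in this section (kinit walks the range, kfree pushes a page, kalloc pops one) can be sketched as a host-side simulation. Everything here is an assumption made for illustration: the _sketch names, the 8-page static arena standing in for the RAM between end and PHYSTOP, and the omission of locking.

```c
#include <stddef.h>
#include <stdint.h>

#define PGSIZE_SKETCH 4096
#define NPAGES_SKETCH 8            /* tiny arena; xv6 manages end..PHYSTOP */

/* Stand-in for the physical RAM between `end` and PHYSTOP. */
static _Alignas(PGSIZE_SKETCH) unsigned char
    arena_sketch[NPAGES_SKETCH * PGSIZE_SKETCH];

/* Each free page stores the link to the next free page in its first bytes. */
struct run_sketch { struct run_sketch *next; };
static struct run_sketch *freelist_sketch;

/* Push one page onto the free list. */
void kfree_sketch(void *pa)
{
    struct run_sketch *r = pa;
    r->next = freelist_sketch;
    freelist_sketch = r;
}

/* Pop one page from the free list; NULL when memory is exhausted. */
void *kalloc_sketch(void)
{
    struct run_sketch *r = freelist_sketch;
    if (r)
        freelist_sketch = r->next;
    return r;
}

/* Walk the arena page by page and hand every page to the allocator. */
void kinit_sketch(void)
{
    freelist_sketch = NULL;
    for (size_t i = 0; i < NPAGES_SKETCH; i++)
        kfree_sketch(arena_sketch + i * PGSIZE_SKETCH);
}
```

Storing the list node inside the free page itself means the allocator needs no separate bookkeeping memory; the list is last-in, first-out, so a page that was just freed is the next one handed out.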