What qemu to use?

What toolkit to use?

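cc: RISC-V C compiler (compiles C into object files)
as (gas): assembles assembly code into object files
ld: linker (combines object files into one ELF)
objcopy: converts ELF into a raw binary
objdump: inspects object files and ELF binaries (disassembly, symbols)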

These are the compiler flags used when building xv6 C code. They tell GCC:

compile strict C,
for 64-bit RISC-V,
without assuming a normal operating system or libc,
and keep debug info useful for kernel debugging.

Here is the breakdown.

Warning and debugging flags

Flag	Meaning	Why xv6 uses it
`-Wall`	Enable many common compiler warnings.	Catches suspicious code early.
`-Werror`	Treat warnings as errors.	Forces the kernel to build cleanly.
`-Wno-unknown-attributes`	Do not warn about unknown attributes.	Avoids toolchain-version noise.
`-O`	Enable basic optimization.	Produces reasonable code without aggressive optimization.
`-fno-omit-frame-pointer`	Keep frame pointer registers.	Makes stack traces and debugging easier.
`-ggdb`	Generate GDB-friendly debug info.	Helps debug xv6 with GDB.
`-gdwarf-2`	Use DWARF version 2 debug format.	Keeps debug info compatible/simple.

The debug-related ones matter a lot in xv6 because you often debug at the assembly/register level.

Target architecture

Flag	Meaning	Why xv6 uses it
`-march=rv64gc`	Generate code for 64-bit RISC-V with common extensions.	xv6-riscv runs on a 64-bit RISC-V machine.

rv64gc means:

rv64 = 64-bit RISC-V
g    = general-purpose extension set
c    = compressed instruction extension

g is shorthand for the base integer instruction set (I) plus integer multiply/divide (M), atomics (A), and single- and double-precision floating point (F and D), together with the CSR and instruction-fence extensions. xv6 mostly cares that this matches the QEMU RISC-V CPU/toolchain expectations.

Dependency generation

Flag	Meaning	Why xv6 uses it
`-MD`	Generate `.d` dependency files while compiling.	Lets `make` know which headers each `.o` depends on.

Example:

kernel/proc.c
  includes kernel/types.h
  includes kernel/param.h
  includes kernel/proc.h

With -MD, GCC emits a dependency file so that if proc.h changes, proc.o rebuilds automatically.
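To make the rebuild decision concrete, here is a small shell sketch that checks whether a header is listed in a dependency file. The file contents are a hand-written approximation of what GCC emits for proc.c, and `depends_on` is a hypothetical helper, not part of xv6.

```sh
# depends_on DFILE PATH: succeed if PATH appears as a word in DFILE.
# Splits the make-style rule on spaces and backslash continuations.
depends_on() {
  tr ' \\' '\n\n' < "$1" | grep -qxF "$2"
}

# Hand-written stand-in for kernel/proc.d (real files list more headers).
dfile=$(mktemp)
cat > "$dfile" <<'EOF'
kernel/proc.o: kernel/proc.c kernel/types.h kernel/param.h \
 kernel/proc.h
EOF

if depends_on "$dfile" kernel/proc.h; then
  echo "proc.o must be rebuilt when kernel/proc.h changes"
fi
```

Make does the equivalent check itself once the `.d` files are included into the Makefile; the sketch only shows what information the file carries.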

RISC-V code model

Flag	Meaning	Why xv6 uses it
`-mcmodel=medany`	Generate code that can run from a wider range of addresses.	xv6 is linked at `0x80000000`, not near address zero.

This one is important.

The default RISC-V code model, medlow, builds absolute addresses and expects code and static data to sit in the lowest 2 GiB (or the very top) of the address space. xv6’s kernel lives at a higher physical address:

0x80000000

medany (“medium, any address”) instead generates PC-relative address calculations, so code and static data can be linked anywhere as long as they fit within a 2 GiB span of each other.

Without the correct code model, the generated RISC-V addressing sequences cannot reach the kernel’s link address, and the build may fail at link time with relocation errors.

Freestanding kernel environment

Flag	Meaning	Why xv6 uses it
`-ffreestanding`	Compile for a freestanding environment, not hosted C.	The kernel is the OS; there is no libc/normal runtime underneath.
`-nostdlib`	Do not link against standard libraries/startup files.	xv6 provides its own runtime, syscalls, printing, memory helpers, etc.

This is one of the biggest conceptual differences from normal C programs.

A normal C program is “hosted”:

program runs inside Linux/macOS
libc exists
startup code calls main()
malloc/printf/memcpy exist
OS provides services

xv6 kernel code is freestanding:

no libc
no normal program startup
no host OS underneath
no default malloc/printf/memcpy
kernel provides its own world

Global variable behavior

Flag	Meaning	Why xv6 uses it
`-fno-common`	Tentative global definitions become real definitions, not mergeable common symbols.	Catches accidental duplicate global variables.

This helps prevent bugs like putting this in a header:

int counter;

and including it in many .c files.

With -fno-common, each of those is a real definition, so the duplicates fail at link time with a “multiple definition” error instead of being silently merged.

Better style:

// header
extern int counter;
 
// one .c file
int counter;

For kernel code, this is good because accidental duplicate globals are nasty.

Disable compiler built-ins

These flags are all variations of the same idea:

-fno-builtin-strncpy
-fno-builtin-strncmp
-fno-builtin-strlen
-fno-builtin-memset
-fno-builtin-memmove
-fno-builtin-memcmp
-fno-builtin-log
-fno-builtin-bzero
-fno-builtin-strchr
-fno-builtin-exit
-fno-builtin-malloc
-fno-builtin-putc
-fno-builtin-free
-fno-builtin-memcpy
-fno-builtin-printf
-fno-builtin-fprintf
-fno-builtin-vprintf

They tell GCC:

Do not treat these names as special compiler-known library functions.
Use xv6’s definitions or normal calls instead.

Why?

Because GCC knows many standard C library functions by name. Even without including libc headers, the compiler may optimize calls to functions like memcpy, strlen, printf, or malloc based on assumptions from normal C environments.

But xv6 is not a normal C environment.

xv6 has its own implementations of things like:

memset
memmove
memcmp
strlen
printf
malloc/free in user space

The compiler must not silently replace or reinterpret these calls using hosted-libc assumptions.

Example problem:

memcpy(dst, src, n);

A normal compiler might think:

I know what memcpy means.
I can optimize this specially.
I may emit inline instructions or assume libc semantics.

xv6 says:

No. Treat memcpy as xv6’s function, not as a magical builtin.

So these flags prevent unwanted compiler cleverness.

`-Wno-main`

Flag	Meaning	Why xv6 uses it
`-Wno-main`	Do not warn that `main` has an unusual signature or usage.	Kernel/user startup code may not match normal hosted C expectations.

In normal C programs, main has expected signatures like:

int main(void)
int main(int argc, char **argv)

But in kernel or tiny user-runtime contexts, startup conventions can be different. xv6 disables this warning.

Include path

Flag	Meaning	Why xv6 uses it
`-I.`	Add current directory to header search path.	Allows includes relative to xv6 source root.

This lets code include headers from the project tree cleanly.

For example:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

depending on which file is being compiled.

Stack protector check

CFLAGS += $(shell $(CC) -fno-stack-protector -E -x c /dev/null >/dev/null 2>&1 && echo -fno-stack-protector)

This means:

Ask the compiler: do you support -fno-stack-protector?
If yes, add -fno-stack-protector to CFLAGS.
If no, add nothing.

So the actual added flag is usually:

-fno-stack-protector
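The probe can be tried out in a plain shell. This is a minimal sketch of the same idea; `probe_flag` is a hypothetical helper name, and `true` and `false` stand in for a compiler that accepts or rejects the flag, so no cross toolchain is needed to run it.

```sh
# probe_flag CC FLAG: print FLAG only if "CC FLAG -E -x c /dev/null" succeeds.
probe_flag() {
  if "$1" "$2" -E -x c /dev/null >/dev/null 2>&1; then
    echo "$2"
  fi
}

# Stand-in compilers: `true` accepts anything, `false` rejects everything.
echo "accepting compiler adds: [$(probe_flag true -fno-stack-protector)]"
echo "rejecting compiler adds: [$(probe_flag false -fno-stack-protector)]"
```

Passing the real compiler as the first argument would run the same check the Makefile performs through `$(shell ...)`.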

What is stack protector?

Modern compilers often add stack-smashing protection to functions. They insert hidden checks using a “stack canary.”

Normal compiled code might become:

function starts
  put secret canary on stack
 
function returns
  check canary was not overwritten
  if overwritten, call failure handler

That is useful in normal applications.

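But in xv6, this creates a problem: the instrumented code references runtime support that a hosted toolchain normally supplies, such as the `__stack_chk_fail` failure handler and the `__stack_chk_guard` canary value. xv6 does not provide a normal libc/runtime environment, so those references would be left undefined.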

So xv6 disables it.

Conceptually:

Do not insert hidden stack-protection runtime calls.
This is a tiny kernel with its own runtime.

Compact table

Flag	Short meaning
`-Wall`	Enable many warnings.
`-Werror`	Warnings become errors.
`-Wno-unknown-attributes`	Ignore unknown attribute warnings.
`-O`	Basic optimization.
`-fno-omit-frame-pointer`	Keep frame pointers for debugging.
`-ggdb`	GDB debug info.
`-gdwarf-2`	DWARF v2 debug format.
`-march=rv64gc`	Target 64-bit RISC-V.
`-MD`	Generate dependency files.
`-mcmodel=medany`	Addressing model suitable for high kernel address.
`-ffreestanding`	No hosted C environment assumptions.
`-fno-common`	Catch duplicate global definitions.
`-nostdlib`	Do not link standard library/runtime.
`-fno-builtin-*`	Do not treat libc names as compiler built-ins.
`-Wno-main`	Do not warn about nonstandard `main`.
`-I.`	Search current source tree for headers.
`-fno-stack-protector`	Avoid hidden stack canary runtime dependency.

The big picture:

These flags make GCC behave like a kernel compiler:
 
strict warnings,
RISC-V target,
debuggable output,
no libc assumptions,
no hidden runtime dependencies,
and address generation suitable for xv6’s memory layout.

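Some GCC toolchains build position-independent executables (PIE) by default. xv6 does not want that, because its kernel and user binaries are linked at fixed addresses. If the compiler supports disabling PIE, the Makefile adds `-fno-pie` together with `-no-pie` (or the alternate spelling `-nopie`).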

Compact summary

Piece	Meaning
`PIE`	Position Independent Executable
`-fno-pie`	Compiler: do not generate PIE-style code
`-no-pie`	Linker/driver: do not link as PIE
`-nopie`	Older/alternate spelling of `-no-pie`
`-dumpspecs`	Print GCC’s built-in specs (its default option-handling rules) so the Makefile can search them
`ifneq (...,)`	If shell command output is non-empty
Purpose	Keep xv6 fixed-address and simple

Final mental model:

Modern Linux GCC may default to PIE for security. xv6 needs fixed-address kernel/user binaries. So the Makefile detects whether the compiler supports disabling PIE and adds the right no-PIE flags.
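The detection can be sketched as a shell filter over `-dumpspecs`-style text. The grep patterns and the sample spec fragments are assumptions made for illustration (the Makefile lines themselves are not quoted in these notes), and `no_pie_flags` is a hypothetical helper name.

```sh
# no_pie_flags: read `gcc -dumpspecs`-style text on stdin and print the
# flags to add. The [^f] guard skips matches that are part of "-fno-pie".
no_pie_flags() {
  specs=$(cat)
  if printf '%s\n' "$specs" | grep -q -e '[^f]no-pie'; then
    echo "-fno-pie -no-pie"
  elif printf '%s\n' "$specs" | grep -q -e '[^f]nopie'; then
    echo "-fno-pie -nopie"
  fi
}

# Made-up one-line spec fragments, only to exercise the three cases.
printf '%s\n' '%{no-pie:crtbegin.o%s}'  | no_pie_flags
printf '%s\n' '%{nopie:crtbegin.o%s}'   | no_pie_flags
printf '%s\n' '%{static:crtbeginT.o%s}' | no_pie_flags
```

With a real compiler the input would come from `$(CC) -dumpspecs 2>/dev/null`; an empty result corresponds to the `ifneq (...,)` test failing, so nothing is added to CFLAGS.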

Linker flags

LDFLAGS = -z max-page-size=4096

This is a linker flag. It tells the linker:

When laying out the final binary, use 4096 bytes as the maximum page size.

In xv6 terms:

4096 bytes = 0x1000 = one xv6/RISC-V page

Why does the linker care about page size?

When the linker builds an ELF file, it creates loadable segments such as:

text/code segment
read-only data segment
data segment
bss segment

ELF segments have alignment requirements. On some toolchains, the linker may choose a large default maximum page size, like:

2 MiB

That can make the linker insert huge padding/alignment gaps between parts of the kernel image.

For xv6, that is annoying or wrong because xv6 expects a simple 4 KiB page model.

So this flag says:

Do not align ELF segments using some huge default page size.
Use 4096-byte pages.

Why 4096?

Because xv6 uses 4 KiB pages:

#define PGSIZE 4096

So this linker flag matches the kernel’s memory/page-table model:

xv6 page size      = 4096 bytes
linker page size   = 4096 bytes
hardware page size = 4096 bytes

That keeps the kernel image layout compact and predictable.

What could happen without it?

Without this flag, the linker might create a kernel ELF where segments are aligned to a larger page boundary.

Conceptually:

.text
  ↓
huge padding gap
  ↓
.rodata
  ↓
huge padding gap
  ↓
.data

That can make the kernel image larger than expected or shift sections in ways that make the layout less clean.
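The size of such a gap is simple rounding arithmetic. The sketch below rounds a hypothetical end-of-text offset up to the next boundary for both page sizes; the 5000-byte figure is made up for illustration.

```sh
# align_up ADDR ALIGN: round ADDR up to the next multiple of ALIGN.
align_up() {
  echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

text_end=5000   # hypothetical end offset of .text, in bytes

echo "next segment with 4 KiB pages: $(align_up "$text_end" 4096)"
echo "next segment with 2 MiB pages: $(align_up "$text_end" 2097152)"
```

With 4 KiB alignment the next segment starts at 8192, wasting about 3 KiB; with 2 MiB alignment it starts at 2097152, wasting about 2 MiB for the same 5000 bytes of code.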

For a tiny teaching kernel, xv6 wants:

code
then trampoline page
then rodata
then data
then bss

not:

code
then megabytes of linker padding
then rodata
then more padding
then data

What does `-z` mean?

-z introduces a keyword option that is interpreted by the linker, not by the compiler.

So:

-z max-page-size=4096

means:

Set the linker’s maximum page size to 4096.

This is not a C compiler behavior flag. It affects the link stage, when object files are combined into the final kernel binary.

Compact summary

Part	Meaning
`LDFLAGS`	Flags passed to the linker.
`-z`	Linker-specific option prefix.
`max-page-size=4096`	Use 4 KiB max page alignment for ELF segments.
Why xv6 wants it	xv6 uses 4 KiB pages and wants compact/predictable layout.

Mental model:

C/assembly files
  ↓ compile
object files
  ↓ link with LDFLAGS
kernel ELF laid out with 4 KiB page alignment

So this flag keeps the linker’s idea of page alignment consistent with xv6’s 4096-byte page world.

Next step

This is the Makefile rule that creates the final xv6 kernel binary.

$K/kernel: $(OBJS) $K/kernel.ld
	$(LD) $(LDFLAGS) -T $K/kernel.ld -o $K/kernel $(OBJS)
	$(OBJDUMP) -S $K/kernel > $K/kernel.asm
	$(OBJDUMP) -t $K/kernel | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > $K/kernel.sym

Read it as:

To build kernel/kernel,
you need all kernel object files
and the kernel linker script.

First line: the rule header

$K/kernel: $(OBJS) $K/kernel.ld

Since:

K=kernel

this means:

kernel/kernel: $(OBJS) kernel/kernel.ld

So the target is:

kernel/kernel

That is the final linked kernel executable.

The dependencies are:

all kernel object files
kernel/kernel.ld

So if any .o file changes, or if kernel.ld changes, Make rebuilds kernel/kernel.

Conceptually:

kernel/*.o + kernel/kernel.ld
        ↓
kernel/kernel

Second line: link the kernel

$(LD) $(LDFLAGS) -T $K/kernel.ld -o $K/kernel $(OBJS)

Expands roughly to:

riscv64-unknown-elf-ld \
  -z max-page-size=4096 \
  -T kernel/kernel.ld \
  -o kernel/kernel \
  kernel/entry.o kernel/start.o kernel/console.o ...

This is the actual linking step.

Part	Meaning
`$(LD)`	The RISC-V linker.
`$(LDFLAGS)`	Linker flags, like `-z max-page-size=4096`.
`-T kernel/kernel.ld`	Use xv6’s linker script.
`-o kernel/kernel`	Output file name.
`$(OBJS)`	All compiled kernel object files.

The linker combines all kernel .o files into one kernel image.

It also uses kernel.ld to decide:

start at 0x80000000
put _entry first
lay out .text, trampoline, .rodata, .data, .bss
define etext
define end

So this step creates:

kernel/kernel

That is the kernel binary QEMU will load.

Third line: create disassembly

$(OBJDUMP) -S $K/kernel > $K/kernel.asm

Expands roughly to:

riscv64-unknown-elf-objdump -S kernel/kernel > kernel/kernel.asm

This does not build the kernel. It creates a human-readable file:

kernel/kernel.asm

objdump -S means:

show disassembly, mixed with source code when debug info is available

So kernel.asm lets you inspect:

C source
RISC-V assembly generated from it
function addresses
machine-level control flow

This is useful for debugging and learning.

Example use cases:

What assembly did scheduler() compile into?
Where is _entry?
What address is usertrap?
What instruction caused a crash?

Fourth line: create symbol table file

$(OBJDUMP) -t $K/kernel | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > $K/kernel.sym

This creates:

kernel/kernel.sym

objdump -t kernel/kernel prints the symbol table.

The symbol table contains names and addresses, such as:

80000000 _entry
80001234 main
80004567 scheduler
80007890 usertrap
...

Then the sed command cleans the output.

Breaking down the symbol command

$(OBJDUMP) -t $K/kernel

means:

Print the symbol table from kernel/kernel.

Then:

sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$/d'

does some text filtering.

`1,/SYMBOL TABLE/d`

Delete everything from line 1 through the line containing SYMBOL TABLE.

So it removes objdump’s header text.

`s/ .* / /`

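Replace everything from the first space through the last space on the line with a single space. Because `.*` is greedy, this drops the middle columns and keeps only the first and last fields.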

Objdump symbol lines have several columns. xv6 wants a compact address/name style file.

`/^$/d`

Delete empty lines.

In the Makefile, this appears as:

/^$$/d

because in Makefiles, $ has special meaning. To pass a literal $ to the shell/sed, Make needs $$.

So:

/^$$/d

becomes this for sed:

/^$/d

Meaning:

delete blank lines
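To see the three sed commands work together, the filter can be run on a small sample. The input below is a hand-written approximation of `objdump -t` output with made-up addresses, not real tool output, and `sym_filter` and `sample_objdump` are hypothetical helper names.

```sh
# Same sed program as the Makefile recipe, with $$ written as $ because
# this runs directly in a shell rather than through make.
sym_filter() {
  sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$/d'
}

# Hand-written approximation of `objdump -t` output (addresses made up).
sample_objdump() {
  cat <<'EOF'

kernel/kernel:     file format elf64-littleriscv

SYMBOL TABLE:
0000000080000000 l    d  .text 0000000000000000 .text
0000000080000000 g       .text 0000000000000000 _entry
0000000080000e24 g     F .text 00000000000000a6 main


EOF
}

sample_objdump | sym_filter
```

The header lines and blank lines disappear, and each remaining line is reduced to an address followed by a name.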

Why generate `kernel.sym`?

The symbol file maps kernel names to addresses.

This is useful when debugging.

For example, if xv6 prints or GDB shows an address:

0x80003f12

you can use the symbol table to figure out:

that address is inside usertrap()
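That lookup amounts to finding the symbol with the highest address that does not exceed the one you have. A minimal sketch, assuming a symbol file of `hexaddress name` lines; the table below is made up and `lookup_sym` is a hypothetical helper, not an xv6 tool.

```sh
# lookup_sym ADDR SYMFILE: print the symbol whose address is the highest
# one that is still <= ADDR. SYMFILE holds "hexaddr name" lines.
lookup_sym() {
  target=$(( $1 ))
  best_name=""
  best_addr=-1
  while read -r addr name; do
    value=$(( 0x$addr ))
    if [ $(( value <= target && value > best_addr )) -eq 1 ]; then
      best_addr=$value
      best_name=$name
    fi
  done < "$2"
  echo "$best_name"
}

# Made-up symbol table in kernel.sym style; real addresses will differ.
symfile=$(mktemp)
cat > "$symfile" <<'EOF'
80000000 _entry
80003e40 usertrap
80004010 usertrapret
EOF

lookup_sym 0x80003f12 "$symfile"
```

The sketch treats every symbol as the start of a region that runs until the next symbol, which is only an approximation because the symbol file does not record sizes.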

So:

kernel/kernel.asm = detailed assembly listing
kernel/kernel.sym = compact symbol/address map

Full output of this rule

This Makefile rule produces three important files:

File	Purpose
`kernel/kernel`	Final linked xv6 kernel loaded by QEMU.
`kernel/kernel.asm`	Disassembly/source listing for inspection.
`kernel/kernel.sym`	Symbol/address table for debugging.

Big picture

kernel/entry.o
kernel/start.o
kernel/main.o
kernel/proc.o
kernel/vm.o
...
kernel/kernel.ld
        ↓ linker
kernel/kernel
        ↓ objdump -S
kernel/kernel.asm
        ↓ objdump -t + sed
kernel/kernel.sym

So this rule is the point where all compiled kernel pieces become one actual bootable kernel image.

Compilation

This is a pattern rule for building kernel assembly object files.

$K/%.o: $K/%.S
	$(CC) -march=rv64gc -g -c -o $@ $<

Since:

K=kernel

it means:

kernel/%.o: kernel/%.S
	$(CC) -march=rv64gc -g -c -o $@ $<

In plain English:

To build any kernel/foo.o,
if there is a matching kernel/foo.S,
compile/assemble kernel/foo.S into kernel/foo.o.

Examples:

kernel/entry.S       → kernel/entry.o
kernel/swtch.S       → kernel/swtch.o
kernel/trampoline.S  → kernel/trampoline.o
kernel/kernelvec.S   → kernel/kernelvec.o

What `%` means

% is a wildcard that matches any nonempty string. The part it matches is called the stem.

So:

kernel/%.o

matches:

kernel/entry.o
kernel/swtch.o
kernel/trampoline.o

And:

kernel/%.S

means the matching source file:

kernel/entry.S
kernel/swtch.S
kernel/trampoline.S

So if Make needs kernel/swtch.o, it sees:

kernel/swtch.o: kernel/swtch.S

and runs the command.

What `$@` and `$<` mean

These are automatic Make variables.

Variable	Meaning	Example for `kernel/swtch.o`
`$@`	Target being built	`kernel/swtch.o`
`$<`	First dependency/input	`kernel/swtch.S`

So this command:

$(CC) -march=rv64gc -g -c -o $@ $<

becomes:

riscv64-unknown-elf-gcc -march=rv64gc -g -c -o kernel/swtch.o kernel/swtch.S
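The substitution Make performs can be imitated with shell parameter expansion. This is only a sketch of the pattern matching for this one rule; `expand_asm_rule` is a hypothetical helper, and the compiler name is the one assumed in the expansion above.

```sh
# expand_asm_rule TARGET: print the command make would run for TARGET
# under "$K/%.o: $K/%.S", or fail if TARGET does not match kernel/%.o.
expand_asm_rule() {
  case "$1" in
    kernel/?*.o)
      stem=${1#kernel/}      # strip the fixed prefix of the pattern
      stem=${stem%.o}        # strip the fixed suffix; what is left is %
      echo "riscv64-unknown-elf-gcc -march=rv64gc -g -c -o $1 kernel/$stem.S"
      ;;
    *)
      return 1
      ;;
  esac
}

expand_asm_rule kernel/swtch.o
```

Make additionally requires that the matching `.S` file exists or can itself be built before it picks this rule, which the sketch does not check.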

Why use `$(CC)` instead of `$(AS)`?

Even though this is assembly, xv6 uses gcc to build .S files.

That is normal.

There are two common assembly extensions:

Extension	Meaning
`.s`	Raw assembly, sent directly to assembler.
`.S`	Assembly that is first run through the C preprocessor.

Uppercase .S means the file can use preprocessor features like:

#include
#define
#ifdef

So GCC handles the preprocessing step, then invokes the assembler.

That is why this rule uses:

$(CC)

rather than directly using:

$(AS)

What each flag means

-march=rv64gc

Generate code for 64-bit RISC-V with common extensions.

-g

Include debug information.

-c

Compile/assemble only. Do not link.

So the output is an object file:

kernel/foo.o

not a final executable.

-o $@

Name the output file.

$<

Use the source assembly file as input.

Big picture

This rule turns low-level assembly files into object files so the linker can later combine them with C object files:

kernel/entry.S
kernel/swtch.S
kernel/trampoline.S
kernel/kernelvec.S
        ↓
assembly pattern rule
        ↓
kernel/entry.o
kernel/swtch.o
kernel/trampoline.o
kernel/kernelvec.o
        ↓
linker
        ↓
kernel/kernel

So this rule is specifically for xv6’s low-level assembly parts: boot entry, context switching, trap transition, and kernel trap vector.

The tags target

This Makefile rule builds an Emacs tags file for navigating xv6 source code.

tags: $(OBJS)
	etags kernel/*.S kernel/*.c

In plain English:

To build the target named tags,
first make sure the kernel object files exist,
then run etags over kernel assembly and C files.

What is `tags`?

tags is not part of the kernel.

It is a developer convenience target.

When you run:

make tags

it generates a file usually named:

TAGS

That file indexes functions, symbols, and definitions in the source code so an editor can jump around quickly.

For example, in Emacs you can put your cursor on:

scheduler

and jump to the definition of scheduler().

What is `etags`?

etags is a source-code indexing tool used mainly by Emacs.

It scans source files and records where definitions live.

This command:

etags kernel/*.S kernel/*.c

means:

Scan all kernel assembly files and all kernel C files.
Create a TAGS file for editor navigation.

So it includes files like:

kernel/entry.S
kernel/swtch.S
kernel/trampoline.S
kernel/main.c
kernel/proc.c
kernel/vm.c
kernel/trap.c
...

Why does `tags` depend on `$(OBJS)`?

tags: $(OBJS)

This says:

Before generating tags, build the kernel object files.

Strictly speaking, etags only needs the source files, not the .o files.

So this dependency is not conceptually necessary for source indexing. It is probably there so that:

make tags

also ensures the kernel source currently builds, or so generated/intermediate files are up to date before navigation.

But the actual tag generation command only reads:

kernel/*.S
kernel/*.c

Does this affect xv6 runtime?

No.

This has nothing to do with:

booting
linking
QEMU
filesystem image
syscalls
kernel execution

It is only for developer navigation.

Compact summary

Part	Meaning
`tags`	Make target for code navigation.
`$(OBJS)`	Kernel object-file dependencies.
`etags`	Tool that generates Emacs `TAGS` file.
`kernel/*.S kernel/*.c`	Source files to index.
Runtime effect	None. Developer convenience only.

Mental model:

kernel source files
  ↓ etags
TAGS file
  ↓
editor can jump to definitions

So this rule is just “make it easier to browse xv6 source code.”

User Library

This whole block is about building xv6 user programs, not the kernel.

The kernel becomes:

kernel/kernel

User programs become files like:

user/_sh
user/_ls
user/_cat
user/_init

Those _-prefixed binaries are later packed into fs.img by mkfs.

1. User-space mini library

ULIB = $U/ulib.o $U/usys.o $U/printf.o $U/umalloc.o

Since:

U=user

this means:

ULIB = user/ulib.o user/usys.o user/printf.o user/umalloc.o

This is xv6’s tiny user-space runtime library.

Object file	Source	Purpose
`user/ulib.o`	`user/ulib.c`	Basic user helpers like string functions and wrappers.
`user/usys.o`	generated from `user/usys.S`	Syscall stubs that execute `ecall`.
`user/printf.o`	`user/printf.c`	User-space `printf`.
`user/umalloc.o`	`user/umalloc.c`	User-space `malloc`/`free`.

Why does every user program need this?

Because xv6 user programs do not link against normal libc.

So a program like ls needs xv6’s own tiny support code:

user/ls.o
  + user/ulib.o
  + user/usys.o
  + user/printf.o
  + user/umalloc.o
  ↓
user/_ls

2. Generic rule for building user programs

_%: %.o $(ULIB) $U/user.ld
	$(LD) $(LDFLAGS) -T $U/user.ld -o $@ $< $(ULIB)
	$(OBJDUMP) -S $@ > $*.asm
	$(OBJDUMP) -t $@ | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > $*.sym

This is the main rule for linking user programs.

What does `_%: %.o ...` mean?

This is a pattern rule.

It says:

To build _something,
use something.o plus the user library.

Examples:

user/_ls    from user/ls.o
user/_cat   from user/cat.o
user/_sh    from user/sh.o
user/_init  from user/init.o

The leading underscore is important.

On your host machine, there is already a real ls, cat, sh, etc. So xv6’s compiled user binaries are named:

user/_ls
user/_cat
user/_sh

Then mkfs puts them into the xv6 filesystem without the leading underscore, so inside xv6 they appear as:

ls
cat
sh

Dependencies

_%: %.o $(ULIB) $U/user.ld

To build a user program, Make needs:

program object file
user mini-library objects
user linker script

For example:

user/_ls depends on:
  user/ls.o
  user/ulib.o
  user/usys.o
  user/printf.o
  user/umalloc.o
  user/user.ld

Linking command

$(LD) $(LDFLAGS) -T $U/user.ld -o $@ $< $(ULIB)

For user/_ls, this becomes roughly:

riscv64-unknown-elf-ld \
  -z max-page-size=4096 \
  -T user/user.ld \
  -o user/_ls \
  user/ls.o \
  user/ulib.o user/usys.o user/printf.o user/umalloc.o

Meaning:

Link ls.o with the xv6 user library
using user/user.ld
and produce user/_ls.

What are `$@`, `$<`, and `$*`?

Make variable	Meaning	Example for `user/_ls`
`$@`	Target being built	`user/_ls`
`$<`	First dependency	`user/ls.o`
`$*`	Stem matched by `%`	`user/ls`

So:

-o $@

means:

output to user/_ls

and:

$<

means:

use user/ls.o as the main object
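The one non-obvious detail is how the directory is handled: when a pattern such as `_%` contains no slash, GNU Make matches it against the file name only and then puts the directory back in front of the stem. The sketch below imitates that with shell parameter expansion; `user_rule_vars` is a hypothetical helper name.

```sh
# user_rule_vars TARGET: print the values of $@, $< and $* that the
# "_%: %.o" rule would see for TARGET (for example user/_ls).
user_rule_vars() {
  case "$1" in
    */*) dir=${1%/*}/ ;;     # directory part, kept out of the match
    *)   dir="" ;;
  esac
  base=${1##*/}              # file name that is matched against "_%"
  case "$base" in
    _?*) stem=$dir${base#_} ;;
    *)   return 1 ;;
  esac
  echo "\$@ = $1"
  echo "\$< = $stem.o"
  echo "\$* = $stem"
}

user_rule_vars user/_ls
```

This is why `$*.asm` and `$*.sym` land in `user/ls.asm` and `user/ls.sym` rather than in files named after `_ls`.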

Generate user program disassembly

$(OBJDUMP) -S $@ > $*.asm

For user/_ls, this becomes:

riscv64-unknown-elf-objdump -S user/_ls > user/ls.asm

It creates a human-readable disassembly/source file.

So:

user/_ls
  ↓
user/ls.asm

This is useful if you want to inspect how ls.c compiled into RISC-V assembly.

Generate user program symbol file

$(OBJDUMP) -t $@ | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > $*.sym

For user/_ls, this becomes:

riscv64-unknown-elf-objdump -t user/_ls \
  | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$/d' \
  > user/ls.sym

It creates:

user/ls.sym

That file maps symbols to addresses.

Example conceptually:

00000000 main
000000a4 printf
00000120 write

Again, this is for debugging/inspection.

3. Generate `user/usys.S`

$U/usys.S : $U/usys.pl
	perl $U/usys.pl > $U/usys.S

Expands to:

user/usys.S : user/usys.pl
	perl user/usys.pl > user/usys.S

This says:

Generate user/usys.S from user/usys.pl.

usys.pl is a Perl script that prints assembly code.

That generated assembly contains syscall wrappers.

For example, conceptually it creates functions like:

fork:
  li a7, SYS_fork
  ecall
  ret
 
write:
  li a7, SYS_write
  ecall
  ret
 
exit:
  li a7, SYS_exit
  ecall
  ret
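The generator idea is small enough to imitate in a few lines of shell. This is not the Perl script itself, only a sketch of the same approach: one function prints a stub for a given name and a loop calls it once per syscall. The `.global` line is added here so the linker can see each stub, and the `SYS_` constants are assumed to come from a header that the generated file includes.

```sh
# entry NAME: print one syscall stub in the style shown above.
# SYS_NAME is assumed to be defined by a header the generated file includes.
entry() {
  printf '.global %s\n' "$1"
  printf '%s:\n' "$1"
  printf '  li a7, SYS_%s\n' "$1"
  printf '  ecall\n'
  printf '  ret\n'
}

for name in fork write exit; do
  entry "$name"
done
```

Generating the stubs keeps them identical in shape, so adding a syscall wrapper means adding one more name to the list.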

The actual syscall calling convention is:

arguments go in registers like a0, a1, a2...
syscall number goes in a7
ecall enters the kernel
return value comes back in a0

So if user code does:

write(1, "hi\n", 3);

it calls the generated write stub in usys.S.

Then:

write() wrapper
  ↓
load syscall number into a7
  ↓
ecall
  ↓
kernel trap path

4. Compile `user/usys.S` into `user/usys.o`

$U/usys.o : $U/usys.S
	$(CC) $(CFLAGS) -c -o $U/usys.o $U/usys.S

Expands to:

user/usys.o : user/usys.S
	$(CC) $(CFLAGS) -c -o user/usys.o user/usys.S

This compiles/assembles the generated syscall stubs.

Flow:

user/usys.pl
  ↓ Perl generates
user/usys.S
  ↓ compiler/assembler
user/usys.o
  ↓ linked into every user program

This object is part of ULIB, so every xv6 user program gets syscall wrappers.

5. Special rule for `forktest`

$U/_forktest: $U/forktest.o $(ULIB)
	# forktest has less library code linked in - needs to be small
	# in order to be able to max out the proc table.
	$(LD) $(LDFLAGS) -N -e main -Ttext 0 -o $U/_forktest $U/forktest.o $U/ulib.o $U/usys.o
	$(OBJDUMP) -S $U/_forktest > $U/forktest.asm

This is a special case.

Normally, user programs link with all of ULIB:

ulib.o
usys.o
printf.o
umalloc.o

But forktest links only:

forktest.o
ulib.o
usys.o

It intentionally omits:

printf.o
umalloc.o

Why?

Because forktest is designed to stress the process table by creating as many processes as possible.

If the program is too large, each process consumes more memory. Then the test may run out of memory before it actually maxes out the process table.

So xv6 keeps forktest tiny.

Special linker flags for `forktest`

-N -e main -Ttext 0

Flag	Meaning
`-N`	Mark text and data as readable and writable, and do not page-align the data segment; this avoids page alignment overhead.
`-e main`	Set entry point to `main`.
`-Ttext 0`	Start text/code at virtual address 0.

This creates a very small/simple executable.

The output is:

user/_forktest

Then:

$(OBJDUMP) -S $U/_forktest > $U/forktest.asm

creates:

user/forktest.asm

Complete user build flow

For a normal user program like ls:

user/ls.c
  ↓ compile
user/ls.o
 
user/usys.pl
  ↓ Perl
user/usys.S
  ↓ compile
user/usys.o
 
user/ulib.c      → user/ulib.o
user/printf.c    → user/printf.o
user/umalloc.c   → user/umalloc.o
 
user/ls.o + ULIB + user/user.ld
  ↓ link
user/_ls
  ↓ objdump
user/ls.asm
user/ls.sym

For forktest:

user/forktest.o + user/ulib.o + user/usys.o
  ↓ special tiny link
user/_forktest

Big idea

The kernel build and user build are parallel but separate:

kernel/*.c, kernel/*.S
  ↓
kernel/*.o
  ↓
kernel/kernel
 
user/*.c, generated user/usys.S
  ↓
user/*.o
  ↓
user/_init, user/_sh, user/_ls, ...
  ↓
mkfs packs them into fs.img

This Makefile block is the part that turns xv6 user-space source code into RISC-V executables that the xv6 kernel can later load with exec.

mkfs

mkfs/mkfs: mkfs/mkfs.c $K/fs.h $K/param.h
	gcc -Wno-unknown-attributes -I. -o mkfs/mkfs mkfs/mkfs.c

This rule builds the mkfs host tool.

mkfs is not part of the xv6 kernel and it is not an xv6 user program. It runs on your real machine during the build.

Its job is:

compiled xv6 user programs
  ↓
mkfs
  ↓
fs.img

Then QEMU gives fs.img to xv6 as its virtual disk.

Rule header

mkfs/mkfs: mkfs/mkfs.c $K/fs.h $K/param.h

This means:

To build mkfs/mkfs, Make needs:
  mkfs/mkfs.c
  kernel/fs.h
  kernel/param.h

Since:

K=kernel

this expands to:

mkfs/mkfs: mkfs/mkfs.c kernel/fs.h kernel/param.h

So if any of these change, mkfs/mkfs must be rebuilt.

Why does `mkfs` depend on `kernel/fs.h`?

Because mkfs must create a disk image in exactly the format the xv6 kernel understands.

kernel/fs.h defines the on-disk filesystem format:

block size
superblock layout
inode layout
directory entry format
bitmap math
filesystem constants

So both sides must agree:

mkfs/mkfs.c
  writes fs.img using fs.h layout
 
kernel/fs.c
  reads fs.img using fs.h layout

If fs.h changes, the format may change, so mkfs needs rebuilding.

Why does `mkfs` depend on `kernel/param.h`?

Because filesystem sizes and constants can depend on global xv6 parameters.

For example, things like filesystem size, log size, inode counts, or related constants may come from param.h.

So:

param.h changes
  ↓
filesystem constants may change
  ↓
mkfs must rebuild

Build command

gcc -Wno-unknown-attributes -I. -o mkfs/mkfs mkfs/mkfs.c

Notice this uses plain:

gcc

not:

$(CC)

That is intentional.

mkfs runs on the host machine, so it must be compiled with the host compiler.

mkfs/mkfs.c
  ↓ host gcc
mkfs/mkfs
  ↓ runs on your laptop
creates fs.img

If your laptop is x86-64, mkfs/mkfs is an x86-64 program.

If your laptop is ARM, it is an ARM program.

But the user binaries it packs into fs.img are RISC-V binaries.

Why not use the RISC-V compiler?

Because then mkfs/mkfs would become a RISC-V executable, and your host machine could not directly run it during the build.

Wrong mental model:

mkfs should run inside xv6

Correct mental model:

mkfs runs before xv6 boots
mkfs creates the disk image xv6 will later read

So:

kernel/user programs → RISC-V compiler
mkfs tool            → host compiler

`-Wno-unknown-attributes`

-Wno-unknown-attributes

Suppress warnings about compiler attributes the host compiler may not recognize.

This keeps mkfs building cleanly across different host compilers/toolchains.

`-I.`

-I.

Add the current project root as an include path.

This lets mkfs/mkfs.c include headers like:

#include "kernel/fs.h"
#include "kernel/param.h"

or similar project-relative headers.

`-o mkfs/mkfs`

-o mkfs/mkfs

Name the output executable:

mkfs/mkfs

So the build creates a host executable at that path.

`.PRECIOUS`

# Prevent deletion of intermediate files, e.g. cat.o, after first build, so
# that disk image changes after first build are persistent until clean.
.PRECIOUS: %.o

This is a Make behavior rule.

It tells Make:

Do not automatically delete .o intermediate files.

What are intermediate files?

Sometimes Make builds a target through chained implicit rules.

For example:

user/cat.c
  ↓
user/cat.o
  ↓
user/_cat

If Make considers user/cat.o only an intermediate file, it may delete it after building user/_cat.

.PRECIOUS: %.o says:

Keep .o files around.

Why does xv6 care?

Because the disk image depends on compiled user programs.

The build path is:

user/cat.c
  ↓
user/cat.o
  ↓
user/_cat
  ↓
fs.img

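fs.img is rebuilt whenever Make decides that one of the user binaries is out of date, and rebuilding fs.img discards any files created inside xv6 during earlier runs. If Make deleted the .o files as intermediates, later builds could end up redoing parts of this chain unnecessarily.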

The comment says this helps make disk image changes after the first build persistent until make clean.

In plain English:

Keep object files around so incremental builds behave predictably.
Only remove them when the user explicitly runs make clean.

Why `.PRECIOUS` specifically?

In GNU Make, .PRECIOUS has two effects:

1. Do not delete the target if the build is interrupted.
2. Do not delete it automatically if Make thinks it is intermediate.

Here xv6 mainly cares about the second effect.

So:

.PRECIOUS: %.o

means:

All .o files are precious.
Do not auto-delete them.

Compact summary

mkfs/mkfs rule:
  builds the host-side filesystem image creator
 
uses plain gcc:
  because mkfs runs on your real machine
 
depends on fs.h:
  because mkfs must write the same filesystem format the kernel reads
 
depends on param.h:
  because filesystem/kernel constants may affect the image layout
 
.PRECIOUS: %.o:
  tells Make to keep object files around for stable incremental builds

Big picture:

RISC-V compiler:
  kernel/kernel
  user/_init
  user/_sh
  user/_cat
  ...
 
host gcc:
  mkfs/mkfs
 
mkfs/mkfs:
  reads user/_init, user/_sh, user/_cat, ...
  writes fs.img
 
QEMU:
  boots kernel/kernel
  attaches fs.img as disk

User programs

This block defines which xv6 user programs get built and inserted into the filesystem image.

UPROGS=\
	$U/_cat\
	$U/_echo\
	$U/_forktest\
	$U/_grep\
	$U/_init\
	$U/_kill\
	$U/_ln\
	$U/_ls\
	$U/_mkdir\
	$U/_rm\
	$U/_sh\
	$U/_stressfs\
	$U/_usertests\
	$U/_grind\
	$U/_wc\
	$U/_zombie\
	$U/_logstress\
	$U/_forphan\
	$U/_dorphan\

Since:

U=user

this expands conceptually to:

user/_cat
user/_echo
user/_forktest
user/_grep
user/_init
...

These are compiled xv6 user binaries.

Why the leading underscore?

On your host machine, names like cat, echo, grep, ls, mkdir, rm, and sh already exist as normal Unix/Linux commands.

So xv6 names its compiled user binaries with a leading underscore on the host:

user/_ls
user/_cat
user/_sh

But inside xv6, they appear without the underscore:

ls
cat
sh

So:

host filename:    user/_ls
inside xv6 file:  /ls

mkfs handles this convention when it writes the files into fs.img.
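To make the naming convention concrete, here is a minimal Python sketch. The function name `xv6_name` is hypothetical, and the logic (drop the directory, then drop one leading underscore) is an assumption about the convention described above, not a copy of mkfs.c.

```python
import posixpath


def xv6_name(host_path: str) -> str:
    """Map a host-side path to the file name it gets inside fs.img.

    Assumption: the directory part is dropped and a single leading
    underscore is stripped; files without one (README) keep their name.
    """
    name = posixpath.basename(host_path)
    if name.startswith("_"):
        name = name[1:]
    return name


for path in ["user/_ls", "user/_cat", "user/_sh", "README"]:
    print(f"{path:10} -> /{xv6_name(path)}")
```

Only one underscore is removed, so the host name `user/_ls` maps to `/ls` and a file such as `README` passes through unchanged.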

What is `UPROGS`?

UPROGS means:

user programs to include in the xv6 filesystem image

This list is not just “programs to compile.” It is specifically the list of programs that should exist inside the xv6 disk image.

So if you write a new user program:

user/hello.c

you usually add:

$U/_hello\

to UPROGS.

Then the build can produce:

user/_hello

and mkfs will put it into fs.img.

What each program is

Program	Source file	Purpose
`$U/_cat`	`user/cat.c`	Print file contents.
`$U/_echo`	`user/echo.c`	Print command-line arguments.
`$U/_forktest`	`user/forktest.c`	Stress-test `fork` and process table size.
`$U/_grep`	`user/grep.c`	Search text for matching patterns.
`$U/_init`	`user/init.c`	First user process started by the kernel.
`$U/_kill`	`user/kill.c`	Request killing a process by PID.
`$U/_ln`	`user/ln.c`	Create a hard link.
`$U/_ls`	`user/ls.c`	List directory contents.
`$U/_mkdir`	`user/mkdir.c`	Create a directory.
`$U/_rm`	`user/rm.c`	Remove a file.
`$U/_sh`	`user/sh.c`	xv6 shell.
`$U/_stressfs`	`user/stressfs.c`	Stress-test filesystem writes.
`$U/_usertests`	`user/usertests.c`	Large user/kernel behavior test suite.
`$U/_grind`	`user/grind.c`	Stress-test process/filesystem/syscall interactions.
`$U/_wc`	`user/wc.c`	Count lines, words, and bytes.
`$U/_zombie`	`user/zombie.c`	Demonstrate zombie process behavior.
`$U/_logstress`	`user/logstress.c`	Stress-test filesystem logging.
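`$U/_forphan`	`user/forphan.c`	Test for an orphaned file (apparently one that is unlinked while still open, rather than an orphaned process).
`$U/_dorphan`	`user/dorphan.c`	Companion test that appears to do the same for an orphaned directory.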

The `fs.img` rule

fs.img: mkfs/mkfs README $(UPROGS)
	mkfs/mkfs fs.img README $(UPROGS)

This rule creates the xv6 filesystem image.

Rule header

fs.img: mkfs/mkfs README $(UPROGS)

This means:

To build fs.img, Make needs:
  mkfs/mkfs
  README
  all user programs in UPROGS

So fs.img gets rebuilt if any of these change:

mkfs/mkfs changes
README changes
user/_cat changes
user/_sh changes
user/_init changes
...
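The rebuild decision above comes down to comparing timestamps. Here is a simplified Python sketch of that idea. The helper `needs_rebuild` is hypothetical and ignores much of what real Make does (implicit rules, phony targets, and so on).

```python
import os
import tempfile
from typing import List


def needs_rebuild(target: str, prerequisites: List[str]) -> bool:
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


with tempfile.TemporaryDirectory() as tmp:
    prereq = os.path.join(tmp, "_cat")       # stands in for user/_cat
    target = os.path.join(tmp, "fs.img")
    open(prereq, "w").close()
    missing = needs_rebuild(target, [prereq])    # fs.img not built yet

    open(target, "w").close()
    os.utime(prereq, (1000, 1000))               # prerequisite is older
    os.utime(target, (2000, 2000))
    fresh = needs_rebuild(target, [prereq])

    os.utime(prereq, (3000, 3000))               # prerequisite changed later
    stale = needs_rebuild(target, [prereq])

print(missing, fresh, stale)
```

The three cases are: image missing, image newer than its inputs, and an input modified after the image was written. Only the middle case lets Make skip the mkfs step.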

Build command

mkfs/mkfs fs.img README $(UPROGS)

This runs the host-side mkfs tool.

Expanded conceptually:

mkfs/mkfs fs.img README user/_cat user/_echo user/_forktest user/_grep user/_init ...

Meaning:

Create fs.img
and put README plus all UPROGS into it.

So the output is:

fs.img

That file is a raw xv6 filesystem image.

Build flow

For a user program like ls:

user/ls.c
  ↓ compile
user/ls.o
  ↓ link with ULIB
user/_ls
  ↓ mkfs
/ls inside fs.img

For the whole image:

README
user/_cat
user/_echo
user/_forktest
user/_grep
user/_init
user/_kill
user/_ln
user/_ls
user/_mkdir
user/_rm
user/_sh
user/_stressfs
user/_usertests
user/_grind
user/_wc
user/_zombie
user/_logstress
user/_forphan
user/_dorphan
        ↓
mkfs/mkfs
        ↓
fs.img

Why `fs.img` must include `_init`

This one is critical.

The kernel eventually tries to start the first user process:

/init

That file comes from:

user/_init

inside fs.img.

If _init is missing from UPROGS, xv6 may boot the kernel but fail to start normal user space.

Runtime path:

kernel boots
  ↓
kernel initializes filesystem
  ↓
kernel tries to execute /init
  ↓
/init starts /sh
  ↓
you get a shell

So _init and _sh are especially important.

Important distinction

There are three kinds of files here:

user/*.c
  source code for xv6 user programs
 
user/_*
  compiled RISC-V user binaries on the host filesystem
 
fs.img
  xv6 filesystem image containing those binaries as files

So:

user/sh.c
  ↓
user/_sh
  ↓
fs.img contains /sh
  ↓
xv6 runs /sh

Compact summary

UPROGS:
  list of compiled xv6 user binaries to include in fs.img
 
fs.img rule:
  runs mkfs to create the xv6 disk image
 
mkfs/mkfs:
  host program that writes fs.img
 
fs.img:
  virtual disk used by xv6 inside QEMU

Big picture:

user programs are built first
        ↓
mkfs packs them into fs.img
        ↓
QEMU gives fs.img to xv6 as a disk
        ↓
xv6 loads /init from fs.img
        ↓
init starts sh

Dependency includes and `clean`

This block has two parts:

-include kernel/*.d user/*.d

and:

clean:
	rm -f ...

The first part helps incremental rebuilds. The second part deletes generated build artifacts.

`-include kernel/*.d user/*.d`

-include kernel/*.d user/*.d

This tells Make:

Include dependency files generated by the compiler,
but do not complain if they do not exist yet.

Earlier, xv6 uses this compiler flag:

-MD

That makes GCC generate .d files alongside .o files.

For example:

kernel/proc.c
  ↓ compile
kernel/proc.o
kernel/proc.d

The .o file is the compiled object file.

The .d file records header dependencies, something like:

kernel/proc.o: kernel/proc.c kernel/types.h kernel/param.h kernel/proc.h kernel/riscv.h

So Make learns:

If proc.h changes, rebuild proc.o.
If riscv.h changes, rebuild proc.o.
If types.h changes, rebuild proc.o.

Without .d files, Make might only know:

proc.o depends on proc.c

and miss the fact that changing a header should trigger a rebuild.
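A `.d` file is just a Make rule in text form. This small Python sketch splits one into its target and prerequisites. The function name `parse_dep_rule` is made up for illustration, and it assumes a single rule with no spaces or colons inside file names.

```python
def parse_dep_rule(text: str):
    """Split a Make dependency rule into (target, [prerequisites]).

    Backslash-newline continuations, which long rules use, are joined first.
    """
    joined = text.replace("\\\n", " ")
    target, _, prereqs = joined.partition(":")
    return target.strip(), prereqs.split()


rule = (
    "kernel/proc.o: kernel/proc.c kernel/types.h kernel/param.h \\\n"
    " kernel/proc.h kernel/riscv.h\n"
)
target, prereqs = parse_dep_rule(rule)
headers = [p for p in prereqs if p.endswith(".h")]
print(target, "is rebuilt when any of these change:", headers)
```

Every header in that list is a file Make would otherwise not know about.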

Why the leading `-`?

This:

-include

is different from:

include

The leading - means:

Try to include these files.
If they do not exist, ignore the error.

That matters on the first build.

Before compilation, there may be no files like:

kernel/proc.d
kernel/vm.d
user/sh.d

So plain include could fail.

But -include says:

No dependency files yet? Fine. Continue.

After the first build, the .d files exist and Make uses them for smarter incremental rebuilds.

Dependency file flow

kernel/proc.c
  ↓ compile with -MD
kernel/proc.o
kernel/proc.d
  ↓
Make includes proc.d next time
  ↓
Make knows which headers proc.o depends on

So this line is for correctness and convenience during repeated builds.

`clean`

clean:
	rm -f *.tex *.dvi *.idx *.aux *.log *.ind *.ilg \
	*/*.o */*.d */*.asm */*.sym \
	$K/kernel fs.img \
	mkfs/mkfs .gdbinit \
        $U/usys.S \
	$(UPROGS)

This defines the make clean target.

When you run:

make clean

Make runs the rm -f ... command and removes generated files.

It returns the tree to something close to a fresh checkout.

`rm -f`

rm -f

means:

remove files if they exist;
do not error if they do not exist

So make clean can be run repeatedly without failing just because some files are already gone.
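The "no error if missing" behavior can be sketched in Python. This models only that one aspect of `rm -f`, and the helper name `rm_f` is hypothetical.

```python
import tempfile
from pathlib import Path


def rm_f(*paths):
    """Remove each path if present; a missing file is not an error."""
    for p in paths:
        Path(p).unlink(missing_ok=True)   # needs Python 3.8+


with tempfile.TemporaryDirectory() as tmp:
    img = Path(tmp) / "fs.img"
    img.write_bytes(b"")
    rm_f(img)    # first clean: removes the file
    rm_f(img)    # second clean: nothing to remove, still no error
    still_there = img.exists()

print(still_there)
```

Running the cleanup twice in a row is harmless, which is the property the clean target relies on.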

Documentation artifacts

*.tex *.dvi *.idx *.aux *.log *.ind *.ilg

These are LaTeX/documentation build artifacts.

They are not central to the kernel itself.

They come from building docs/book-related material.

Object, dependency, assembly, and symbol files

*/*.o */*.d */*.asm */*.sym

This removes generated files in subdirectories.

Pattern	Removes	Meaning
`*/*.o`	object files	Compiled C/assembly outputs.
`*/*.d`	dependency files	Header dependency files from `-MD`.
`*/*.asm`	disassembly files	Generated by `objdump -S`.
`*/*.sym`	symbol files	Generated by `objdump -t`.

Examples removed:

kernel/proc.o
kernel/proc.d
kernel/kernel.asm
kernel/kernel.sym
user/sh.o
user/sh.d
user/sh.asm
user/sh.sym

These can all be regenerated.
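A pattern like `*/*.o` matches exactly one directory level down. The sketch below uses Python's `glob` module on a throwaway directory to show which files such a pattern picks up. The file names `top.o` and `user/deep/x.o` are invented for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files at different depths.
    for rel in ["kernel/proc.o", "kernel/proc.c", "user/sh.o",
                "top.o", "user/deep/x.o"]:
        path = os.path.join(tmp, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    matches = sorted(
        os.path.relpath(p, tmp).replace(os.sep, "/")
        for p in glob.glob(os.path.join(tmp, "*", "*.o"))
    )

print(matches)
```

Neither the top-level `top.o` nor the deeper `user/deep/x.o` matches. That is fine for xv6, whose generated files live directly in `kernel/`, `user/`, and `mkfs/`.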

Kernel and filesystem image

$K/kernel fs.img

Since:

K=kernel

this removes:

kernel/kernel
fs.img

Meaning:

File	Meaning
`kernel/kernel`	Final linked xv6 kernel.
`fs.img`	xv6 filesystem disk image.

After deleting these, the next make qemu must relink the kernel and recreate the disk image.

Host-side mkfs and GDB config

mkfs/mkfs .gdbinit

This removes:

File	Meaning
`mkfs/mkfs`	Host executable that creates `fs.img`.
`.gdbinit`	Generated GDB configuration file.

mkfs/mkfs is rebuilt from mkfs/mkfs.c.

.gdbinit is regenerated from its template the next time you run make qemu-gdb.

Generated syscall assembly

$U/usys.S

Since:

U=user

this removes:

user/usys.S

This file is generated from:

user/usys.pl

So it is not source-of-truth. It can be regenerated.

Flow:

user/usys.pl
  ↓
user/usys.S
  ↓
user/usys.o

make clean deletes the generated assembly so it can be recreated fresh.

User programs

$(UPROGS)

This removes all compiled xv6 user binaries listed in UPROGS.

Examples:

user/_cat
user/_echo
user/_forktest
user/_grep
user/_init
user/_kill
user/_ln
user/_ls
user/_mkdir
user/_rm
user/_sh
...

These are RISC-V executables built from user/*.c.

They are later packed into fs.img.

What remains after `make clean`?

The source files remain:

kernel/*.c
kernel/*.S
kernel/*.h
user/*.c
user/*.h
mkfs/mkfs.c
Makefile

The generated files disappear:

*.o
*.d
*.asm
*.sym
kernel/kernel
fs.img
mkfs/mkfs
user/usys.S
user/_*

So after:

make clean

the next build starts fresh.

Big picture

-include kernel/*.d user/*.d

means:

Use compiler-generated dependency files
so header changes trigger correct rebuilds.

clean

means:

Delete generated files:
object files,
dependency files,
debug listings,
kernel binary,
filesystem image,
mkfs executable,
generated syscall assembly,
compiled user programs.

Together:

.d files make incremental builds smarter.
clean removes all generated state when you want a fresh rebuild.

QEMU flags

This block is the QEMU run/debug section of the Makefile.

It handles:

normal boot:       make qemu
debug boot:        make qemu-gdb
GDB port setup:    choose a unique port
QEMU options:      define fake RISC-V hardware
version check:     require a new enough QEMU

GDB port generation

# try to generate a unique GDB port
GDBPORT = $(shell expr `id -u` % 5000 + 25000)

This creates a semi-unique TCP port for GDB.

Breakdown:

id -u

gets your numeric user ID.

Then:

user_id % 5000 + 25000

creates a port somewhere between:

25000 and 29999

Why?

Because if many users on the same machine run xv6 debugging, they should not all try to use the same GDB port.

Example:

user id = 1001
1001 % 5000 + 25000 = 26001

So GDB would connect to port 26001.
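The same arithmetic in Python, as a quick way to check the range and to see why the port is only semi-unique. The function name `gdb_port` is hypothetical.

```python
def gdb_port(uid: int) -> int:
    """Mirror the Makefile arithmetic: uid % 5000 + 25000."""
    return uid % 5000 + 25000


print(gdb_port(1001))   # the example above
print(gdb_port(0))      # lowest possible port
print(gdb_port(4999))   # highest possible port
print(gdb_port(6001))   # same port as uid 1001
```

Two user IDs that differ by a multiple of 5000 get the same port, so collisions are unlikely on a shared machine but not impossible.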

QEMU GDB stub option

# QEMU's gdb stub command line changed in 0.11
QEMUGDB = $(shell if $(QEMU) -help | grep -q '^-gdb'; \
	then echo "-gdb tcp::$(GDBPORT)"; \
	else echo "-s -p $(GDBPORT)"; fi)

QEMU has a built-in GDB stub.

That means QEMU can pause the virtual CPU and let GDB connect to it.

This block checks which QEMU command-line syntax is supported.

If QEMU supports:

-gdb

then use:

-gdb tcp::<port>

Otherwise use the older style:

-s -p <port>

So this is compatibility logic.

Conceptually:

Ask QEMU: do you support the modern -gdb option?
  yes → use -gdb tcp::<port>
  no  → use old -s -p <port>
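Here is the same decision as a Python sketch. The help excerpts are invented for illustration and are not real QEMU output. The only thing modeled is the `grep -q '^-gdb'` test: does any line start with `-gdb`?

```python
def gdb_stub_args(help_text: str, port: int) -> str:
    """Pick the GDB-stub option string the way the shell snippet does."""
    if any(line.startswith("-gdb") for line in help_text.splitlines()):
        return f"-gdb tcp::{port}"
    return f"-s -p {port}"


# Hypothetical help excerpts, made up for this example.
modern_help = "-S      freeze CPU at startup\n-gdb dev  accept gdb connection on 'dev'\n"
old_help = "-s      wait for gdb connection\n-p port   set gdb port\n"

print(gdb_stub_args(modern_help, 26001))
print(gdb_stub_args(old_help, 26001))
```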

CPU count

ifndef CPUS
CPUS := 3
endif

This means:

If CPUS was not already set, use 3.

So by default xv6 runs with 3 simulated RISC-V CPUs/harts.

You can override it:

make qemu CPUS=1

or:

make qemu CPUS=4

Default:

CPUS = 3

This matters because xv6 is a multiprocessor kernel. Locks, scheduling, interrupts, and per-CPU state are real concerns.

QEMU machine options

QEMUOPTS = -machine virt -bios none -kernel $K/kernel -m 128M -smp $(CPUS) -nographic

This defines the core QEMU command-line options.

Expanded conceptually:

qemu-system-riscv64 \
  -machine virt \
  -bios none \
  -kernel kernel/kernel \
  -m 128M \
  -smp 3 \
  -nographic

Option	Meaning
`-machine virt`	Use QEMU’s generic RISC-V virtual machine.
`-bios none`	Do not run firmware; jump directly to kernel.
`-kernel kernel/kernel`	Load xv6 kernel binary.
`-m 128M`	Give the virtual machine 128 MiB RAM.
`-smp $(CPUS)`	Simulate multiple CPUs/harts. Default is 3.
`-nographic`	No GUI; use terminal for serial console.

So QEMU creates a fake machine like:

64-bit RISC-V virt machine
128 MiB RAM
3 harts by default
serial console in terminal
xv6 kernel loaded directly

Virtio compatibility option

QEMUOPTS += -global virtio-mmio.force-legacy=false

This tells QEMU:

Use non-legacy virtio MMIO behavior.

xv6 talks to the disk through a virtio block device. This option makes QEMU expose the device in the mode xv6 expects.

You do not need to deeply understand this at first. It is basically:

Make QEMU's virtio disk interface match xv6's driver.

Attach `fs.img` as a virtual disk

QEMUOPTS += -drive file=fs.img,if=none,format=raw,id=x0

This defines a raw disk backend.

Meaning:

Use fs.img as a disk image.
Do not automatically attach it to a bus yet.
Give it ID x0.

Breakdown:

Part	Meaning
`file=fs.img`	Host file used as disk contents.
`if=none`	Create backend only; do not auto-create device.
`format=raw`	Treat file as raw disk bytes.
`id=x0`	Name this drive backend `x0`.

Then:

QEMUOPTS += -device virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0

attaches that backend as a virtio block device.

Meaning:

Create a virtio block device using drive x0.
Attach it to QEMU's virtio MMIO bus.

Together:

fs.img host file
  ↓
QEMU drive backend x0
  ↓
virtio block device
  ↓
xv6 virtio_disk driver
  ↓
xv6 filesystem

Inside xv6, this looks like a disk.

On your host, it is just the file:

fs.img
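Putting the pieces of QEMUOPTS together, this Python sketch assembles the full argument list. The function `qemu_args` and its parameters are hypothetical. The option strings are the ones shown in the Makefile lines above.

```python
import shlex


def qemu_args(kernel="kernel/kernel", image="fs.img", cpus=3):
    """Assemble the option list described above (cpus defaults to 3)."""
    return [
        "qemu-system-riscv64",
        "-machine", "virt",
        "-bios", "none",
        "-kernel", kernel,
        "-m", "128M",
        "-smp", str(cpus),
        "-nographic",
        "-global", "virtio-mmio.force-legacy=false",
        # Backend named x0, not attached to any bus by itself.
        "-drive", f"file={image},if=none,format=raw,id=x0",
        # Device that consumes backend x0 on the virtio MMIO bus.
        "-device", "virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0",
    ]


print(shlex.join(qemu_args()))           # default: 3 harts
print(shlex.join(qemu_args(cpus=1)))     # like `make qemu CPUS=1`
```

The `id=x0` on the drive and `drive=x0` on the device are what tie the backend and the device together.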

Normal QEMU target

qemu: check-qemu-version $K/kernel fs.img
	$(QEMU) $(QEMUOPTS)

This defines:

make qemu

Dependencies:

check-qemu-version
kernel/kernel
fs.img

So before QEMU starts, Make ensures:

QEMU version is new enough
kernel is built
filesystem image is built

Then it runs:

$(QEMU) $(QEMUOPTS)

Conceptually:

build kernel
build fs.img
start fake RISC-V machine
load kernel
attach fs.img
boot xv6

Generate `.gdbinit`

.gdbinit: .gdbinit.tmpl-riscv
	sed "s/:1234/:$(GDBPORT)/" < $^ > $@

This generates a local .gdbinit file from the template.

$^ means:

all dependencies

Here that is:

.gdbinit.tmpl-riscv

$@ means:

target being built

Here that is:

.gdbinit

So the command is roughly:

sed "s/:1234/:26001/" < .gdbinit.tmpl-riscv > .gdbinit

It replaces the default GDB port 1234 with your generated GDBPORT.

Why?

Because QEMU’s GDB stub listens on that port, and GDB needs to connect to the same port.
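A Python sketch of the same substitution. The template text is an assumption and the real .gdbinit.tmpl-riscv may contain different lines. Like sed without the `g` flag, it replaces only the first match on each line.

```python
import re


def render_gdbinit(template: str, port: int) -> str:
    """Replace the first ":1234" on each line, like sed 's/:1234/:PORT/'."""
    lines = template.split("\n")
    return "\n".join(
        re.sub(r":1234", f":{port}", line, count=1) for line in lines
    )


# Assumed template content, for illustration only.
template = "set confirm off\ntarget remote 127.0.0.1:1234\n"
print(render_gdbinit(template, 26001))
```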

Debug QEMU target

qemu-gdb: $K/kernel .gdbinit fs.img
	@echo "*** Now run 'gdb' in another window." 1>&2
	$(QEMU) $(QEMUOPTS) -S $(QEMUGDB)

This defines:

make qemu-gdb

Dependencies:

kernel/kernel
.gdbinit
fs.img

Then it prints:

*** Now run 'gdb' in another window.

The @ suppresses echoing the command itself.

1>&2 sends the message to stderr.

Then it starts QEMU with:

-S

and the GDB stub option.

What does `-S` mean?

Start QEMU with the CPU stopped.

So QEMU loads the machine but does not begin executing instructions until GDB tells it to continue.

Debug flow:

Terminal 1:
  make qemu-gdb
 
QEMU starts paused and waits for GDB.
 
Terminal 2:
  gdb
 
GDB reads .gdbinit,
connects to QEMU,
sets breakpoints,
then you continue execution.

This lets you debug from the very first instruction.

Print GDB port

print-gdbport:
	@echo $(GDBPORT)

This target just prints the port.

Example:

make print-gdbport

Output (for the user ID 1001 example above):

26001

Useful if you need to manually connect GDB.

QEMU version detection

QEMU_VERSION := $(shell $(QEMU) --version | head -n 1 | sed -E 's/^QEMU emulator version ([0-9]+\.[0-9]+)\..*/\1/')

This extracts QEMU’s major/minor version.

Example QEMU output:

QEMU emulator version 8.2.1

The command extracts:

8.2

Breakdown:

$(QEMU) --version

prints QEMU version.

head -n 1

keeps first line.

sed -E 's/^QEMU emulator version ([0-9]+\.[0-9]+)\..*/\1/'

extracts the major.minor part.

So:

QEMU emulator version 8.2.1

becomes:

8.2
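The regular expression can be tried out in Python, which accepts the same pattern. The function name `qemu_major_minor` is hypothetical, and the second sample line is made up to show that trailing text is discarded.

```python
import re

PATTERN = r"^QEMU emulator version ([0-9]+\.[0-9]+)\..*"


def qemu_major_minor(first_line: str) -> str:
    """Apply the same substitution as the sed command to one line."""
    return re.sub(PATTERN, r"\1", first_line)


print(qemu_major_minor("QEMU emulator version 8.2.1"))
print(qemu_major_minor("QEMU emulator version 10.0.3 (some build tag)"))
print(qemu_major_minor("unexpected first line"))
```

A line that does not match the pattern passes through unchanged, which is also how sed behaves. An unusual version banner would then leave QEMU_VERSION holding the whole line rather than a number.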

QEMU version check

check-qemu-version:
	@if [ "$(shell echo "$(QEMU_VERSION) >= $(MIN_QEMU_VERSION)" | bc)" -eq 0 ]; then \
		echo "ERROR: Need qemu version >= $(MIN_QEMU_VERSION)"; \
		exit 1; \
	fi

This target checks:

Is QEMU_VERSION >= MIN_QEMU_VERSION?

It uses bc, a command-line calculator, to compare the two versions as decimal numbers.

If the result is 0, meaning false, it prints an error and exits.

Conceptually:

if QEMU is too old:
  print error
  stop build/run
else:
  continue

MIN_QEMU_VERSION is defined elsewhere in the Makefile, commonly as something like:

MIN_QEMU_VERSION = 7.2

The point is:

xv6 expects certain QEMU behavior.
Old QEMU versions may not emulate the needed RISC-V/virtio features correctly.
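The comparison is numeric, not a true version comparison. This Python sketch mimics it with `Decimal`. The function name `bc_style_ok` is hypothetical, the minimum of 7.2 is just the example value from above, and the version 7.10 is invented to show the edge case.

```python
from decimal import Decimal


def bc_style_ok(version: str, minimum: str) -> int:
    """Return 1 or 0, comparing both strings as decimal numbers."""
    return 1 if Decimal(version) >= Decimal(minimum) else 0


print(bc_style_ok("8.2", "7.2"))    # new enough
print(bc_style_ok("6.1", "7.2"))    # too old: the error branch runs
print(bc_style_ok("7.10", "7.2"))   # read as 7.1, so it counts as too old
```

For the major.minor values QEMU actually ships, the numeric comparison gives the expected answer. The last case only shows why this trick is not a general version comparator.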

Full normal run flow

make qemu
  ↓
check QEMU version
  ↓
build kernel/kernel
  ↓
build fs.img
  ↓
start qemu-system-riscv64
  ↓
QEMU creates RISC-V virt machine
  ↓
loads kernel/kernel
  ↓
attaches fs.img as virtio disk
  ↓
xv6 boots

Full debug run flow

make qemu-gdb
  ↓
build kernel/kernel
  ↓
build fs.img
  ↓
generate .gdbinit with unique port
  ↓
start QEMU paused
  ↓
QEMU opens GDB stub port
  ↓
run gdb in another terminal
  ↓
GDB connects to QEMU
  ↓
debug xv6 from early boot

Compact summary

Makefile piece	Purpose
`GDBPORT`	Pick a semi-unique TCP port for GDB.
`QEMUGDB`	Choose correct QEMU GDB-stub syntax.
`CPUS := 3`	Default to 3 simulated RISC-V harts.
`QEMUOPTS`	Define fake RISC-V machine hardware.
`-machine virt`	QEMU generic RISC-V board.
`-bios none`	Jump directly to xv6 kernel.
`-kernel kernel/kernel`	Load xv6 kernel.
`-m 128M`	Give xv6 128 MiB RAM.
`-smp $(CPUS)`	Use multiple CPUs/harts.
`-nographic`	Terminal-only console.
`-drive file=fs.img...`	Use `fs.img` as disk backend.
`-device virtio-blk-device...`	Attach disk as virtio block device.
`qemu`	Build and boot xv6 normally.
`qemu-gdb`	Build and boot xv6 paused for GDB.
`.gdbinit`	Generated GDB connection config.
`check-qemu-version`	Refuse to run with too-old QEMU.

Hardware

Here’s the table of the RISC-V hardware platform QEMU creates from these options:

QEMUOPTS = -machine virt -bios none -kernel $K/kernel -m 128M -smp $(CPUS) -nographic
QEMUOPTS += -global virtio-mmio.force-legacy=false
QEMUOPTS += -drive file=fs.img,if=none,format=raw,id=x0
QEMUOPTS += -device virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0

QEMU option	Hardware feature created	What xv6 sees	Why it matters
`qemu-system-riscv64`	64-bit RISC-V machine	A 64-bit RISC-V CPU platform	xv6-riscv is compiled for `rv64gc`, so it needs a 64-bit RISC-V CPU.
`-machine virt`	Generic QEMU RISC-V virtual board	A fake RISC-V computer with RAM, CPUs, UART, interrupt controller, timer, virtio devices	This is the “motherboard/platform” xv6 runs on.
`-bios none`	No firmware/BIOS layer	QEMU jumps directly to the kernel	xv6 skips firmware/bootloader complexity.
`-kernel kernel/kernel`	Kernel loaded into RAM	xv6 kernel placed at the expected boot address	This is why `kernel.ld` puts `_entry` at `0x80000000`.
`-m 128M`	128 MiB physical RAM	RAM from roughly `0x80000000` to `0x88000000`	xv6’s allocator manages this physical memory.
`-smp $(CPUS)`	Multiple RISC-V harts/cores	Default: 3 CPUs/harts	xv6 exercises locks, per-CPU state, and multiprocessor scheduling.
`-nographic`	Serial console only, no GUI	Console I/O goes through terminal	xv6 shell appears directly in your terminal.
`-global virtio-mmio.force-legacy=false`	Modern virtio MMIO mode	Virtio disk uses non-legacy MMIO behavior	Makes QEMU’s virtio device match xv6’s driver expectations.
`-drive file=fs.img,if=none,format=raw,id=x0`	Raw disk backend	`fs.img` becomes the backing storage for a disk	This is the host file containing xv6’s filesystem.
`-device virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0`	Virtio block device	xv6 sees a virtual disk device	`virtio_disk.c` uses this to read/write filesystem blocks.

Hardware xv6 effectively gets

Hardware component	Present?	xv6 file mostly responsible
64-bit RISC-V CPU	Yes	`riscv.h`, `entry.S`, `start.c`
Multiple harts/cores	Yes, default 3	`proc.c`, `spinlock.c`, `start.c`
Physical RAM	Yes, 128 MiB	`kalloc.c`, `vm.c`, `memlayout.h`
UART serial device	Yes	`uart.c`, `console.c`
External interrupt controller	Yes, PLIC	`plic.c`, `trap.c`
Timer interrupts	Yes	`start.c`, `trap.c`
Virtio MMIO bus	Yes	`virtio.h`, `virtio_disk.c`
Virtio block disk	Yes	`virtio_disk.c`, `bio.c`, `fs.c`
Graphical display	No	Not used
Keyboard device	Not directly	Terminal input comes through UART
Firmware/BIOS	No	QEMU jumps directly to xv6

Simplified memory/device map

Conceptually, QEMU gives xv6 a physical address space like this:

lower physical addresses
  ↓
device MMIO regions
  UART
  virtio disk
  PLIC interrupt controller
  timer-related registers
  reserved/platform regions
 
0x80000000
  ↓
RAM starts here
  xv6 kernel loaded here
  kernel code/data/bss
  free physical pages
  user process memory
  page tables
  kernel stacks
 
0x88000000
  end of RAM with -m 128M
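The end address follows directly from the base address and the `-m 128M` size. A quick check in Python, with the 4096-byte page size taken as an assumption for the page count:

```python
RAM_BASE = 0x80000000           # where RAM starts in the map above
RAM_SIZE = 128 * 1024 * 1024    # -m 128M
PAGE_SIZE = 4096                # assumed page size, for illustration

ram_end = RAM_BASE + RAM_SIZE
page_count = RAM_SIZE // PAGE_SIZE

print(hex(ram_end))
print(page_count)
```

So 128 MiB is 0x8000000 bytes, which puts the end of RAM at 0x88000000 and gives 32768 pages of 4096 bytes, before subtracting whatever the kernel image itself occupies.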

Full mental model

QEMU creates:
 
RISC-V virt machine
  ├── 3 RISC-V harts
  ├── 128 MiB RAM starting at 0x80000000
  ├── UART serial console
  ├── PLIC interrupt controller
  ├── timer interrupt support
  ├── virtio MMIO bus
  └── virtio block device backed by fs.img
 
Then:
  QEMU loads kernel/kernel
  jumps to xv6 _entry
  xv6 initializes hardware
  xv6 reads fs.img as disk
  xv6 runs /init
