Operating System Organization

Core Requirements and Resource Abstraction

Operating systems must satisfy three core requirements: multiplexing resources, isolating activities, and enabling controlled interaction between processes.
Cooperative time-sharing and direct hardware access by applications are insufficient for strong isolation, as they require applications to be bug-free and mutually trusting.
Hardware resources are abstracted into kernel-managed services to enforce safety and convenience.
Storage is exposed through the file system rather than raw disk access, physical memory is exposed as a memory image that a process builds with exec, and CPUs are shared by transparently switching between processes, saving and restoring register state as needed.
File descriptors abstract away the details of diverse I/O sources (e.g., pipes, files) and simplify interaction between processes. For example, if one application in a shell pipeline fails, the kernel generates an end-of-file signal for the next process in the pipeline.
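
The end-of-file behavior can be observed from user space. The following is a minimal sketch, assuming a POSIX system; the function name read_until_eof is hypothetical. It writes a short message into a pipe, closes the write end (as if the writing process had exited), and shows that the reader then sees read() return 0.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg into a fresh pipe, closes the write end, then drains the
 * read end. Returns the value of the final read(): 0 once the buffered
 * bytes are consumed and no writer remains (end-of-file), or -1 on error. */
ssize_t read_until_eof(const char *msg)
{
    int fds[2];                 /* fds[0]: read end, fds[1]: write end */
    char buf[64];
    size_t len = strlen(msg);
    ssize_t n;

    /* Keep the sketch small enough that write() cannot block. */
    assert(len > 0 && len <= sizeof buf);

    if (pipe(fds) < 0)
        return -1;
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);              /* the last writer goes away */

    /* Drain until read() stops returning data. */
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        ;
    close(fds[0]);
    return n;                   /* 0 means end-of-file */
}
```

Calling read_until_eof("hello") should return 0: the reader needs no special protocol to learn that its upstream neighbor is gone, because the descriptor itself reports it.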

To enforce these abstractions even when applications misbehave, the operating system relies on privilege boundaries enforced by the CPU hardware.

Hardware Privilege Modes and System Calls

CPUs provide hardware execution modes to establish a hard boundary between application code and the operating system.
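Machine Mode: The CPU starts in this mode at boot. Instructions execute with full hardware privilege, and the mode is mostly intended for low-level configuration of the computer; xv6 executes only a few lines in machine mode before changing to supervisor mode.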
Supervisor Mode: Allows execution of privileged instructions necessary for OS operations, such as enabling interrupts or writing to page table registers. Software running in this mode is the kernel, executing in kernel space.
User Mode: Restricts execution to unprivileged instructions. Applications execute in this mode within user space.
If a user-mode application attempts a privileged instruction, the CPU does not execute it. Instead, the CPU switches to supervisor mode so that kernel code can terminate the application.
Applications invoke kernel services via system calls using specialized instructions (e.g., the RISC-V ecall instruction).
System calls switch the CPU to supervisor mode and enter the kernel at an entry point specified by the kernel. Because the application cannot choose where kernel execution begins, a malicious application cannot jump past argument validation or access-control checks.
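
The value of a single kernel-defined entry point can be illustrated with a toy dispatcher. This is a sketch only, loosely modeled on the table-driven dispatch used by teaching kernels; the system call numbers, handler names, and placeholder return values are all hypothetical. Every request goes through syscall_dispatch, so no request can skip the bounds check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical system call numbers, for this sketch only. */
enum { SYS_GETPID = 1, SYS_UPTIME = 2 };

typedef int64_t (*syscall_fn)(void);

static int64_t sys_getpid(void) { return 42; }    /* placeholder result */
static int64_t sys_uptime(void) { return 1000; }  /* placeholder result */

/* Table indexed by system call number; slot 0 is left empty (NULL). */
static const syscall_fn syscalls[] = {
    [SYS_GETPID] = sys_getpid,
    [SYS_UPTIME] = sys_uptime,
};

/* The single entry point: every request passes the same validation. */
int64_t syscall_dispatch(uint64_t num)
{
    size_t count = sizeof syscalls / sizeof syscalls[0];

    if (num >= count || syscalls[num] == NULL)
        return -1;              /* unknown number: reject, never jump */
    return syscalls[num]();
}
```

If user code could instead jump to an arbitrary kernel address, it could land after the check in syscall_dispatch; fixing the entry point in hardware is what makes the check unavoidable.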

Where operating system services sit relative to this supervisor-mode boundary determines the overall organization of the kernel.

Kernel Architecture Models

Monolithic Kernel: The entirety of the operating system resides within the kernel and executes in supervisor mode.
- Subsystems (e.g., file systems, virtual memory) are tightly integrated, allowing them to share data structures like buffer caches efficiently.
- Internal interfaces are complex; a single programming error in supervisor mode typically causes a fatal failure of the entire system.
- xv6 and Linux utilize a monolithic kernel structure.
Microkernel: The kernel is minimized to only low-level functions (e.g., hardware access, message passing), while the bulk of the OS runs as user-level processes called servers.
- Applications request services (like file system operations) by passing messages to these servers via the kernel’s inter-process communication (IPC) mechanism.
- This limits the amount of code executing with hardware privileges, reducing the risk of catastrophic system crashes.
- Minix, L4, and QNX utilize a microkernel structure.

Whether the kernel is monolithic or a microkernel, its primary mechanism for isolating user applications is the process.

The Process Abstraction

A process is the fundamental unit of isolation, shielding an application’s memory, CPU state, and file descriptors from interference by other processes.
A process bundles two foundational architectural illusions:
- Private Address Space: Simulates private physical memory using hardware page tables.
  - RISC-V page tables translate virtual addresses utilized by instructions into physical addresses on the RAM chip.
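  - The layout starts at virtual address zero: instructions come first, followed by global variables, then the stack, and finally the heap, which the process can expand as needed.
  - The address space is bounded by hardware translation limits: RISC-V pointers are 64 bits wide, the hardware uses only the low 39 bits when looking up virtual addresses in page tables, and xv6 uses only 38 of those 39 bits. The maximum virtual address is therefore $2^{38} - 1$ = 0x3fffffffff (MAXVA).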
  - The top pages of the address space are reserved for a trampoline page (managing user/kernel transitions) and a trapframe page (saving user state).
- Private CPU (Thread): Simulates dedicated processor execution.
  - Each process contains a thread of execution that tracks local variables and return addresses on stacks.
  - A process actively alternates between two stacks: a user stack for user-space computation, and a kernel stack used exclusively during system calls and interrupts.
  - The kernel stack is protected from user-space access to ensure the kernel can execute safely even if the user stack is compromised.
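
The address-space arithmetic above can be checked with a few lines of C. This is a sketch under stated assumptions: 4096-byte pages and three levels of 9-bit page-table indices (the RISC-V Sv39 scheme), with the trampoline in the topmost page and the trapframe in the page just below it. The names VA_LIMIT, highest_va, trampoline_va, trapframe_va, and pt_index are hypothetical and chosen for this example.

```c
#include <stdint.h>

#define PGSIZE 4096ULL          /* bytes per page (12 offset bits) */

/* One past the highest usable virtual address: 2^38, i.e. one bit
 * fewer than the 39 bits the hardware translates. */
#define VA_LIMIT (1ULL << (9 + 9 + 9 + 12 - 1))

/* Highest usable virtual address, 2^38 - 1. */
uint64_t highest_va(void)    { return VA_LIMIT - 1; }

/* Start of the topmost page, assumed to hold the trampoline. */
uint64_t trampoline_va(void) { return VA_LIMIT - PGSIZE; }

/* Start of the page just below, assumed to hold the trapframe. */
uint64_t trapframe_va(void)  { return trampoline_va() - PGSIZE; }

/* 9-bit page-table index for a level (2 = root table, 0 = leaf table). */
uint64_t pt_index(uint64_t va, int level)
{
    return (va >> (12 + 9 * level)) & 0x1FF;
}
```

Under these assumptions highest_va() is 0x3fffffffff, trampoline_va() is 0x3ffffff000, and trapframe_va() is 0x3fffffe000. The root-level index of the trampoline page is 255 rather than 511, which reflects the unused 39th bit.
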
Kernel state for each process is centralized in a proc structure, containing references to the process’s page table (p->pagetable), kernel stack (p->kstack), and run state (p->state).
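During a system call, the ecall instruction raises the hardware privilege level and changes the program counter to a kernel-defined entry point. The code at that entry point switches to the process’s kernel stack and executes the kernel instructions that implement the system call. When the call completes, the kernel switches back to the user stack and executes the sret instruction, which lowers the privilege level and resumes the user thread just after the system call instruction.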
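
The per-process kernel state can be pictured as a small C structure. This is a simplified sketch, not the real definition: only the three fields named above are included, the type names are assumptions, and the helper proc_is_schedulable is hypothetical. The set of states follows the usual teaching-kernel pattern of unused, allocated, sleeping, runnable, running, and exited-but-not-yet-reaped.

```c
#include <stdint.h>

/* Possible run states of a process in this sketch. */
enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

/* A page table is treated here as a pointer to its root page. */
typedef uint64_t *pagetable_t;

/* Simplified stand-in for the kernel's per-process record. */
struct proc_sketch {
    enum procstate state;       /* run state (p->state) */
    pagetable_t pagetable;      /* user page table (p->pagetable) */
    uint64_t kstack;            /* address of the kernel stack (p->kstack) */
};

/* A scheduler may only pick a process that is ready to run. */
int proc_is_schedulable(const struct proc_sketch *p)
{
    return p->state == RUNNABLE;
}
```

Keeping the page table and the kernel stack in the same record is what lets the kernel find both as soon as it knows which process trapped.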

Setting up these memory layouts, structures, and privilege modes is the job of the boot sequence that runs when the machine powers on.

System Initialization and the First Process

Boot Sequence:
- A boot loader loads the kernel into physical RAM at 0x80000000, placing it above the address range (0x0 to 0x80000000) reserved for memory-mapped I/O devices.
- The CPU begins in machine mode with virtual address paging disabled.
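- Assembly instructions at _entry point the stack pointer at the top of an initial stack (stack0), because the stack grows downward on RISC-V, so that C code can run, and then call start.
- The start function performs configuration that is only permitted in machine mode, sets up timer interrupts, and delegates interrupts and exceptions to supervisor mode. It then executes mret to switch to supervisor mode and jump to main, having first set the previous privilege mode in mstatus to supervisor and the return address in mepc to main.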
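
The mret step works because mret returns to whatever privilege mode is recorded in the previous-privilege (MPP) field of the mstatus register. The following sketch shows only the bit manipulation, as a pure function on a 64-bit value rather than a real register write. The field position (bits 12:11) and the encoding 01 for supervisor follow the RISC-V privileged specification; the function name is hypothetical.

```c
#include <stdint.h>

#define MSTATUS_MPP_MASK (3ULL << 11)  /* previous-privilege field, bits 12:11 */
#define MSTATUS_MPP_S    (1ULL << 11)  /* encoding 01: supervisor mode */

/* Returns mstatus with the previous-privilege field set to supervisor,
 * leaving every other bit untouched. */
uint64_t with_previous_mode_supervisor(uint64_t mstatus)
{
    mstatus &= ~MSTATUS_MPP_MASK;   /* clear both bits of the field */
    mstatus |= MSTATUS_MPP_S;       /* then record "supervisor" */
    return mstatus;
}
```

On real hardware the result would be written back to mstatus with a CSR instruction before mret executes; that part is omitted here because it cannot run in user space.
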
Process Creation:
- main initializes OS devices and subsystems, then explicitly calls userinit to construct the very first process.
- This initial process runs a minimal assembly program (initcode.S) that re-enters the kernel by invoking the exec system call.
- The kernel handles the system call by replacing the memory and registers of the process with the /init program.
- The /init process creates a console device file if needed, opens it as file descriptors 0, 1, and 2, and starts a shell on the console, yielding a fully operational system.

Once the system is running, the kernel must continuously protect itself and other processes from buggy or malicious user code.

Security Model

The kernel operates under the strict assumption that user-level code is malicious and will actively attempt to subvert isolation.
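Anticipated user-space attacks include dereferencing pointers outside the process’s allowed address space, attempting to execute privileged RISC-V instructions not intended for user code, reading or writing hardware control registers, directly accessing device hardware, and passing carefully chosen values to system calls in an attempt to make the kernel crash or misbehave.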
The kernel’s security objective is to absolutely restrict a process to reading, writing, and executing solely within its allocated user memory, using only general-purpose registers and approved system calls.
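Kernel code is assumed to be written by well-meaning, careful programmers, and the hardware is assumed to behave as documented. Because real kernels nevertheless contain bugs, the operating system employs safeguards such as assertions, type checking, and stack guard pages to reduce the chance that a bug becomes an exploitable vulnerability.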
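
Validating a pointer passed to a system call is one concrete form of this defensive stance. The sketch below assumes, for simplicity, that a process’s user memory is the contiguous range from address 0 up to (but not including) a given size; the function name user_range_ok is hypothetical. The second comparison is written as a subtraction so that a huge address or length cannot wrap around and slip past the check.

```c
#include <stdint.h>

/* Returns 1 if [addr, addr + len) lies entirely inside a process whose
 * user memory spans [0, size), and 0 otherwise. */
int user_range_ok(uint64_t addr, uint64_t len, uint64_t size)
{
    if (addr >= size)
        return 0;               /* starts outside user memory */
    if (len > size - addr)
        return 0;               /* would run past the end; cannot overflow */
    return 1;
}
```

For a 4096-byte process, an 8-byte read at address 4088 is accepted, while the same read at address 4092 is rejected because it would cross the end of user memory.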
