Memory Virtualization

The Abstraction: Address Spaces and Memory API

The operating system virtualizes physical memory to provide each process with an isolated, contiguous address space. Every address generated by a user program is a virtual address; the OS and hardware translate these to physical addresses.

  • Address Space Structure
    • Code: Static program instructions, located at the top of the address space (low addresses).
    • Heap: Dynamically allocated memory, grows downward (toward higher addresses). Managed explicitly via the memory API.
    • Stack: Local variables and function call frames, grows upward (toward lower addresses). Managed implicitly by the compiler.
  • Memory API
    • malloc(size_t size): Requests heap space. Returns a pointer to the allocated region.
    • free(void *ptr): Returns heap memory to the allocator.
    • Common Errors: Buffer overflows, memory leaks (failing to free), dangling pointers (using after free), and double frees.
    • OS Support: User-level allocators rely on OS system calls like brk, sbrk, or mmap to expand the heap region.
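The memory API above can be sketched as a small C routine; the helper name use_heap is illustrative, and the error cases are noted in comments rather than executed.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate, use, and release a heap buffer; returns 0 on success. */
static int use_heap(void) {
    char *buf = malloc(32);          /* request 32 bytes of heap space */
    if (buf == NULL)
        return -1;                   /* malloc can fail; always check */
    strcpy(buf, "hello");            /* stays within the 32 bytes */
    int ok = (strcmp(buf, "hello") == 0) ? 0 : -1;

    /* The common errors listed above would look like:
       - buffer overflow:  writing past byte 31 of buf
       - memory leak:      returning without calling free(buf)
       - dangling pointer: using buf after free(buf)
       - double free:      calling free(buf) twice            */
    free(buf);                       /* exactly one free per malloc */
    buf = NULL;                      /* defensive: prevents accidental reuse */
    return ok;
}
```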

To implement this abstraction, the system requires mechanisms to map virtual addresses to physical locations while maintaining isolation and high performance.


Hardware-Based Address Translation (Base and Bounds)

Dynamic relocation utilizes hardware registers in the Memory Management Unit (MMU) to translate virtual addresses to physical addresses at runtime.

  • Mechanism
    • Base Register: Stores the physical starting address of the process.
    • Bounds (Limit) Register: Stores the size of the address space to ensure memory protection.
    • Translation Equation: physical address = virtual address + base, valid only when 0 ≤ virtual address < bounds.
  • Hardware and OS Requirements
    • Hardware must support dual privilege levels (user/kernel mode) and generate exceptions for out-of-bounds access.
    • The OS allocates physical memory (via a free list), updates base/bounds registers on context switches, and handles out-of-bounds exceptions (usually by killing the process).
  • Internal Fragmentation
    • Because the entire address space is placed contiguously in physical memory, the unused space between the heap and the stack is wasted. This waste inside the allocated unit is called internal fragmentation.
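The per-access check the MMU performs under base and bounds can be sketched as follows; the struct and function names are illustrative, not from the source.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;   /* physical start of the process's address space */
    uint32_t bounds; /* size of the address space */
} mmu_t;

/* Returns true and fills *phys on success; returning false models the
   out-of-bounds exception the OS typically handles by killing the process. */
static bool translate(const mmu_t *mmu, uint32_t vaddr, uint32_t *phys) {
    if (vaddr >= mmu->bounds)
        return false;               /* raise exception */
    *phys = mmu->base + vaddr;      /* physical = virtual + base */
    return true;
}
```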

To eliminate the internal fragmentation caused by a single base/bounds pair, the memory management system must treat the logical regions of the address space independently.


Segmentation

Segmentation assigns an independent base and bounds register pair to each logical segment of the address space (Code, Heap, Stack).

  • Translation and Offset Calculation
    • Virtual addresses are divided into a Segment Identifier (top bits) and a Segment Offset (bottom bits).
    • The hardware uses the segment identifier to select the correct base register and adds the offset to calculate the physical address.
  • Hardware Support for Segments
    • Growth Direction Bit: Tracks whether the segment grows positively (Code/Heap) or negatively (Stack). For negative growth, the hardware computes a negative offset (the offset minus the maximum segment size) and adds it to the base to determine the physical location.
    • Protection Bits: Define permissions (Read/Write/Execute) per segment. Marking a code segment as Read/Execute allows the OS to share the same physical code segment across multiple processes.
  • External Fragmentation
    • Because segments are variable-sized, physical memory becomes riddled with non-contiguous holes over time. This inability to satisfy memory requests despite sufficient total free space is called external fragmentation.
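The segment-select-then-offset translation can be sketched as below, assuming an illustrative 14-bit virtual address (top 2 bits select the segment, bottom 12 bits are the offset); all sizes and names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_SHIFT   12
#define OFFSET_MASK 0xFFF

typedef struct {
    uint32_t base;
    uint32_t size;     /* bounds for this segment */
    bool grows_down;   /* stack-style negative growth */
} segment_t;

static bool seg_translate(const segment_t segs[4], uint32_t vaddr,
                          uint32_t *phys) {
    uint32_t seg = (vaddr >> SEG_SHIFT) & 0x3;  /* segment identifier */
    uint32_t off = vaddr & OFFSET_MASK;         /* segment offset */
    const segment_t *s = &segs[seg];
    if (!s->grows_down) {
        if (off >= s->size) return false;       /* segmentation violation */
        *phys = s->base + off;
    } else {
        /* Negative growth: the displacement below base is the distance
           from the offset to the maximum segment size. */
        uint32_t neg = (OFFSET_MASK + 1) - off;
        if (neg > s->size) return false;
        *phys = s->base - neg;
    }
    return true;
}
```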

Managing external fragmentation requires sophisticated algorithms to track, split, and coalesce variable-sized memory holes.


Free-Space Management

Allocators (both OS-level for segments and user-level for heaps) utilize a free list embedded directly within the free memory itself to track available space.

  • Core Mechanisms
    • Header Blocks: Placed immediately before an allocated chunk. Stores the exact size of the allocation and a magic number for integrity checking. When free() is called, the allocator reads this header to know how many bytes to reclaim.
    • Splitting: Dividing a larger free chunk into a requested allocation size and a new, smaller free chunk.
    • Coalescing: Merging adjacent free chunks into a single, larger contiguous block to combat fragmentation.
  • Allocation Strategies
    • Best Fit: Exhaustively searches for the smallest free chunk that satisfies the request. Reduces wasted space but incurs high search overhead.
    • Worst Fit: Exhaustively searches for the largest chunk and splits it. Leaves large chunks available but performs poorly in practice.
    • First Fit: Returns the first sufficient chunk found. Fast, but pollutes the front of the list with small splinters.
    • Next Fit: Similar to First Fit, but begins each search where the previous one ended, spreading allocations (and splinters) more evenly throughout the list.
  • Advanced Approaches
    • Segregated Lists (Slab Allocator): Maintains separate free lists for frequently requested, fixed-size objects (e.g., OS locks, inodes). Drastically reduces fragmentation and initialization overhead.
    • Binary Buddy Allocation: Divides free space by powers of two. Highly prone to internal fragmentation, but enables extremely fast coalescing using bitwise address comparisons.
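The header-block mechanism above can be sketched in C; the struct layout and magic value are illustrative assumptions, echoing how a free() implementation recovers the size of the chunk being returned.

```c
#include <stddef.h>
#include <stdint.h>

#define MAGIC 0x1234567u

typedef struct {
    size_t size;     /* bytes handed to the caller */
    uint32_t magic;  /* integrity check */
} header_t;

/* Given the pointer returned to the user, step back to the header to
   learn how many bytes the allocator must reclaim on free(). */
static size_t chunk_size(void *ptr) {
    header_t *h = (header_t *)ptr - 1;  /* header sits just before ptr */
    if (h->magic != MAGIC)
        return 0;                       /* corruption detected */
    /* total reclaimed = user bytes + the header itself */
    return h->size + sizeof(header_t);
}
```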

Variable-sized allocation fundamentally guarantees some degree of external fragmentation. To completely eradicate external fragmentation, the system must abandon variable-sized chunks entirely.


Paging: Mechanisms and Structures

Paging divides a process’s virtual address space into fixed-sized virtual pages and divides physical memory into identically sized physical page frames.

  • Virtual Address Breakdown
    • A virtual address is split into a Virtual Page Number (VPN) and an Offset.
    • The OS maintains a Page Table to map the VPN to a Physical Frame Number (PFN).
    • The offset remains untranslated and is concatenated with the PFN to form the physical address.
  • Page Table Entries (PTE)
    • Valid Bit: Indicates if the translation is valid (crucial for supporting sparse address spaces without allocating physical frames for unused regions).
    • Protection Bits: Determine Read/Write/Execute permissions.
    • Present Bit: Indicates if the page is in physical RAM or swapped to disk.
    • Dirty / Accessed Bits: The accessed (reference) bit helps the OS track page usage for replacement decisions; the dirty bit records whether the page has been modified and must be written back to disk on eviction.
  • The Flaws of Paging
    • Speed: Linear page tables require an extra memory access for every instruction fetch and data reference, which can slow the machine by a factor of two or more.
    • Size: A linear page table requires an entry for every VPN. A 32-bit address space with 4KB pages and 4-byte PTEs requires 4MB of page table space per process (2^20 entries).
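The VPN/offset split and linear page-table lookup can be sketched as follows, assuming 4KB pages (12 offset bits) and a simplified PTE layout of my own invention.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

typedef struct {
    bool valid;
    uint32_t pfn;
} pte_t;

static bool page_translate(const pte_t *page_table, uint32_t vaddr,
                           uint32_t *phys) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;   /* virtual page number */
    uint32_t off = vaddr & OFFSET_MASK;   /* untranslated offset */
    if (!page_table[vpn].valid)
        return false;                     /* fault: unmapped region */
    /* concatenate PFN with the offset to form the physical address */
    *phys = (page_table[vpn].pfn << PAGE_SHIFT) | off;
    return true;
}
```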

Paging solves external fragmentation but introduces severe speed and space penalties. The speed penalty is addressed first via hardware caching.


Paging: Faster Translations (TLBs)

A Translation-Lookaside Buffer (TLB) is a fully-associative, on-chip hardware cache located in the MMU that stores popular VPN-to-PFN translations.

  • TLB Operations
    • TLB Hit: The VPN is found in the TLB. The PFN is retrieved immediately, bypassing the page table in memory.
    • TLB Miss: The translation is missing. The system must traverse the page table in RAM, load the PTE into the TLB, and retry the instruction.
  • Locality
    • TLBs rely on spatial locality (nearby addresses, as in an array traversal, fall on the same page and reuse one translation) and temporal locality (re-accessing the same variables soon after first use).
  • Miss Handling
    • Hardware-Managed (CISC/x86): Hardware traverses the page table via a Page Table Base Register (PTBR) and updates the TLB.
    • Software-Managed (RISC/MIPS): Hardware raises an exception, pausing the instruction. The OS trap handler manually traverses the page table, updates the TLB via privileged instructions, and returns from the trap to retry the instruction.
  • Context Switches
    • TLB translations are strictly per-process.
    • To prevent a new process from reading stale translations, the OS can either flush the TLB on every context switch (costly) or utilize an Address Space Identifier (ASID) in the TLB to safely co-host entries from multiple processes.
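A fully-associative lookup with ASID matching can be sketched like this; the entry layout and sizes are illustrative, not a real MMU's format.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 8

typedef struct {
    bool valid;
    uint32_t vpn, pfn;
    uint8_t asid;       /* address space identifier */
} tlb_entry_t;

static bool tlb_lookup(const tlb_entry_t tlb[TLB_ENTRIES],
                       uint32_t vpn, uint8_t asid, uint32_t *pfn) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        /* Hit only if both the VPN and the owning process match,
           which lets entries from multiple processes coexist. */
        if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == asid) {
            *pfn = tlb[i].pfn;
            return true;    /* TLB hit */
        }
    }
    return false;  /* TLB miss: walk the page table, install, retry */
}
```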

With the TLB masking the speed penalty of paging, the memory management subsystem must next solve the severe memory overhead consumed by linear page tables.


Paging: Smaller Tables

To reduce the memory footprint of page tables, the OS utilizes hierarchical data structures rather than linear arrays, applying indirection to prune invalid address space regions.

  • Hybrid Approach (Paging + Segments)
    • Utilizes a separate page table for each logical segment (Code, Heap, Stack).
    • A base register points to the segment’s page table, and a bounds register caps its size. Unused space between the heap and stack requires no PTEs.
    • Drawback: Reintroduces external fragmentation for the page tables themselves, as they are now variable-sized.
  • Multi-Level Page Tables
    • Chops the page table into page-sized units.
    • Introduces a Page Directory containing Page Directory Entries (PDE). A PDE tracks whether an entire page of PTEs is valid.
    • If a region of the address space is unused, the PDE is marked invalid, and the corresponding page of PTEs is never allocated.
    • Translation Process:
      1. Top bits of VPN index the Page Directory.
      2. Middle bits of VPN index the Page Table.
      3. Bottom bits of Virtual Address act as the Page Offset.
    • Time-Space Trade-off: Saves massive amounts of memory but requires two or more memory accesses on a TLB miss.
  • Inverted Page Tables
    • Maintains a single, global page table indexed by physical frame rather than virtual page. A hash table maps the current VPN and Process ID to the corresponding physical frame, radically reducing structural overhead.
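The three-step multi-level translation above can be sketched for a 32-bit address with 4KB pages and a 10/10/12 bit split; the field layout is a deliberate simplification.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool valid; uint32_t pfn; } pte_t;
typedef struct { bool valid; pte_t *ptes; } pde_t;  /* one page of PTEs */

static bool ml_translate(const pde_t dir[1024], uint32_t vaddr,
                         uint32_t *phys) {
    uint32_t pd_idx = vaddr >> 22;            /* top 10 bits: directory */
    uint32_t pt_idx = (vaddr >> 12) & 0x3FF;  /* middle 10 bits: table */
    uint32_t off    = vaddr & 0xFFF;          /* bottom 12 bits: offset */
    if (!dir[pd_idx].valid)
        return false;   /* this page of PTEs was never allocated */
    const pte_t *pte = &dir[pd_idx].ptes[pt_idx];
    if (!pte->valid)
        return false;
    *phys = (pte->pfn << 12) | off;
    return true;
}
```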

Even with compact page tables, physical RAM is ultimately finite. To support large, concurrent workloads, the OS must leverage secondary storage.


Beyond Physical Memory: Swapping Mechanisms

The OS utilizes slower secondary storage—Swap Space—to create the illusion of a virtual address space larger than available physical memory.

  • The Page Fault
    • When the hardware attempts a translation and the Present Bit in the PTE is 0, it raises a Page Fault exception.
    • The OS Page Fault Handler takes control. The PTE often stores the disk address of the swapped page in place of the PFN.
  • Control Flow for Missing Pages
    1. OS identifies a free physical frame.
    2. OS issues a blocking I/O request to disk to fetch the page.
    3. Context switches to another process while I/O is in flight.
    4. Upon I/O completion, OS updates the PTE (Present = 1, sets new PFN).
    5. Instruction is retried; the access misses in the TLB, the TLB is updated from the now-present PTE, and the retry then hits.
  • Swap Daemons and Watermarks
    • The OS rarely waits until memory is entirely full to free pages.
    • A background thread (Swap Daemon) monitors memory limits. When free pages drop below a Low Watermark, the daemon evicts pages until a High Watermark is reached, clustering disk writes for I/O efficiency.
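The fault-handling steps above can be sketched with memory and swap modeled as byte arrays; all names are hypothetical, step 3 (switching to another process during the I/O) is outside the sketch, and step 5 happens when the faulting instruction is retried.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NFRAMES   4

typedef struct {
    bool present;
    uint32_t pfn;        /* valid only when present */
    uint32_t disk_addr;  /* where the page lives in swap otherwise */
} pte_t;

static uint8_t ram[NFRAMES][PAGE_SIZE];
static uint8_t swap_space[16][PAGE_SIZE];

static void page_fault_handler(pte_t *pte, uint32_t free_frame) {
    /* 1. a free physical frame has been identified (free_frame) */
    /* 2. "blocking I/O": copy the page in from swap space */
    memcpy(ram[free_frame], swap_space[pte->disk_addr], PAGE_SIZE);
    /* 4. update the PTE: mark present and record the new PFN */
    pte->pfn = free_frame;
    pte->present = true;
}
```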

When physical memory is full, the OS must decide exactly which pages to evict to disk to make room for incoming pages.


Beyond Physical Memory: Replacement Policies

Memory functions as a cache for swap space. The OS’s replacement policy attempts to minimize the miss rate and, consequently, the Average Memory Access Time (AMAT): AMAT = T_M + (P_Miss × T_D), where T_M is the cost of a memory access, T_D is the cost of a disk access, and P_Miss is the probability of missing in memory. Because T_D is enormous relative to T_M, even a tiny miss rate dominates AMAT.

  • Baseline Policies
    • Optimal (MIN): Evicts the page accessed furthest in the future. Impossible to implement, but serves as a benchmark.
    • FIFO (First-In, First-Out): Evicts the oldest page. Ignores locality entirely and suffers from Belady’s Anomaly (increasing cache size can decrease hit rate).
    • Random: Evicts a random page. Performs adequately and avoids worst-case looping corner cases.
  • Exploiting History: LRU
    • LRU (Least Recently Used): Evicts the page that has not been accessed for the longest time, leveraging temporal locality.
    • Implementation Flaw: Perfect LRU requires updating a timestamp/list on every single memory access, which is prohibitively expensive.
  • Approximating LRU (The Clock Algorithm)
    • Requires hardware support via a Use (Reference) Bit. Set to 1 by hardware upon access.
    • Pages are treated as a circular list. A “clock hand” scans the list.
    • If Use Bit == 1: Set to 0 and advance hand.
    • If Use Bit == 0: Evict this page.
  • Dirty Pages
    • Hardware tracks modifications via a Dirty (Modified) Bit.
    • Evicting clean pages is effectively free (data is already backed on disk or read-only binary). Evicting dirty pages requires a costly disk write. Clock algorithms are typically modified to target clean, unreferenced pages first.
  • Thrashing
    • If the aggregate working set of all processes exceeds physical RAM, the system spends all its time paging. Modern systems counter this via admission control or Out-Of-Memory (OOM) killers.
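The clock-hand scan described above can be sketched over a fixed set of frames; the frame structure is illustrative, and the hardware's role (setting the use bit on access) is assumed to happen elsewhere.

```c
#include <stdbool.h>

#define NFRAMES 4

typedef struct { bool use; int page; } frame_t;

/* Returns the index of the frame to evict, clearing use bits (second
   chances) and advancing the hand as it scans. */
static int clock_evict(frame_t frames[NFRAMES], int *hand) {
    for (;;) {
        frame_t *f = &frames[*hand];
        if (f->use) {
            f->use = false;                 /* use bit == 1: clear it */
            *hand = (*hand + 1) % NFRAMES;  /* and advance the hand */
        } else {
            int victim = *hand;             /* use bit == 0: evict */
            *hand = (*hand + 1) % NFRAMES;
            return victim;
        }
    }
}
```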

All of these isolated mechanisms and policies must be carefully integrated to build a cohesive, performant operating system architecture.


Case Study: The VAX/VMS Virtual Memory System

The VAX/VMS architecture exemplifies how software intelligence can mask and mitigate hardware deficiencies (such as an excessively small 512-byte page size).

  • Address Space Layout
    • Split into Process Space (P0 for Heap/Code, P1 for Stack) and System Space (S for the OS).
    • Kernel is mapped directly into every user’s virtual address space (protected by privilege bits), avoiding expensive data copying during system calls.
  • Managing Page Table Size
    • Uses the Hybrid (Paging + Segments) approach to avoid PTE waste between the heap and stack.
    • Places user page tables into the Kernel Virtual Memory (System Space), allowing the page tables themselves to be swapped to disk under memory pressure.
  • Advanced Replacement and Allocation
    • Segmented FIFO: Uses a base FIFO policy per process (Resident Set Size). Replaced pages move to a global Second-Chance List (Clean or Dirty). If the process faults on that page before it is overwritten, it is rapidly salvaged from RAM instead of disk.
    • Demand Zeroing: Lazy allocation. OS maps new heap pages as inaccessible. When the process traps by attempting to use them, the OS allocates and zeroes a physical frame just-in-time.
    • Copy-On-Write (COW): When fork() is called, the OS maps pages into the child’s address space as read-only rather than copying the data. If either process attempts a write, the OS traps, allocates a new physical frame, copies the data, and updates permissions.