The Process Address Space

Address Spaces

The process address space consists of the virtual memory addressable by a user-space process and the specific addresses within that virtual memory the process is permitted to use.

  • Memory Layout: Each process is given a flat 32- or 64-bit address space, meaning addresses exist in a single continuous linear range.
  • Process Isolation: The flat address space is unique to each process. A specific memory address in one process’s address space is completely unrelated to that same address in another process, unless processes intentionally share their address space as threads.
  • Memory Areas: The address space is divided into intervals of legal addresses known as memory areas. Valid addresses exist in exactly one area; memory areas do not overlap. A process can dynamically add and remove these areas through the kernel.
  • Permissions: Each memory area has associated permissions (readable, writable, executable) that the process must respect. Accessing an invalid area, or violating an area's permissions, causes the kernel to terminate the process with a segmentation fault.
  • Memory Area Contents: Memory areas contain mapped data, including:
    • Text section: Executable file’s code.
    • Data section: Initialized global variables.
    • Bss section: Uninitialized global variables, mapped over the zero page.
    • User-space stack: The process’s user-space stack, holding local variables and function call frames.
    • Shared libraries: Text, data, and bss sections for loaded libraries like the C library and dynamic linker.
    • Other mappings: Memory mapped files, shared memory segments, and anonymous memory mappings.

To manage these distinct, non-overlapping memory areas for a process, the kernel employs a specific data structure known as the memory descriptor.

The Memory Descriptor (mm_struct)

The kernel represents a process’s address space with the memory descriptor, defined as struct mm_struct in <linux/mm_types.h>.

  • Usage and Reference Counters:
    • mm_users: Tracks the number of processes currently using this address space. If two threads share the address space, this value is two.
    • mm_count: The primary reference count for the mm_struct. All mm_users collectively equate to one increment of mm_count. The descriptor is only freed when mm_count reaches zero, which occurs after mm_users reaches zero.
  • Data Structures for Memory Areas:
    • mmap: Points to a singly linked list of all memory areas, enabling simple, efficient traversal of every element.
    • mm_rb: Points to a red-black tree of all memory areas, enabling efficient search for a specific element.
    • Threaded Tree: Overlaying a linked list onto a tree so that both structures access the same underlying data is known as a threaded tree.
  • Global List: All mm_struct structures are linked in a doubly linked list via the mmlist field, starting with init_mm (the init process) and protected by mmlist_lock.

Allocating and Destroying the Memory Descriptor

  • Allocation: The descriptor is stored in the mm field of the process descriptor (task_struct). During fork(), copy_mm() copies the parent’s memory descriptor to the child. The structure is allocated from the mm_cachep slab cache via allocate_mm().
  • Shared Address Spaces (Threads): If CLONE_VM is specified during cloning, allocate_mm() is skipped, the mm_users count is incremented, and the new process’s mm field points directly to the parent’s memory descriptor.
  • Destruction: When a process exits, exit_mm() is invoked. This function calls mmput() to decrement mm_users. If mm_users reaches zero, mmdrop() is called to decrement mm_count. If mm_count reaches zero, free_mm() returns the structure to the mm_cachep slab cache.

Kernel Threads and the Memory Descriptor

Kernel threads lack a user context and do not access user-space memory; therefore, their mm field is NULL.

  • To access necessary kernel memory (like page tables) without wasting cycles switching address spaces or maintaining dedicated memory descriptors, kernel threads use the memory descriptor of the previously scheduled process.
  • When scheduled, the kernel sets the active_mm field of the kernel thread’s process descriptor to point to the previous process’s memory descriptor.

The memory descriptor tracks all valid address intervals within the address space through discrete structures representing individual virtual memory areas.

Virtual Memory Areas (VMAs)

The memory area structure, vm_area_struct (defined in <linux/mm_types.h>), describes a single memory area over a contiguous interval in a given address space. The kernel treats each VMA as a unique memory object with specific permissions and operations.

  • Address Interval:
    • vm_start: Initial (lowest) address in the interval, inclusive.
    • vm_end: First byte after the final (highest) address in the interval, exclusive.
  • Association: The vm_mm field points back to the associated mm_struct. VMAs are unique to their mm_struct, meaning multiple processes mapping the same file will each have a unique vm_area_struct.
  • VMA Flags (vm_flags): Specify behavior for the memory area as a whole, managed by the kernel rather than hardware.
    • VM_READ, VM_WRITE, VM_EXEC: Standard read, write, and execute permissions.
    • VM_SHARED: Identifies a shared mapping visible to multiple processes. If unset, it is a private mapping.
    • VM_IO: Specifies a mapping of a device’s I/O space.
    • VM_SEQ_READ / VM_RAND_READ: Hints for read-ahead behavior (sequential versus random access), set via the madvise() system call with MADV_SEQUENTIAL or MADV_RANDOM.
  • VMA Operations (vm_ops): Points to a vm_operations_struct containing methods to manipulate the VMA.
    • open: Invoked when the area is added to an address space.
    • close: Invoked when the area is removed from an address space.
    • fault: Invoked by the page fault handler when accessing a page not present in physical memory.
    • page_mkwrite: Invoked when a read-only page is made writable.
  • Memory Area Representation in Real Life: VMAs correspond to the output of /proc/<pid>/maps and pmap(1), which shows each area’s exact memory range, permissions, offset, and backing file. Shared libraries are loaded into physical memory only once and mapped into multiple processes, yielding substantial space savings. Bss sections and other uninitialized data map to the zero page, so those regions appear initialized to all zeros.

Because a process relies heavily on its allocated virtual memory areas, the kernel provides specific helper functions to locate and manipulate these structures efficiently.

Manipulating Memory Areas

The kernel frequently performs operations on memory areas, necessitating helper functions declared in <linux/mm.h> to search and evaluate VMAs.

  • find_vma(): Searches the given address space for the first VMA where vm_end > addr.
    • The result is cached in the mmap_cache field of the memory descriptor to optimize consecutive operations.
    • If the cache misses, the function traverses the red-black tree (mm_rb).
    • It returns the matching vm_area_struct or NULL if no such area exists.
  • find_vma_prev(): Functions identically to find_vma(), but additionally returns a pointer to the preceding VMA via a double pointer argument.
  • find_vma_intersection(): Returns the first VMA that overlaps a given address interval by wrapping find_vma() and ensuring the returned VMA does not start after the specified interval’s end address.

Beyond querying existing intervals, the kernel requires mechanisms to dynamically allocate new intervals or deallocate existing ones.

Creating and Removing Address Intervals

The kernel expands or reduces a process’s address space by explicitly creating or removing linear address intervals.

  • Creating Intervals (do_mmap):
    • do_mmap() creates a new linear address interval. If the new interval is adjacent to an existing VMA and shares the same permissions, they are merged. Otherwise, a new vm_area_struct is allocated from the vm_area_cachep slab cache.
    • The new memory area is linked into the address space’s linked list and red-black tree via vma_link(), and the total_vm field is updated.
    • Parameters include addr (initial address), len (length), prot (page protection flags like PROT_READ), and flags (map type flags like MAP_SHARED or MAP_ANONYMOUS).
    • This functionality is exported to user-space via the mmap2() system call (which receives the offset in pages to handle larger files).
  • Removing Intervals (do_munmap):
    • do_munmap() removes the address interval starting at address start, of length len, from the given process address space.
    • This is exported to user-space via the munmap() system call, which acts as a wrapper by grabbing the mmap_sem lock before executing do_munmap().

To utilize these dynamically created virtual memory areas, the kernel must map the virtual addresses to physical memory using a hierarchical indexing system.

Page Tables

While applications operate on virtual memory, processors require physical addresses. Page tables convert virtual memory addresses to physical addresses by splitting the virtual address into chunks used as indices into hierarchical tables.

  • Three-Level Architecture: Linux utilizes three levels of page tables to support a sparsely populated address space.
    • Page Global Directory (PGD): The top-level table consisting of an array of pgd_t types. Entries point to the PMD.
    • Page Middle Directory (PMD): The second-level table consisting of an array of pmd_t types. Entries point to the PTE.
    • Page Table Entries (PTE): The final level consisting of pte_t types. Entries point directly to physical pages.
  • Management: Each process has distinct page tables. The pgd field of the memory descriptor points to the process’s PGD. Traversing and manipulating the page tables requires acquiring the page_table_lock.
  • Translation Lookaside Buffer (TLB): To mitigate the performance overhead of resolving virtual-to-physical mappings, processors implement a hardware cache called the TLB. The processor first checks the TLB for the mapping; on a miss, it consults the page tables to retrieve the physical address.