The Page Cache and Page Writeback

Caching Fundamentals

Disk caches minimize disk I/O by storing data in physical memory, leveraging the fact that memory access is orders of magnitude faster than disk access. Caching relies on temporal locality: the principle that data accessed once is likely to be accessed again in the near future.

  • Read Operations:
    • Cache Hit: The kernel locates requested data in the page cache and reads it directly from RAM, bypassing the disk.
    • Cache Miss: The kernel schedules block I/O operations to read data from the disk, subsequently populating the cache to serve future reads.
  • Write Caching Strategies:
    • No-write: The cache ignores write operations; data is written directly to disk, invalidating the cached copy.
    • Write-through: Write operations automatically and simultaneously update both the in-memory cache and the on-disk file, keeping the cache coherent.
    • Write-back (Linux Strategy): Processes write directly into the page cache, and the backing store is not immediately updated. Modified pages are marked as dirty (unsynchronized) and added to a dirty list. Dirty pages are periodically written back to disk in bulk to synchronize the on-disk copy.
  • Cache Eviction Strategies:
    • Eviction removes data to free RAM or make room for more relevant cache entries.
    • Clairvoyant Algorithm: An ideal, but impossible, strategy that evicts the pages least likely to be accessed in the future.
    • Least Recently Used (LRU): Evicts the page with the oldest access timestamp. This strategy breaks down when many files are accessed once and never again: those pages land at the top of the LRU list and are evicted last, even though they will never be needed.
    • Two-List Strategy (Linux LRU/2): Maintains an active list (hot pages unavailable for eviction) and an inactive list (pages available for eviction); see the sketch after this list.
      • Both lists function as pseudo-LRU queues, adding items to the tail and removing from the head.
      • Pages move to the active list only if accessed while already residing on the inactive list, mitigating the single-use file failure of a standard LRU.
      • The kernel balances the lists by moving pages from the head of the active list to the inactive list if the active list grows disproportionately large.
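
Below is a minimal user-space sketch of the two-list strategy. The structures and helpers (cache_add(), cache_touch(), cache_evict(), cache_rebalance()) are invented for illustration; they model only the policy, not the kernel's actual implementation.

    #include <stdio.h>

    struct page {
        int id;
        int active;                      /* 1 if on the active list */
        struct page *prev, *next;
    };

    struct list {
        struct page *head, *tail;
        int len;
    };

    /* Pseudo-LRU queue operations: add at the tail, remove anywhere. */
    static void list_add_tail(struct list *l, struct page *p)
    {
        p->next = NULL;
        p->prev = l->tail;
        if (l->tail)
            l->tail->next = p;
        else
            l->head = p;
        l->tail = p;
        l->len++;
    }

    static void list_del(struct list *l, struct page *p)
    {
        if (p->prev) p->prev->next = p->next; else l->head = p->next;
        if (p->next) p->next->prev = p->prev; else l->tail = p->prev;
        l->len--;
    }

    static struct list active_list, inactive_list;

    /* First access: the page enters the inactive list and is fair
     * game for eviction. */
    static void cache_add(struct page *p)
    {
        list_add_tail(&inactive_list, p);
    }

    /* Second access while still inactive: promote to the active list,
     * so touch-once pages never displace the working set. */
    static void cache_touch(struct page *p)
    {
        if (p->active)
            return;
        list_del(&inactive_list, p);
        p->active = 1;
        list_add_tail(&active_list, p);
    }

    /* Eviction always takes the head of the inactive list. */
    static struct page *cache_evict(void)
    {
        struct page *victim = inactive_list.head;
        if (victim)
            list_del(&inactive_list, victim);
        return victim;
    }

    /* Rebalance: if the active list grows too large, demote pages
     * from its head back to the inactive list. */
    static void cache_rebalance(void)
    {
        while (active_list.len > inactive_list.len) {
            struct page *p = active_list.head;
            list_del(&active_list, p);
            p->active = 0;
            list_add_tail(&inactive_list, p);
        }
    }

    int main(void)
    {
        struct page pages[3] = { { .id = 1 }, { .id = 2 }, { .id = 3 } };

        for (int i = 0; i < 3; i++)
            cache_add(&pages[i]);        /* three single-use reads */
        cache_touch(&pages[1]);          /* page 2 is read a second time */

        struct page *victim = cache_evict();
        printf("evicted page %d; page 2 survives on the active list\n",
               victim->id);
        cache_rebalance();
        return 0;
    }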

To implement these cache and eviction strategies, Linux utilizes specific objects to manage physical pages independent of their underlying filesystem.

The Linux Page Cache and address_space

The page cache stores chunks of recently accessed regular files, block device files, and memory-mapped files.

  • The address_space Object:
    • Manages cache entries and page I/O operations independently of the filesystem or physical file.
    • Functions as the physical analogue to the virtual vm_area_struct introduced in the discussion of the process address space.
    • host: Points to the associated owning inode (or NULL for non-inode objects like the swapper).
    • i_mmap: A priority search tree (a mix of heaps and radix trees) linking all shared and private mappings. It allows the kernel to efficiently locate the multiple virtual vm_area_struct mappings associated with a single physical cached file.
  • address_space_operations:
    • The a_ops field points to an operations table dictating how the backing store interacts with the page cache.
    • Page Read Path: The kernel checks for data using find_get_page(). If missing, it allocates a page, adds it to the cache via add_to_page_cache_lru(), and invokes readpage() to retrieve the data from disk (condensed in the sketch after this list).
    • Page Write Path: Modified pages are flagged via SetPageDirty(). The generic write path uses __grab_cache_page(), executes prepare_write(), copies data from user-space via filemap_copy_from_user(), and finishes with commit_write(). All written pages are added to the cache, which serves as a staging ground for disk writes.
  • Radix Tree:
    • Each address_space maintains a unique radix tree (page_tree) to quickly search for cached pages using a file offset.
    • The radix tree replaced the pre-2.6 global hash table, which suffered from high lock contention, slow miss lookups, and excessive memory consumption.
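
The read path above can be condensed into a sketch like the following, modeled on the generic 2.6-era logic; error handling and locking are abbreviated, so treat it as an outline rather than the kernel's exact code.

    #include <linux/err.h>
    #include <linux/fs.h>
    #include <linux/pagemap.h>

    static struct page *read_cached_page_sketch(struct address_space *mapping,
                                                struct file *file, pgoff_t index)
    {
        struct page *page;
        int error;

        /* Cache hit: find_get_page() walks the radix tree; the data is
         * already in RAM and no disk I/O is needed. */
        page = find_get_page(mapping, index);
        if (page)
            return page;

        /* Cache miss: allocate a new page and insert it into the page
         * cache (and the LRU lists). */
        page = page_cache_alloc_cold(mapping);
        if (!page)
            return ERR_PTR(-ENOMEM);
        error = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
        if (error) {
            page_cache_release(page);
            return ERR_PTR(error);
        }

        /* Hand the page to the filesystem's address_space_operations
         * to fill it from the backing store. */
        error = mapping->a_ops->readpage(file, page);
        return error ? ERR_PTR(error) : page;
    }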

While the page cache tracks full pages through the address_space, disk I/O requires mapping these pages to exact physical disk blocks via buffers.

The Buffer Cache

Individual disk blocks tie into the page cache via block I/O buffers.

  • Buffers act as descriptors that map in-memory pages to physical disk blocks.
  • Historically, Linux maintained separate caches for pages and buffers; the two caches duplicated data and required extra effort to keep synchronized.
  • The 2.4 kernel unified these into a single page cache. Buffers now exist solely to describe the mapping of a block onto a cached page, rather than serving as an independent caching layer.
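
For reference, here is an abbreviated view of the buffer descriptor, struct buffer_head from <linux/buffer_head.h> (2.6-era layout; several fields, such as the reference count and the I/O completion callback, are omitted):

    struct buffer_head {
        unsigned long b_state;           /* buffer state flags (dirty, uptodate, ...) */
        struct buffer_head *b_this_page; /* circular list of the page's buffers */
        struct page *b_page;             /* the in-memory page this buffer maps */
        sector_t b_blocknr;              /* block number on the device */
        size_t b_size;                   /* size of the mapping (the block size) */
        char *b_data;                    /* the block's data within b_page */
        struct block_device *b_bdev;     /* the device the block resides on */
        /* ... reference count, I/O completion callback, and so on ... */
    };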

As buffers and pages accumulate modifications in RAM, the kernel relies on flusher threads to commit this dirty data back to disk.

The Flusher Threads and Page Writeback

Dirty pages are written back to disk under three specific conditions:

  1. Free memory shrinks below the dirty_background_ratio threshold.
  2. Dirty data ages beyond the dirty_expire_interval threshold.
  3. A user process explicitly invokes the sync() or fsync() system calls (see the example after this list).
  • Flusher Threads Mechanism:
    • Low-Memory Flush: The kernel invokes wakeup_flusher_threads() to run bdi_writeback_all(), continuing writeback until the requested minimum number of pages has been written and free memory rises above the dirty_background_ratio threshold.
    • Periodic Flush: A timer periodically wakes a thread to run wb_writeback(), committing pages modified longer than dirty_expire_interval milliseconds ago (simulated in the sketch after this list). The timer is then reinitialized to fire every dirty_writeback_interval.
  • Laptop Mode:
    • Optimizes battery life by maximizing hard drive spin-down time.
    • Flusher threads piggyback page writeback onto any other physical disk I/O, flushing all dirty buffers while the disk is already active to prevent future spin-ups.
  • Thread Architecture and Congestion Avoidance:
    • Older kernels utilized a single bdflush thread (prone to single-queue bottlenecks) and kupdated (for periodic writes).
    • The 2.6 kernel introduced pdflush threads, which dynamically scaled in number and employed congestion avoidance to prevent starvation on busy device queues by actively targeting uncongested disks.
    • Kernel 2.6.32+ replaced pdflush with per-spindle flusher threads. Associating one thread per block device allows for synchronous writeback, simplifies congestion logic, and prevents single-disk starvation.
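
To make the periodic flush concrete, here is a toy user-space simulation of the logic: on each timer wakeup, any page dirty for longer than dirty_expire_interval is written back, and the timer is re-armed. The data model and tick-based clock are invented for illustration and stand in for the real tunables described above.

    #include <stdio.h>

    #define NPAGES 4

    struct dpage {
        int dirty;
        unsigned long dirtied_when;      /* tick at which the page was dirtied */
    };

    /* Stand-ins for the two tunables discussed above, in arbitrary ticks. */
    static const unsigned long dirty_expire_interval = 30;
    static const unsigned long dirty_writeback_interval = 5;

    int main(void)
    {
        struct dpage pages[NPAGES] = { {1, 0}, {1, 12}, {1, 28}, {0, 0} };

        /* One periodic wakeup, at tick 35: write back everything that
         * has been dirty for longer than dirty_expire_interval. */
        unsigned long now = 35;
        unsigned long cutoff = now - dirty_expire_interval;   /* tick 5 */

        for (int i = 0; i < NPAGES; i++) {
            if (pages[i].dirty && pages[i].dirtied_when <= cutoff) {
                printf("writing back page %d (dirty for %lu ticks)\n",
                       i, now - pages[i].dirtied_when);
                pages[i].dirty = 0;      /* on-disk copy is now current */
            }
        }

        /* The kernel then re-arms the timer for the next wakeup. */
        printf("next wakeup at tick %lu\n", now + dirty_writeback_interval);
        return 0;
    }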
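
The third writeback trigger is under direct program control. This minimal user-space example shows the contract: write() returns as soon as the data sits dirty in the page cache, while fsync() blocks until that file's dirty data has reached the disk.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char msg[] = "important record\n";
        int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* write() returns as soon as the data is in the page cache;
         * the affected pages are merely marked dirty. */
        if (write(fd, msg, strlen(msg)) < 0) {
            perror("write");
            return 1;
        }

        /* fsync() blocks until the file's dirty pages (and its
         * metadata) have actually been written to disk. */
        if (fsync(fd) < 0) {
            perror("fsync");
            return 1;
        }

        return close(fd);
    }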