The Page Cache and Page Writeback
Caching Fundamentals
Disk caches minimize disk I/O by storing data in physical memory, leveraging the fact that memory access is orders of magnitude faster than disk access. Caching relies on temporal locality: the principle that data accessed once will likely be accessed again in the near future.
- Read Operations:
- Cache Hit: The kernel locates requested data in the page cache and reads it directly from RAM, bypassing the disk.
- Cache Miss: The kernel schedules block I/O operations to read the data from disk, then populates the cache to serve future reads.
- Write Caching Strategies:
- No-write: The cache ignores write operations; data is written directly to disk, invalidating the cached copy.
- Write-through: Write operations automatically and simultaneously update both the in-memory cache and the on-disk file, keeping the cache coherent.
- Write-back (Linux Strategy): Processes write directly into the page cache, and the backing store is not immediately updated. Modified pages are marked as dirty (unsynchronized) and added to a dirty list. Dirty pages are periodically written back to disk in bulk to synchronize the on-disk copy.
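The write-back strategy can be illustrated with a small simulation (the class and its fields are illustrative, not kernel code): a write updates only the cached copy and marks the page dirty; a later writeback pass synchronizes the backing store in bulk.

```python
# Conceptual simulation of write-back caching. A write touches only the
# in-memory cache and marks the page dirty; writeback() later commits all
# dirty pages to the "disk" in one pass. Names are illustrative.

class WritebackCache:
    def __init__(self):
        self.disk = {}      # backing store: page index -> data
        self.cache = {}     # page cache: page index -> data
        self.dirty = set()  # indices of modified (unsynchronized) pages

    def write(self, index, data):
        # Write-back: update only the cached copy and mark it dirty.
        self.cache[index] = data
        self.dirty.add(index)

    def writeback(self):
        # Deferred flush: synchronize every dirty page with the disk.
        for index in sorted(self.dirty):
            self.disk[index] = self.cache[index]
        self.dirty.clear()
```

Until `writeback()` runs, the on-disk copy is stale; this is exactly the window the flusher threads described later exist to bound.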
- Cache Eviction Strategies:
- Eviction removes data to free RAM or make room for more relevant cache entries.
- Clairvoyant Algorithm: An ideal, but unimplementable, strategy that evicts the pages least likely to be accessed in the future.
- Least Recently Used (LRU): Evicts the page with the oldest access timestamp. LRU performs poorly when many files are accessed once and never again: those pages enter at the top of the LRU list and displace pages that will actually be reused.
- Two-List Strategy (Linux LRU/2): Maintains an active list (hot pages unavailable for eviction) and an inactive list (pages available for eviction).
- Both lists function as pseudo-LRU queues, adding items to the tail and removing from the head.
- Pages move to the active list only if accessed while already residing on the inactive list, mitigating the single-use-file failure of standard LRU.
- The kernel balances the lists by moving pages from the head of the active list to the inactive list if the active list grows disproportionately large.
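The two-list behavior can be sketched as a small simulation (a hypothetical class, not the kernel's implementation): first-time accesses land on the inactive list, a second access promotes the page, and eviction only ever takes the head of the inactive list.

```python
from collections import deque

class TwoListLRU:
    """Illustrative LRU/2: pages enter the inactive list; a second access
    promotes them to the active list. Only inactive pages are evictable."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = deque()    # hot pages, protected from eviction
        self.inactive = deque()  # eviction candidates

    def access(self, page):
        if page in self.active:
            return
        if page in self.inactive:
            # Second access: promote from the inactive to the active list.
            self.inactive.remove(page)
            self.active.append(page)
        else:
            # First access: enter at the tail of the inactive list.
            if len(self.active) + len(self.inactive) >= self.capacity:
                self.evict()
            self.inactive.append(page)
        self.rebalance()

    def evict(self):
        # Evict from the head (oldest entry) of the inactive list only.
        if self.inactive:
            self.inactive.popleft()

    def rebalance(self):
        # If the active list grows disproportionately large, demote pages
        # from its head back onto the inactive list.
        while len(self.active) > len(self.inactive):
            self.inactive.append(self.active.popleft())
```

Note how a stream of single-use pages flows through the inactive list and is evicted without ever displacing the active (twice-accessed) pages.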
To implement these cache and eviction strategies, Linux utilizes specific objects to manage physical pages independent of their underlying filesystem.
The Linux Page Cache and address_space
The page cache stores chunks of recently accessed regular files, block device files, and memory-mapped files.
- The `address_space` Object:
  - Manages cache entries and page I/O operations independently of the filesystem or physical file.
  - Functions as the physical analogue to the virtual `vm_area_struct` introduced during process address space management.
  - `host`: Points to the associated owning `inode` (or NULL for non-inode objects such as the swapper).
  - `i_mmap`: A priority search tree (a mix of heaps and radix trees) linking all shared and private mappings. It allows the kernel to efficiently locate the multiple virtual `vm_area_struct` mappings associated with a single physical cached file.
- `address_space_operations`:
  - The `a_ops` field points to an operations table dictating how the backing store interacts with the page cache.
  - Page Read Path: The kernel checks for the data using `find_get_page()`. If it is missing, the kernel allocates a page, adds it to the cache via `add_to_page_cache_lru()`, and invokes `readpage()` to retrieve the data from disk.
  - Page Write Path: Modified pages are flagged via `SetPageDirty()`. The generic write path uses `__grab_cache_page()`, executes `prepare_write()`, copies data from user space via `filemap_copy_from_user()`, and finishes with `commit_write()`. All written pages are added to the cache, which serves as a staging ground for disk writes.
- Radix Tree:
  - Each `address_space` maintains its own radix tree (`page_tree`) to quickly search for cached pages by file offset.
  - The radix tree replaced the pre-2.6 global hash table, which suffered from high lock contention, slow miss lookups, and excessive memory consumption.
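The offset-keyed lookup can be sketched with a minimal fixed-height radix tree (a toy in the spirit of `page_tree`, not the kernel's implementation): each level consumes a few bits of the offset, so a lookup walks a constant number of nodes instead of scanning a list or contending on a global hash.

```python
# Toy radix tree keyed by page offset. Each level consumes RADIX_BITS bits
# of the offset; intermediate nodes are plain dicts allocated lazily.
# Parameters (6 bits, height 3) are illustrative, not the kernel's.

RADIX_BITS = 6
RADIX_SIZE = 1 << RADIX_BITS   # 64 slots per node
RADIX_MASK = RADIX_SIZE - 1
HEIGHT = 3                     # supports offsets below 2**18

def radix_insert(root, offset, page):
    node = root
    for level in reversed(range(1, HEIGHT)):
        slot = (offset >> (level * RADIX_BITS)) & RADIX_MASK
        node = node.setdefault(slot, {})  # allocate intermediate node lazily
    node[offset & RADIX_MASK] = page

def radix_lookup(root, offset):
    node = root
    for level in reversed(range(1, HEIGHT)):
        slot = (offset >> (level * RADIX_BITS)) & RADIX_MASK
        node = node.get(slot)
        if node is None:
            return None                   # cache miss: no node on this path
    return node.get(offset & RADIX_MASK)
```

A miss is detected as soon as a path node is absent, which is part of why the radix tree's miss lookups are cheap compared with the old global hash.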
While the page cache tracks full pages through the address_space, disk I/O requires mapping these pages to exact physical disk blocks via buffers.
The Buffer Cache
Individual disk blocks tie into the page cache via block I/O buffers.
- Buffers act as descriptors that map in-memory pages to physical disk blocks.
- Historically, Linux maintained separate caches for pages and buffers, which duplicated data and wasted synchronization effort.
- The 2.4 kernel unified these into a single page cache. Buffers now exist solely to describe the mapping of a block onto a cached page, rather than serving as an independent caching layer.
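The descriptor role can be sketched with a small data structure (field names are illustrative, not the kernel's `struct buffer_head`): with 4 KB pages and 1 KB blocks, each cached page holds four blocks, and each buffer records which on-disk block backs which piece of the page.

```python
# Sketch of the buffer-as-descriptor idea: a buffer maps one block within
# a cached page to its physical block number on disk. Field names are
# illustrative, not the kernel's struct buffer_head.
from dataclasses import dataclass

PAGE_SIZE = 4096
BLOCK_SIZE = 1024

@dataclass
class Buffer:
    page_index: int    # which cached page holds the data
    page_offset: int   # byte offset of the block within that page
    block_nr: int      # physical block number on disk

def buffers_for_page(page_index, first_block_nr):
    """Describe every block of one cached page (assumes contiguous blocks)."""
    return [
        Buffer(page_index, i * BLOCK_SIZE, first_block_nr + i)
        for i in range(PAGE_SIZE // BLOCK_SIZE)
    ]
```

The page itself lives in the unified page cache; the buffers carry only the mapping metadata needed to issue block I/O for pieces of it.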
As buffers and pages accumulate modifications in RAM, the kernel relies on flusher threads to commit this dirty data back to disk.
The Flusher Threads and Page Writeback
Dirty pages are written back to disk under three specific conditions:
- Free memory shrinks below the `dirty_background_ratio` threshold.
- Dirty data ages beyond the `dirty_expire_interval` threshold.
- A user process explicitly invokes the `sync()` or `fsync()` system call.
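The third trigger can be exercised from user space with real POSIX calls: data written to a file descriptor sits in the page cache as dirty pages until `fsync()` forces the kernel to write them back.

```python
# Demonstrating the explicit-sync trigger with real POSIX calls via the
# os module: write() dirties page-cache pages; fsync() blocks until the
# kernel has committed them to the backing store.
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    n = os.write(fd, b"dirty data in the page cache")
    os.fsync(fd)                    # block until dirty pages reach the disk
    size = os.fstat(fd).st_size     # the on-disk file now holds the data
finally:
    os.close(fd)
    os.unlink(path)
```

`sync()` (exposed as `os.sync()` in Python) is the heavier variant, flushing dirty data for all files rather than one descriptor.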
- Flusher Threads Mechanism:
  - Low-Memory Flush: The kernel invokes `wakeup_flusher_threads()` to run `bdi_writeback_all()`, continuing writeback until the requested minimum number of pages has been written and free memory surpasses the `dirty_background_ratio` threshold.
  - Periodic Flush: A timer periodically wakes a thread to run `wb_writeback()`, committing pages modified more than `dirty_expire_interval` milliseconds ago. The timer is then reinitialized to fire every `dirty_writeback_interval`.
- Laptop Mode:
- Optimizes battery life by maximizing hard drive spin-down time.
- Flusher threads piggyback page writeback onto any other physical disk I/O, flushing all dirty buffers while the disk is already active to prevent future spin-ups.
- Thread Architecture and Congestion Avoidance:
  - Older kernels used a single `bdflush` thread (prone to single-queue bottlenecks) plus `kupdated` for periodic writes.
  - The 2.6 kernel introduced `pdflush` threads, which dynamically scaled in number and employed congestion avoidance, preventing starvation on busy device queues by actively targeting uncongested disks.
  - Kernel 2.6.32 and later replaced `pdflush` with per-spindle flusher threads. Associating one thread per block device allows synchronous writeback, simplifies congestion logic, and prevents a single busy disk from starving writeback for the others.
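The per-spindle design can be sketched with ordinary threads (purely illustrative; the kernel's flusher threads are not Python threads): each backing device gets its own worker draining its own dirty queue, so a slow or congested device cannot block writeback for the others.

```python
# Sketch of per-device ("per-spindle") flusher threads: one worker per
# block device, each draining only its own device's dirty-page queue.
# Device names and queue contents are illustrative.
import queue
import threading

def flusher(device_name, dirty_queue, written):
    # One flusher thread per block device, as in 2.6.32+ writeback.
    while True:
        page = dirty_queue.get()
        if page is None:              # shutdown sentinel
            return
        written.append((device_name, page))   # "write back" this page

queues = {dev: queue.Queue() for dev in ("sda", "sdb")}
written = []
threads = [threading.Thread(target=flusher, args=(dev, q, written))
           for dev, q in queues.items()]
for t in threads:
    t.start()
for dev, q in queues.items():
    for page in range(2):
        q.put(page)                   # dirty pages destined for this device
    q.put(None)
for t in threads:
    t.join()
```

Because each queue feeds exactly one device, there is no need for the cross-device congestion heuristics that `pdflush` required.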