Models of Memory Consistency
The Concept of Memory Consistency
- Cache coherence mechanisms ensure a consistent view of a single memory location, but they do not define the specific timeline under which a processor must observe updates to different memory locations.
- Memory consistency establishes the fundamental properties and strict ordering rules enforced among reads and writes to different memory locations by multiple processors.
- Without a defined consistency model, memory update visibility relies solely on arbitrary interconnect and invalidation delays.
- Delayed write invalidations in a loosely ordered system can allow multiple processors to simultaneously evaluate mutually exclusive conditions as true, leading to nonobvious and incorrect program execution.
To eliminate unpredictable execution overlaps, multiprocessor systems must establish formal sequential ordering rules for all shared memory accesses.
Sequential Consistency
- Sequential consistency mandates that the result of any execution match a scenario in which the memory accesses of each processor execute in program order, with the accesses of all processors arbitrarily interleaved.
- Hardware implementation requires delaying the completion of any memory access until all system-wide invalidations caused by that specific access are complete.
- This model introduces severe performance limitations due to cumulative stall cycles scaling with processor count and interconnect delays.
- A write miss requires cycles to establish block ownership, issue network invalidates, and await all acknowledgments before the pipeline can proceed.
- Processors cannot place a write operation into a write buffer and continue executing independent reads; they must stall until the global write sequence finalizes.
The performance penalties inherent in strict sequential ordering necessitate a hardware-software contract that restricts arbitrary access patterns to enable safe latency hiding.
Synchronized Programs and Data Races
- A program is considered synchronized if all accesses to shared data are explicitly ordered by hardware-supported synchronization operations.
- Memory access ordering is guaranteed if a write to a variable by one processor and a subsequent access by another processor are separated by a pair of synchronization operations: one executed after the write by the writing processor, and one executed before the access by the second processor.
- Data races occur when shared variables are updated without synchronization, rendering execution outcomes unpredictable and highly dependent on relative processor speeds.
- Synchronized programs inherently avoid these timing anomalies and are classified as data-race-free.
- Utilizing standard synchronization libraries ensures that data-race-free programs observe sequentially consistent execution, regardless of how aggressively the underlying hardware reorders instructions.
By guaranteeing sequential consistency strictly for synchronized programs, hardware architectures are freed to aggressively reorder independent memory operations using relaxed rules.
Relaxed Consistency Models
- Relaxed consistency models permit reads and writes to complete out of program order, relying entirely on explicit synchronization operations to enforce required execution boundaries.
- Specific models are defined by the subset of strict sequential orderings they selectively relax.
- Sequential consistency enforces four primary access orderings, where X → Y indicates that operation X must complete before operation Y begins:
- R → W, R → R, W → R, and W → W.
- Hardware implementations of relaxed models are classified as follows:
- Total Store Ordering (TSO) / Processor Consistency: Relaxes only the W → R ordering, maintaining strict serialized order among all system writes.
- Partial Store Order (PSO): Relaxes both the W → R and W → W orderings.
- Weak Ordering / Release Consistency: Relaxes all four basic orderings (R → W, R → R, W → R, W → W), maximizing hardware flexibility.
Fully relaxing all fundamental memory orderings requires specialized synchronization primitives to manage explicit data boundaries and prevent structural violations.
Release Consistency
- Release consistency structurally distinguishes between synchronization operations that acquire access to a shared variable (SA) and operations that release the object (SR).
- Strict orderings are preserved exclusively around these specific synchronization boundaries rather than general memory accesses:
- Reads or writes preceding an acquire operation do not need to complete before the acquire executes.
- Reads or writes following a release operation do not need to wait for the release to complete.
- Preserved orderings dictate interactions only between SA, SR, and ordinary memory operations (e.g., SA → R, SA → W, R → SR, W → SR).
- Certain synchronization operations, such as barriers, act simultaneously as both an acquire and a release, effectively equating the memory ordering to weak ordering.
- Explicit FENCE instructions are utilized to guarantee that all previous writes and their associated invalidates are fully complete, without relying on an identified synchronization operation.
- Architectures such as RISC-V and ARMv8, and language specifications such as C/C++, utilize release consistency to maximize hardware out-of-order optimization while maintaining strict execution predictability.