Locking and Concurrency Control

Concurrency Context

  • Concurrency emerges from the interleaved execution of multiple instruction streams driven by multiprocessor hardware, thread context switching, and device interrupts.
  • Unrestricted concurrent access to shared data structures can yield incorrect results or leave the data itself in a broken state.
  • Locks enforce mutual exclusion, ensuring that only one CPU can access and manipulate a protected data item at any given moment.
  • Mutual exclusion prevents destructive interference between interleaved execution streams, chiefly the race conditions described next.

Race Conditions and Invariants

  • A race condition occurs when a memory location is accessed concurrently by multiple streams and at least one of those accesses is a write (a minimal demonstration follows this list).
  • The outcome of a race condition depends entirely on the precise timing of the involved CPUs and the hardware memory ordering.
  • Critical sections are isolated sequences of instructions strictly enclosed between lock acquire and release operations.
  • Locks ultimately protect data structure invariants: properties that must hold whenever no operation is in progress.
    • Update operations typically must violate these invariants temporarily during their intermediate steps.
    • Locks serialize concurrent critical sections, ensuring no CPU observes data while its invariants are temporarily violated.
  • Critical sections guarded by the identical lock execute atomically with respect to one another, preventing the visibility of partially completed updates.
  • While locks resolve race conditions by serializing access, ordinary load and store instructions are insufficient to build a correct lock on multiprocessor architectures.
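
  To make the hazard concrete, here is a minimal user-space sketch in C with Pthreads; the names worker and counter are illustrative, not taken from any particular kernel. Both threads perform an unprotected read-modify-write, so increments are lost whenever the two interleave.

      #include <pthread.h>
      #include <stdio.h>

      long counter = 0;   /* shared, unprotected */

      void *worker(void *arg)
      {
          /* Racy read-modify-write: both threads can read the same
           * old value, each add 1, and one of the updates is lost. */
          for (int i = 0; i < 1000000; i++)
              counter = counter + 1;
          return 0;
      }

      int main(void)
      {
          pthread_t t1, t2;
          pthread_create(&t1, 0, worker, 0);
          pthread_create(&t2, 0, worker, 0);
          pthread_join(t1, 0);
          pthread_join(t2, 0);
          printf("counter = %ld (expected 2000000)\n", counter);
          return 0;
      }

  On most machines this prints a total well below 2,000,000, and a different total on each run: the outcome is pure timing.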

Spinlocks and Atomic Instructions

  • Standard conditional checks intended for mutual exclusion fail on multiprocessors because multiple CPUs can evaluate an unlocked state simultaneously and proceed to acquire the lock concurrently.
  • Hardware-level atomic instructions are strictly required to bridge the gap between reading a lock’s state and modifying it.
  • RISC-V architectures provide the amoswap r, a instruction, which performs an atomic memory swap.
    • The instruction reads the value at memory address a, writes the contents of register r to a, and places the original memory value into r.
    • Hardware mechanisms guarantee this sequence is completely indivisible, preventing concurrent memory access during execution.
  • Spinlocks are implemented via an acquire function that loops, atomically swapping 1 into the lock word until the value swapped out is 0, meaning the lock was previously free (sketched after this list).
  • The release function clears the lock word with a single atomic store of 0, again relying on the hardware swap so the write cannot interleave with a concurrent acquire.
  • These hardware-backed atomic instructions make spinlocks possible, but using them well requires strict design rules for lock granularity and contention.
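
  A minimal spinlock sketch in C, using the GCC builtins __sync_lock_test_and_set and __sync_lock_release, which compile down to an atomic swap such as amoswap on RISC-V. The struct layout is illustrative, and the interrupt bookkeeping discussed under Locks and Interrupt Handlers below is omitted.

      struct spinlock {
          unsigned int locked;   /* 0 = free, 1 = held */
      };

      void acquire(struct spinlock *lk)
      {
          /* Atomically swap 1 into the lock word; keep spinning
           * until the value swapped out is 0 (the lock was free). */
          while (__sync_lock_test_and_set(&lk->locked, 1) != 0)
              ;
          /* Barrier: critical-section accesses stay after the swap. */
          __sync_synchronize();
      }

      void release(struct spinlock *lk)
      {
          /* Barrier: critical-section writes become visible first. */
          __sync_synchronize();
          /* Atomic store of 0; never a plain, possibly torn write. */
          __sync_lock_release(&lk->locked);
      }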

Lock Granularity and Contention

  • Locks must guard any variable susceptible to concurrent read/write or write/write operations across different CPUs.
  • Invariants spanning multiple distinct memory locations require a single, unified lock protecting all involved locations simultaneously.
  • Lock contention arises when multiple CPUs repeatedly compete for the same lock, degrading performance as the losers burn cycles spinning uselessly.
  • Granularity dictates the balance between safety and parallelism:
    • Coarse-grained locking uses fewer locks covering large systems, sacrificing parallelism by forcing unrelated processes to wait.
    • Fine-grained locking assigns separate locks to discrete data segments, enabling high parallelism but increasing structural complexity.
  • Finer granularity improves potential parallelism but introduces the structural risk of code paths that must hold several locks at once (both styles are contrasted in the sketch below).
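
  The trade-off in miniature, reusing the spinlock sketch from above; the hash table, its bucket count, and the function name put_fine are hypothetical:

      #define NBUCKET 13

      struct entry { unsigned key; int value; struct entry *next; };

      /* Coarse-grained: one lock serializes every operation,
       * even when two CPUs touch unrelated buckets. */
      struct table_coarse {
          struct spinlock lock;
          struct entry *buckets[NBUCKET];
      };

      /* Fine-grained: one lock per bucket, so operations on
       * different buckets run in parallel. This is safe because
       * the invariant ("an entry sits in the bucket its key
       * hashes to") involves exactly one bucket at a time. */
      struct table_fine {
          struct {
              struct spinlock lock;
              struct entry *head;
          } buckets[NBUCKET];
      };

      void put_fine(struct table_fine *t, struct entry *e)
      {
          unsigned b = e->key % NBUCKET;
          acquire(&t->buckets[b].lock);
          e->next = t->buckets[b].head;   /* invariant briefly broken */
          t->buckets[b].head = e;         /* ...and restored */
          release(&t->buckets[b].lock);
      }

  If an invariant spanned two buckets (say, migrating an entry between them), either a single lock covering both or both bucket locks acquired in a fixed order would be required instead.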

Deadlock and Lock Ordering

  • Deadlock occurs when multiple code paths hold conflicting locks and enter a state of indefinite blocking, each waiting to acquire a lock held by the other.
  • Deadlock avoidance dictates that all execution paths must acquire multiple locks in a strict, globally enforced order.
  • The required acquisition order effectively becomes part of each function’s specification: callers must know which locks a function may take, and in what order.
  • Enforcing a global order becomes genuinely hard when the identities of the needed locks are only discovered during execution; one common mitigation, acquiring in a fixed order such as ascending address, is sketched after this list.
  • The locks described here are deliberately non-re-entrant: a process that attempts to acquire a lock it already holds is treated as buggy and triggers a panic.
  • Re-entrant (recursive) locks would break the core assumption that a critical section is atomic with respect to other critical sections on the same lock, trading a visible deadlock for the risk of silent corruption.
  • Deadlock avoidance thus rests on a strict lock acquisition order between threads, but additional constraints apply when locks interact with hardware interrupts.
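
  One widely used way to impose a global order when locks are discovered dynamically is to sort acquisitions by a fixed key such as the lock’s address. The helper below is a hypothetical sketch reusing the earlier spinlock definition; two CPUs calling it with the same pair in opposite argument orders still take the locks in the same order.

      #include <stdint.h>

      /* Acquire two spinlocks in a globally consistent order
       * (ascending address). Concurrent calls acquire_pair(a, b)
       * and acquire_pair(b, a) take the locks in the same order,
       * so neither caller can hold one while waiting on the other. */
      void acquire_pair(struct spinlock *a, struct spinlock *b)
      {
          if ((uintptr_t)a < (uintptr_t)b) {
              acquire(a);
              acquire(b);
          } else {
              acquire(b);
              acquire(a);
          }
      }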

Locks and Interrupt Handlers

  • Spinlocks frequently protect shared data required by both standard kernel threads and asynchronous hardware interrupt handlers.
  • If an interrupt handler executes on a CPU already holding a lock required by that handler, the specific CPU deadlocks permanently because the interrupted thread cannot resume to release it.
  • CPUs must explicitly disable local hardware interrupts immediately prior to acquiring any spinlock to prevent this single-CPU deadlock scenario.
  • Nested critical sections require rigorous state tracking via a matched push_off/pop_off pair (sketched after this list).
    • push_off disables interrupts, records the original interrupt state on the outermost call, and increments a nesting count.
    • pop_off decrements the count, restoring the original interrupt state only when the nesting count hits zero.
  • Execution order is strictly enforced: push_off must execute before modifying the lock state, and pop_off must execute only after the lock is fully released.
  • Hardware interrupts are one source of unexpected interleaving that threatens locks; re-ordering by the CPU’s memory model is another.
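
  One plausible shape for the push_off/pop_off bookkeeping, assuming hypothetical per-CPU state (mycpu) and interrupt primitives (intr_get, intr_off, intr_on) that the surrounding kernel would supply:

      struct cpu {
          int noff;     /* depth of push_off nesting */
          int intena;   /* interrupts on before the outermost push_off? */
      };

      struct cpu *mycpu(void);   /* hypothetical: this CPU's state */
      int  intr_get(void);       /* hypothetical: interrupts enabled? */
      void intr_on(void);
      void intr_off(void);

      void push_off(void)
      {
          int old = intr_get();
          intr_off();                 /* disable before touching the count */
          if (mycpu()->noff == 0)
              mycpu()->intena = old;  /* remember pre-lock state exactly once */
          mycpu()->noff += 1;
      }

      void pop_off(void)
      {
          struct cpu *c = mycpu();
          c->noff -= 1;
          if (c->noff == 0 && c->intena)
              intr_on();              /* restore only at the outermost release */
      }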

Instruction and Memory Ordering

  • Modern CPUs and compilers frequently re-order sequential instructions to maximize throughput and prevent processor stalling.
  • Memory model rules preserve the illusion of program order within a single instruction stream but permit re-orderings that break concurrent code.
  • Memory barriers, such as __sync_synchronize(), instruct both the compiler and the hardware not to move loads or stores across the barrier (illustrated after this list).
  • acquire and release functions embed memory barriers to guarantee that modifications inside a critical section become strictly visible to other CPUs before the lock is released.
  • Spinlocks with memory barriers work well for short critical sections, but long-running operations demand a fundamentally different lock design.
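
  The canonical failure is publish-then-flag: without barriers, the store to a flag may become visible before the store to the data it guards. A minimal sketch with illustrative names; ready is volatile only to keep the compiler from hoisting the spin-loop load, and production code would use C11 atomics instead:

      #include <stdio.h>

      int data;                 /* the payload being published */
      volatile int ready = 0;   /* the flag that publishes it */

      void producer(void)
      {
          data = 42;
          __sync_synchronize();   /* data must be visible before ready */
          ready = 1;
      }

      void consumer(void)
      {
          while (ready == 0)
              ;                   /* spin until the flag is set */
          __sync_synchronize();   /* do not read data "before" seeing ready */
          printf("%d\n", data);   /* now guaranteed to print 42 */
      }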

Sleep-locks

  • Spinlocks are entirely unsuitable for long-duration operations, such as disk I/O, because waiting threads waste substantial CPU cycles continuously spinning.
  • A thread must not yield the CPU while holding a spinlock: spinlocks are held with interrupts disabled, and a waiter on another CPU could spin indefinitely for a holder that never gets scheduled to release the lock.
  • Sleep-locks resolve this architectural limitation by automatically yielding the CPU to other processes while waiting for acquisition.
  • The internal state of a sleep-lock is guarded by a small spinlock; a process that finds the sleep-lock held releases that spinlock and blocks in one atomic step, so it cannot miss the wakeup issued at release time (sketched after this list).
  • Functional properties of sleep-locks:
    • Permit hardware interrupts while held.
    • Permit thread yielding while held.
    • Strictly prohibited within interrupt handlers.
    • Strictly prohibited within spinlock critical sections.
  • The dual abstractions of hardware-backed spinlocks and yielding sleep-locks form the foundation for managing shared memory configurations in real-world environments.
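
  A sketch of that design, assuming hypothetical kernel primitives sleep (atomically release the given spinlock and block on a channel) and wakeup (rouse sleepers on a channel), layered on the spinlock sketch from earlier:

      struct sleeplock {
          unsigned int locked;   /* is the sleep-lock held? */
          struct spinlock lk;    /* spinlock guarding this state */
      };

      /* Hypothetical primitives supplied by the kernel's scheduler. */
      void sleep(void *chan, struct spinlock *lk);
      void wakeup(void *chan);

      void acquiresleep(struct sleeplock *lk)
      {
          acquire(&lk->lk);
          while (lk->locked)
              sleep(lk, &lk->lk);   /* atomically drop lk->lk and yield */
          lk->locked = 1;
          release(&lk->lk);
      }

      void releasesleep(struct sleeplock *lk)
      {
          acquire(&lk->lk);
          lk->locked = 0;
          wakeup(lk);               /* rouse any thread blocked above */
          release(&lk->lk);
      }

  The internal spinlock is held only across these few instructions, so interrupts are off only briefly; the long wait itself happens with the CPU yielded to other work.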

Real World Implementation Variables

  • Modern operating systems expose equivalent concurrency controls to user-space applications via POSIX threads (Pthreads); a minimal example follows this list.
  • High-contention locks severely degrade multiprocessor performance through cache-line bouncing: atomic instructions force the cache line holding the lock to ping-pong between per-core caches.
  • Lock-free data structures eliminate direct lock overhead but require highly complex manual management of instruction and memory re-ordering constraints.
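
  For completeness, the racy counter from the start of this section rewritten with a Pthreads mutex, which behaves broadly like a sleep-lock in user space; compile with -pthread:

      #include <pthread.h>
      #include <stdio.h>

      pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      long counter = 0;

      void *worker(void *arg)
      {
          for (int i = 0; i < 1000000; i++) {
              pthread_mutex_lock(&lock);     /* may sleep under contention */
              counter++;                     /* critical section */
              pthread_mutex_unlock(&lock);
          }
          return 0;
      }

      int main(void)
      {
          pthread_t t1, t2;
          pthread_create(&t1, 0, worker, 0);
          pthread_create(&t2, 0, worker, 0);
          pthread_join(t1, 0);
          pthread_join(t2, 0);
          printf("counter = %ld\n", counter);   /* always 2000000 */
          return 0;
      }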