Concurrency emerges from the interleaved execution of multiple instruction streams driven by multiprocessor hardware, thread context switching, and device interrupts.
Unrestricted concurrent access to shared data structures yields incorrect computational results or fundamentally broken data states.
Locks enforce mutual exclusion, ensuring that only one CPU can access and manipulate a protected data item at any given moment.
Mutual exclusion prevents destructive interference between interleaved execution streams, primarily mitigating the risks associated with race conditions.
Race Conditions and Invariants
A race condition occurs when a memory location is accessed concurrently by multiple streams, and at least one of those accesses is a write operation.
The outcome of a race condition depends entirely on the precise timing of the involved CPUs and the hardware memory ordering.
Critical sections are isolated sequences of instructions strictly enclosed between lock acquire and release operations.
Locks fundamentally protect data structure invariants: properties an operation may assume to hold on entry and must restore before completion.
Standard operations must temporarily violate these invariants during intermediate update steps.
Locks serialize concurrent critical sections, ensuring no CPU observes data while its invariants are temporarily violated.
Critical sections guarded by the identical lock execute atomically with respect to one another, preventing the visibility of partially completed updates.
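The lost-update race and its fix can be sketched in user-space C with POSIX threads (an assumption here; the source text concerns kernel code, and all names below are illustrative). The invariant is "counter equals the number of completed increments"; each increment briefly violates it, and the mutex keeps that intermediate state from being observed.

```c
#include <pthread.h>

// Shared counter.  Without the lock, two threads can read the same old
// value and each write back old+1, silently losing an update.
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *incrementer(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&counter_lock);   // enter critical section
        counter = counter + 1;               // invariant briefly violated mid-update
        pthread_mutex_unlock(&counter_lock); // invariant restored before visible
    }
    return 0;
}

long run_two_threads(void)                   // returns the final count
{
    counter = 0;
    pthread_t a, b;
    pthread_create(&a, 0, incrementer, 0);
    pthread_create(&b, 0, incrementer, 0);
    pthread_join(a, 0);
    pthread_join(b, 0);
    return counter;
}
```

Removing the two mutex calls turns this into exactly the race described above: the result becomes timing-dependent and usually lands below 200000.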
While locks resolve race conditions by serializing access, standard software instructions are inadequate to implement functional locks on multiprocessor architectures.
Spinlocks and Atomic Instructions
Standard conditional checks intended for mutual exclusion fail on multiprocessors because multiple CPUs can evaluate an unlocked state simultaneously and proceed to acquire the lock concurrently.
Hardware-level atomic instructions are strictly required to bridge the gap between reading a lock’s state and modifying it.
RISC-V architectures provide the amoswap r, a instruction, which performs an atomic memory swap.
The instruction reads the value at memory address a, writes the contents of register r to a, and places the original memory value into r.
Hardware mechanisms guarantee this sequence is completely indivisible, preventing concurrent memory access during execution.
Spinlocks are implemented via an acquire function that loops, atomically swapping 1 into the lock state until the swap returns 0, indicating the lock was previously free.
The release function clears the lock state atomically, using the same hardware swap to store 0 so that no plain, possibly non-atomic store ever touches the lock word.
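A minimal sketch of this acquire/release pair, assuming the GCC/Clang __sync builtins: __sync_lock_test_and_set atomically swaps a value into memory and returns the old value, which on RISC-V compiles down to amoswap. The memory barriers are explained in a later section; they are included here so the sketch is actually correct.

```c
// Minimal spinlock sketch built on atomic swap.
struct spinlock {
    volatile unsigned int locked;   // 0 = free, 1 = held
};

struct spinlock demo_lock = { 0 };  // example instance

void spin_acquire(struct spinlock *lk)
{
    // Keep swapping 1 in; we own the lock only when the old value was 0.
    while (__sync_lock_test_and_set(&lk->locked, 1) != 0)
        ;                       // spin
    __sync_synchronize();       // barrier: critical-section accesses stay after acquire
}

void spin_release(struct spinlock *lk)
{
    __sync_synchronize();       // barrier: critical-section stores visible before release
    __sync_lock_release(&lk->locked);   // atomically store 0
}
```

Note that the loop succeeds exactly when the swap's return value is 0: the CPU that reads 0 has simultaneously written 1, so no second CPU can observe the free state.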
The deployment of these hardware-backed atomic instructions enables the creation of spinlocks, necessitating strict design rules for lock granularity and contention management.
Lock Granularity and Contention
Locks must guard any variable susceptible to concurrent read/write or write/write operations across different CPUs.
Invariants spanning multiple distinct memory locations require a single, unified lock protecting all involved locations simultaneously.
Lock contention arises when multiple CPUs compete for the same lock, degrading system performance as waiting CPUs burn cycles spinning uselessly.
Granularity dictates the balance between safety and parallelism:
Coarse-grained locking uses fewer locks covering large systems, sacrificing parallelism by forcing unrelated processes to wait.
Fine-grained locking assigns separate locks to discrete data segments, enabling high parallelism but increasing structural complexity.
Finer lock granularity improves potential parallelism but introduces the structural risk of code paths that must hold multiple locks simultaneously.
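The trade-off can be sketched with a hash table that carries one POSIX mutex per bucket rather than one for the whole table (all names here are illustrative, not from the source). Puts and gets on different buckets proceed in parallel; the coarse-grained alternative would serialize every operation behind a single lock.

```c
#include <pthread.h>
#include <stdlib.h>

#define NBUCKET 13

struct entry {
    int key, value;
    struct entry *next;
};

// Fine-grained: each bucket has its own lock protecting only its chain.
static struct {
    pthread_mutex_t lock;
    struct entry *head;
} table[NBUCKET];

void table_init(void)
{
    for (int i = 0; i < NBUCKET; i++)
        pthread_mutex_init(&table[i].lock, 0);
}

void table_put(int key, int value)
{
    int b = (unsigned)key % NBUCKET;
    pthread_mutex_lock(&table[b].lock);     // covers one bucket only
    struct entry *e = malloc(sizeof *e);
    e->key = key;
    e->value = value;
    e->next = table[b].head;
    table[b].head = e;
    pthread_mutex_unlock(&table[b].lock);
}

int table_get(int key)                      // returns value, or -1 if absent
{
    int b = (unsigned)key % NBUCKET;
    int v = -1;
    pthread_mutex_lock(&table[b].lock);
    for (struct entry *e = table[b].head; e; e = e->next)
        if (e->key == key) { v = e->value; break; }
    pthread_mutex_unlock(&table[b].lock);
    return v;
}
```

The per-bucket scheme works only because each invariant (a well-formed chain) spans a single bucket; an operation that moved an entry between buckets would need both locks at once, which is exactly where the ordering rules of the next section come in.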
Deadlock and Lock Ordering
Deadlock occurs when multiple code paths hold conflicting locks and enter a state of indefinite blocking, each waiting to acquire a lock held by the other.
Deadlock avoidance dictates that all execution paths must acquire multiple locks in a strict, globally enforced order.
This required acquisition order effectively becomes part of each function's specification.
Enforcing global ordering rules becomes highly complex when the identities of necessary locks are dynamically discovered during active execution.
Locks here are deliberately non-re-entrant: a CPU attempting to acquire a lock it already holds triggers an intentional panic.
Re-entrant (recursive) locks break the fundamental intuition that critical sections remain atomic relative to other critical sections, risking silent errors over visible deadlocks.
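One common realization of a global order, sketched with POSIX mutexes: whenever a path needs two locks, it takes the one with the lower address first. The helper below is illustrative, not from the source; comparing unrelated pointers is formally unspecified in ISO C, though kernels rely on it in practice.

```c
#include <pthread.h>

pthread_mutex_t demo_a = PTHREAD_MUTEX_INITIALIZER;   // example locks
pthread_mutex_t demo_b = PTHREAD_MUTEX_INITIALIZER;

// Every caller agrees on the same total order (by address), so no cycle
// of waiters -- each holding one lock and wanting the other -- can form.
void acquire_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) { pthread_mutex_lock(a); return; }    // same lock: take once
    if (a > b) { pthread_mutex_t *t = a; a = b; b = t; }  // sort by address
    pthread_mutex_lock(a);    // always lower address first
    pthread_mutex_lock(b);
}

void release_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);  // release order does not matter for deadlock
    if (b != a)
        pthread_mutex_unlock(b);
}
```

With this helper, acquire_pair(&demo_a, &demo_b) and acquire_pair(&demo_b, &demo_a) lock in the same underlying order, so two threads issuing them concurrently cannot deadlock.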
Deadlock prevention relies on strict execution ordering between threads, but additional overriding constraints apply when these locks interact with hardware interrupts.
Locks and Interrupt Handlers
Spinlocks frequently protect shared data required by both standard kernel threads and asynchronous hardware interrupt handlers.
If an interrupt handler executes on a CPU already holding a lock required by that handler, the specific CPU deadlocks permanently because the interrupted thread cannot resume to release it.
CPUs must explicitly disable local hardware interrupts immediately prior to acquiring any spinlock to prevent this single-CPU deadlock scenario.
Nested critical sections require rigorous state tracking using push_off and pop_off.
push_off increments a lock nesting count and disables interrupts.
pop_off decrements the count, restoring the original interrupt state only when the nesting count hits zero.
Execution order is strictly enforced: push_off must execute before modifying the lock state, and pop_off must execute only after the lock is fully released.
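The nesting bookkeeping can be modeled in user space, with intr_on/intr_off stubbed as a flag since real interrupt control is privileged (the flag is a stand-in; the push_off/pop_off names mirror the functions described above). The point is the counting logic: only the outermost push_off records the prior state, and only the final pop_off restores it.

```c
static int intr_enabled = 1;   // models the CPU's interrupt-enable bit
static int noff = 0;           // depth of push_off nesting
static int intena = 0;         // was the bit set before the outermost push_off?

static void intr_off(void) { intr_enabled = 0; }
static void intr_on(void)  { intr_enabled = 1; }

void push_off(void)
{
    int old = intr_enabled;
    intr_off();                 // disable before touching any lock state
    if (noff == 0)
        intena = old;           // remember state only at the outermost level
    noff += 1;
}

void pop_off(void)
{
    noff -= 1;
    if (noff == 0 && intena)    // restore only when the last lock is released
        intr_on();
}
```

Restoring interrupts inside a nested pop_off would reopen the single-CPU deadlock window while an outer lock is still held, which is why only the count reaching zero re-enables them.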
Hardware interrupts represent one class of unexpected execution re-ordering that threatens locks; CPU architecture memory models represent another.
Instruction and Memory Ordering
Modern CPUs and compilers frequently re-order sequential instructions to maximize throughput and prevent processor stalling.
Memory model rules preserve the correctness of serial execution but permit re-ordering that actively breaks concurrent code logic.
Memory barriers, such as __sync_synchronize(), explicitly instruct both the compiler and the hardware not to re-order load or store instructions across the barrier.
acquire and release functions embed memory barriers to guarantee that modifications inside a critical section become strictly visible to other CPUs before the lock is released.
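A small message-passing sketch shows why the barriers matter, using the __sync_synchronize builtin named above (variable names are illustrative). Without the barriers, the store to ready may be reordered before the store to message, letting a reader observe the flag with a stale payload.

```c
static int message;
static int ready;

void publish(int m)
{
    message = m;
    __sync_synchronize();   // stores above cannot move below this line
    ready = 1;              // flag becomes visible only after the payload
}

int consume(void)           // returns the message once it is published
{
    while (ready == 0)
        ;                   // spin until the flag is observed
    __sync_synchronize();   // loads below cannot move above this line
    return message;
}
```

A lock's acquire and release play exactly these two barrier roles, which is why code inside a critical section is visible to the next holder in full.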
Spinlocks with memory barriers work well for short operations, but long-running operations require a fundamentally different locking design.
Sleep-locks
Spinlocks are entirely unsuitable for long-duration operations, such as disk I/O, because waiting threads waste substantial CPU cycles continuously spinning.
A thread is forbidden from yielding the CPU while holding a spinlock, as yielding invites deadlock and violates the strict disabled-interrupt requirement.
Sleep-locks resolve this architectural limitation by automatically yielding the CPU to other processes while waiting for acquisition.
The internal state of a sleep-lock is guarded by a standard spinlock; a waiting process yields the CPU and releases that internal spinlock in a single atomic step.
Functional properties of sleep-locks:
Permit hardware interrupts while held.
Permit thread yielding while held.
Strictly prohibited within interrupt handlers.
Strictly prohibited within spinlock critical sections.
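A user-space analogue of this design, assuming POSIX threads (names are illustrative): the internal mutex plays the role of the guarding spinlock, and pthread_cond_wait performs the atomic "release the guard and sleep" step described above.

```c
#include <pthread.h>

struct sleeplock {
    int locked;                // is the sleep-lock itself held?
    pthread_mutex_t lk;        // short-lived guard for `locked`
    pthread_cond_t  cv;        // waiters sleep here instead of spinning
};

void init_sleeplock(struct sleeplock *s)
{
    s->locked = 0;
    pthread_mutex_init(&s->lk, 0);
    pthread_cond_init(&s->cv, 0);
}

void acquire_sleep(struct sleeplock *s)
{
    pthread_mutex_lock(&s->lk);
    while (s->locked)
        pthread_cond_wait(&s->cv, &s->lk);  // yields CPU, drops guard atomically
    s->locked = 1;
    pthread_mutex_unlock(&s->lk);           // guard held only briefly
}

void release_sleep(struct sleeplock *s)
{
    pthread_mutex_lock(&s->lk);
    s->locked = 0;
    pthread_cond_broadcast(&s->cv);         // wake any sleeping waiters
    pthread_mutex_unlock(&s->lk);
}
```

Because the guard is held only for the few instructions that inspect and flip locked, the expensive wait happens with the CPU yielded and interrupts enabled, which is exactly what the spinlock version cannot offer.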
The dual abstractions of hardware-backed spinlocks and yielding sleep-locks form the foundation for managing shared data in real-world systems.
Real World Implementation Variables
Modern operating systems expose advanced concurrency controls to user space applications via POSIX threads (Pthreads).
High-contention locks severely degrade multiprocessor performance through cache line bouncing, as atomic instructions force the lock's cache line to shuttle between per-CPU hardware caches.
Lock-free data structures eliminate direct lock overhead but require highly complex manual management of instruction and memory re-ordering constraints.
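For a single shared word, the lock-free approach can be as simple as replacing the lock with one atomic read-modify-write, sketched here with the GCC __sync_fetch_and_add builtin and POSIX threads (names are illustrative); richer lock-free structures demand far more care with ordering than this shows.

```c
#include <pthread.h>

static long hits = 0;   // shared counter, updated without any lock

static void *hitter(void *arg)
{
    for (int i = 0; i < 100000; i++)
        __sync_fetch_and_add(&hits, 1);   // atomic increment; no update is lost
    return 0;
}

long run_hitters(void)                    // returns the final count
{
    hits = 0;
    pthread_t a, b;
    pthread_create(&a, 0, hitter, 0);
    pthread_create(&b, 0, hitter, 0);
    pthread_join(a, 0);
    pthread_join(b, 0);
    return hits;
}
```

The atomic add still bounces the counter's cache line between CPUs under contention, so lock-free designs remove lock overhead without removing the cache-coherence cost noted above.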