Concurrency Control and Optimization

Explicit Locking Patterns

System resources require synchronized access, but varying hardware states and object lifecycles demand specialized locking structures.

  • Dual-Lock Cache Architecture: Data structures that act as caches (like the block cache) must protect both the collection of items and the items themselves. A global lock (bcache.lock) enforces mutual exclusion over the identity and presence of cached blocks, while a per-item lock (struct buf lock) guarantees exclusive access to a specific disk block’s payload once it is isolated.
  • Asymmetric Lock Boundaries: Locks guarantee atomicity for a sequence of operations, but the acquisition and release of a lock do not need to occur within the same function, thread, or CPU. For example, yield acquires a process lock that is later released by the scheduler thread. Similarly, ilock acquires a sleep-lock that may block for disk I/O and subsequently wake up and release the lock on an entirely different CPU.
  • Safe Object Deallocation: Embedded locks are insufficient for protecting their own destruction. If one thread waits to acquire an object’s embedded lock while another thread frees that object, the waiter ends up spinning on memory that has been freed or reused, and will crash or malfunction.
  • Reference Count Defenses: To safely destroy locked objects, systems use reference counts (e.g., pi->readopen and pi->writeopen in pipes) to track active consumers, ensuring the underlying memory and embedded lock are only deallocated when the final reference disappears.
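The dual-lock discipline above can be sketched in user space. The following is a hypothetical analogue of the block cache, with pthread mutexes standing in for the kernel's spinlock and sleep-lock; the names (bget, brelse, cacheinit) mirror xv6's, but the code is an illustrative sketch, not the kernel implementation:

```c
#include <pthread.h>

#define NBUF 4

struct buf {
    int blockno;           // identity: which disk block this caches
    int refcnt;            // protected by cache.lock
    pthread_mutex_t lock;  // per-buffer lock: protects data[]
    char data[512];
};

struct {
    pthread_mutex_t lock;  // global lock: protects blockno/refcnt of every buf
    struct buf buf[NBUF];
} cache;

void cacheinit(void) {
    pthread_mutex_init(&cache.lock, 0);
    for (int i = 0; i < NBUF; i++) {
        cache.buf[i].refcnt = 0;
        pthread_mutex_init(&cache.buf[i].lock, 0);
    }
}

// Find or allocate a buffer for blockno, returning it with its
// per-buffer lock held, so the caller owns the payload exclusively.
struct buf *bget(int blockno) {
    pthread_mutex_lock(&cache.lock);
    // Already cached? The global lock answers identity questions.
    for (int i = 0; i < NBUF; i++) {
        struct buf *b = &cache.buf[i];
        if (b->refcnt > 0 && b->blockno == blockno) {
            b->refcnt++;  // pin identity before dropping the global lock
            pthread_mutex_unlock(&cache.lock);
            pthread_mutex_lock(&b->lock);  // may block; identity is pinned
            return b;
        }
    }
    // Not cached: recycle an unreferenced buffer.
    for (int i = 0; i < NBUF; i++) {
        struct buf *b = &cache.buf[i];
        if (b->refcnt == 0) {
            b->blockno = blockno;
            b->refcnt = 1;
            pthread_mutex_unlock(&cache.lock);
            pthread_mutex_lock(&b->lock);
            return b;
        }
    }
    pthread_mutex_unlock(&cache.lock);
    return 0;  // cache full
}

void brelse(struct buf *b) {
    pthread_mutex_unlock(&b->lock);      // give up the payload
    pthread_mutex_lock(&cache.lock);     // then update identity state
    b->refcnt--;
    pthread_mutex_unlock(&cache.lock);
}
```

Note the ordering in bget: the reference count is bumped under the global lock before that lock is released, so the buffer cannot be recycled out from under the thread while it blocks on the per-buffer lock.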

While explicit locking defines strict atomic sequences, dynamic object lifecycles and complex traversals often necessitate lock-like patterns that rely on system state rather than discrete lock variables.

Lock-Like Synchronization Patterns

Data constraints can be enforced through structural ownership and system flags rather than conventional spinlocks or sleep-locks.

  • State Flags as Locks: System state variables and reference counts (such as p->state, or the reference counts in file, inode, and buf structures) act as implicit locks that prevent active objects from being prematurely freed or reassigned.
  • Shared Locks via Reference Counts: Inode reference counts function as shared locks that many processes can hold at once, sidestepping deadlocks during hierarchical traversal. When parsing pathnames (namex), holding a directory’s exclusive lock while looking up . would deadlock the thread against itself, and concurrent lookups of .. could deadlock against each other. Instead, the code increments the inode’s reference count, releases the exclusive lock, and carries the inode to the next loop iteration: the count keeps the inode alive without blocking other lookups.
  • Lifecycle-Based Implicit Protection: A single data object shifts its concurrency protection model dynamically throughout its lifespan.
    • When free, a physical page is guarded by a global allocator lock (kmem.lock).
    • When allocated as an active pipe, it is protected by an embedded lock (pi->lock).
    • When assigned as a process’s user memory, it operates without any lock, shielded entirely by the implicit guarantee of exclusive process ownership.
  • Interrupt Disabling: Reading CPU-specific structures (mycpu()) relies on hardware-level interrupt disabling for synchronization. This renders the code block atomic against timer interrupts, preventing a context switch from silently migrating the process to a different CPU mid-execution.
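The interrupt-disabling bullet follows xv6's push_off/pop_off discipline, which nests like a counted lock. Below is a minimal user-space model: a plain flag stands in for the hardware interrupt-enable bit (the flag variable and its handling are illustrative assumptions, not the kernel's actual CSR manipulation):

```c
// Hypothetical user-space model of xv6's push_off/pop_off.
// intr_on_flag stands in for the CPU's interrupt-enable bit.
static int intr_on_flag = 1;  // "interrupts enabled"
static int noff = 0;          // nesting depth of push_off calls
static int intena = 0;        // were interrupts on at the outermost push_off?

void push_off(void) {
    int old = intr_on_flag;
    intr_on_flag = 0;         // disable interrupts first, then record state
    if (noff == 0)
        intena = old;         // remember only the outermost level's state
    noff++;
}

void pop_off(void) {
    noff--;
    if (noff == 0 && intena)
        intr_on_flag = 1;     // re-enable only when the last level unwinds
}
```

The nesting counter is what makes mycpu() safe to call from code that has already disabled interrupts: inner push_off/pop_off pairs never re-enable interrupts prematurely.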

Although state flags and lifecycle phases provide implicit protection, certain hardware mechanisms and structural guarantees allow concurrent execution with no locks at all.

Lock-Free Concurrency

Specific memory access patterns eliminate the need for synchronization primitives by leveraging hardware semantics and sequence barriers.

  • Hardware-Level Atomicity: The foundational spinlock implementation itself shares mutable data without external locks, relying entirely on atomic CPU instructions.
  • Volatile Memory Directives: Boot sequencing coordinates CPUs without locks using the started variable; the volatile keyword forces the compiler to emit a real load or store for every access, so secondary CPUs observe the flag change and do not begin executing before primary initialization finishes.
  • Asymmetric Lock-Free Access: Variables like the process termination flag (p->killed) are strictly written while holding a lock but are continuously read lock-free to rapidly detect state changes without stalling execution.
  • Implicit Memory Ordering: Threads can safely share mutable data sequentially if isolated by memory barriers. During process creation (fork), a parent writes child memory pages without dedicated data locks. The memory barriers invoked by the release of the parent’s locks and the acquire of the child’s locks synchronize CPU caches, guaranteeing the child correctly observes the parent’s initial writes.
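A minimal sketch of the first bullet, a spinlock built directly on a hardware atomic, using C11 atomics (xv6 itself uses compiler atomic builtins on RISC-V; the demo threads and counter below are illustrative additions):

```c
#include <pthread.h>
#include <stdatomic.h>

struct spinlock {
    atomic_flag locked;
};

void acquire(struct spinlock *lk) {
    // test_and_set is an atomic swap: exactly one CPU sees the old
    // value 0 and wins the lock; everyone else spins.
    while (atomic_flag_test_and_set_explicit(&lk->locked,
                                             memory_order_acquire))
        ;  // spin
}

void release(struct spinlock *lk) {
    // The release barrier makes writes inside the critical section
    // visible before the lock appears free to other CPUs.
    atomic_flag_clear_explicit(&lk->locked, memory_order_release);
}

// Demo: two threads increment a shared counter 100000 times each.
// Without the lock, lost updates would make the total fall short.
struct spinlock lk = { ATOMIC_FLAG_INIT };
long counter = 0;

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        acquire(&lk);
        counter++;
        release(&lk);
    }
    return 0;
}
```

The acquire/release memory orders are also what the Implicit Memory Ordering bullet relies on: any thread that acquires the lock after a release is guaranteed to see all writes made before that release.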

The strategic removal of locks through hardware atomics and structural ordering directly drives the system’s capacity for parallel execution, forcing strict trade-offs between correctness and performance.

Parallelism and Performance Trade-offs

System throughput scales inversely with lock contention, requiring architectural compromises between maximal parallel execution and system safety.

  • Concurrency Control Limits Parallelism: While locking is mandatory for correctness, it strictly suppresses parallelism. Designing for performance requires weighing lock granularity against the likelihood of simultaneous execution on multiple CPUs.
  • High-Parallelism Structures: Pipes isolate lock domains effectively. Because each pipe retains an independent lock, unrelated processes can read and write to distinct pipes simultaneously across different CPUs. Execution only serializes when a writer and reader interact with the exact same pipe.
  • Bottlenecks in Core Operations:
    • Context Switching: Multiple CPUs execute the context-switch path (yield, sched, and swtch) in parallel because it relies on per-process locks. Execution serializes inside the scheduler thread, however, where cores contend for process locks while scanning the shared process table for RUNNABLE entries.
    • Process Creation: Concurrent fork requests successfully duplicate user memory pages and build page-tables in parallel. Contention bottlenecks emerge when processes queue for pid_lock, kmem.lock, and exclusive access to search the process table for UNUSED slots.
  • Optimization Analysis: Refining a locking architecture relies heavily on empirical measurement. Modifying core designs requires assessing operation frequency, critical section duration, core counts, and whether adjacent constraints negate the benefits of reduced lock contention.
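The per-pipe lock domain described under High-Parallelism Structures can be sketched as a user-space model. Here pthread mutexes and malloc stand in for the kernel's spinlock and page allocator; the non-blocking read and write are deliberate simplifications of the real sleeping versions, and the function names mirror xv6's for orientation only:

```c
#include <pthread.h>
#include <stdlib.h>

#define PIPESIZE 64

// Each pipe carries its own lock, so operations on distinct pipes
// never contend; only a reader and writer of the SAME pipe serialize.
struct pipe {
    pthread_mutex_t lock;
    char data[PIPESIZE];
    unsigned nread, nwrite;   // ring-buffer cursors
    int readopen, writeopen;  // reference counts guarding deallocation
};

struct pipe *pipealloc(void) {
    struct pipe *pi = malloc(sizeof(*pi));
    pthread_mutex_init(&pi->lock, 0);
    pi->nread = pi->nwrite = 0;
    pi->readopen = pi->writeopen = 1;
    return pi;
}

// Non-blocking sketch: copies until the buffer fills, returns bytes written.
int pipewrite(struct pipe *pi, const char *buf, int n) {
    pthread_mutex_lock(&pi->lock);
    int i = 0;
    while (i < n && pi->nwrite - pi->nread < PIPESIZE)
        pi->data[pi->nwrite++ % PIPESIZE] = buf[i++];
    pthread_mutex_unlock(&pi->lock);
    return i;
}

// Non-blocking sketch: copies until the buffer empties, returns bytes read.
int piperead(struct pipe *pi, char *buf, int n) {
    pthread_mutex_lock(&pi->lock);
    int i = 0;
    while (i < n && pi->nread < pi->nwrite)
        buf[i++] = pi->data[pi->nread++ % PIPESIZE];
    pthread_mutex_unlock(&pi->lock);
    return i;
}

// Free only when the last side closes, so no thread is ever left
// waiting on a lock embedded in freed memory.
void pipeclose(struct pipe *pi, int writable) {
    pthread_mutex_lock(&pi->lock);
    if (writable) pi->writeopen = 0; else pi->readopen = 0;
    int dead = (pi->readopen == 0 && pi->writeopen == 0);
    pthread_mutex_unlock(&pi->lock);
    if (dead) {
        pthread_mutex_destroy(&pi->lock);
        free(pi);
    }
}
```

Two pipes allocated this way share no mutable state, so a writer on one and a reader on the other proceed on different CPUs with zero contention; pipeclose also illustrates the reference-count deallocation defense from the first section.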