(STILL NEED TO DO LIGHT READING ON THIS)

Kernel Synchronization

Fundamentals of Concurrent Access

Shared resources within the kernel require strict protection from concurrent access to prevent data overwriting and inconsistent states. In modern kernels, code can execute simultaneously on multiple processors due to symmetrical multiprocessing (SMP), or it can be involuntarily suspended and rescheduled due to kernel preemption. Without protection, code running on different processors or preempted threads can simultaneously access and manipulate the exact same shared data.

To understand how concurrent accesses cause damage to shared data, the concepts of critical regions and race conditions must be defined.

Critical Regions and Race Conditions

Critical Region (Critical Section): A code path that accesses and manipulates shared data.
Atomicity: The requirement that operations within a critical region complete without interruption, executing as if the entire region were one indivisible instruction.
Race Condition: A system flaw where two or more threads of execution simultaneously execute within the same critical region, creating a “race” to modify the data.
Synchronization: The programmatic prevention of unsafe concurrency and race conditions.

The Single Variable Increment The simplest critical region is the increment of a single variable, such as $i++$ . At the processor level, this translates to three distinct instructions:

Get the current value of $i$ and copy it into a register.
Add one to the value stored in the register.
Write the new value back to memory.

If two threads execute this operation simultaneously when $i = 7$ , both may read 7, increment to 8, and write back 8. The desired outcome of 9 is lost because the operations interleaved. Hardware architectures solve this specific race condition by providing atomic instructions that read, increment, and write back a single variable in one indivisible step.

While atomic instructions successfully protect single isolated variables, complex data structures spanning multiple operations require a broader synchronization mechanism.

Locking

A lock provides a mechanism to prevent access to a resource while another thread of execution is in the marked critical region.

Advisory Nature: Locks are entirely voluntary programming constructs. Code must explicitly acquire the appropriate lock before accessing shared data and release it when finished.
Mutual Exclusion: A lock can be held by at most one thread at a time.
Wait States: If a thread attempts to acquire an unavailable lock, it must wait—either by busy-looping (spinning) or by sleeping—until the lock is released.
Implementation: Locks are built upon architecture-specific atomic instructions, such as compare and exchange or test and set, ensuring that the act of acquiring the lock is itself free from race conditions.

Understanding when and where to deploy locks requires identifying the specific sources of concurrency that threaten kernel stability.

Causes of Concurrency

Concurrency manifests in two forms: pseudo-concurrency (tasks interleaving on a single processor) and true concurrency (tasks executing simultaneously on multiple processors). The kernel must synchronize against several distinct sources of both:

Interrupts: Can occur asynchronously at almost any time, interrupting the currently executing code.
Softirqs and tasklets: Can be raised or scheduled dynamically, interrupting currently executing code.
Kernel preemption: The scheduler can preempt one task in the kernel to execute another.
Sleeping and user-space synchronization: A task in the kernel can block (sleep), invoking the scheduler and transferring execution to a new process.
Symmetrical Multiprocessing (SMP): Two or more processors can execute kernel code at the exact same time.

Kernel code is classified by its resilience against these concurrency sources:

Interrupt-safe: Safe from concurrent access by an interrupt handler.
SMP-safe: Safe from true concurrency on SMP machines.
Preempt-safe: Safe from pseudo-concurrency caused by kernel preemption.

Knowing the exact causes of concurrency dictates which data actually requires locking and which data is naturally isolated.

Knowing What to Protect

Synchronization efforts must protect data, not code. Any data that can be accessed concurrently almost certainly needs protection.

Local automatic variables: Do not require locking. They exist solely on the stack of the executing thread and cannot be accessed concurrently.
Task-specific data: Data accessed by only a specific task does not require locking because a single process can only execute on one processor at a time.
Global kernel data: Most global data structures require locks. If another thread of execution can see the data, it must be locked.

Configuration Options The Linux kernel tailors its locking mechanisms at compile time based on system capabilities:

CONFIG_SMP: Enables symmetrical multiprocessing support. If unset (uniprocessor machines), the overhead of physical spin locks is compiled away, leaving only markers that disable kernel preemption.
CONFIG_PREEMPT: Enables kernel preemption.
Pessimistic Design: Kernel code should always be written for the most pessimistic scenario—an SMP system with kernel preemption enabled.

Designing a robust locking scheme must account for scenarios where acquiring the locks themselves halts system execution, a fatal condition known as a deadlock.

Deadlocks

A deadlock is a condition involving one or more threads and one or more resources, where each thread waits for a resource held by another, preventing any thread from making progress.

Self-Deadlock: A thread attempts to acquire a lock it already holds. Because Linux does not provide recursive locks, the thread waits indefinitely for itself to release the lock.
The Deadly Embrace (ABBA Deadlock): Thread 1 holds Lock A and waits for Lock B. Simultaneously, Thread 2 holds Lock B and waits for Lock A. Neither will ever release their initial lock.

Deadlock Prevention Rules

Implement lock ordering: Nested locks must always be obtained in the exact same sequence across all functions. This unconditionally prevents the ABBA deadlock.
Prevent starvation: Ensure critical regions always finish.
Do not double-acquire: Never attempt to acquire the exact same lock twice.
Design for simplicity: Complex locking schemes obscure data dependencies and invite deadlocks.

While avoiding deadlocks guarantees functional correctness, the efficiency of the locking scheme is dictated by lock contention and scalability.

Contention and Scalability

Lock contention describes a state where a lock is currently held and another thread is waiting to acquire it. High contention acts as a system bottleneck and occurs when a lock is frequently requested, held for long durations, or both.

Scalability measures how well a system expands to support an increase in processors, processes, or memory.

Lock Granularity: Describes the amount of data a single lock protects.
Coarse-Grained Locking: A single lock protects a massive amount of data (e.g., an entire subsystem). This scales poorly on large SMP machines due to high contention.
Fine-Grained Locking: A lock protects a tiny amount of data (e.g., a single node in a linked list). This scales excellently on large SMP machines but introduces wasteful overhead and complexity on uniprocessor or dual-processor systems where contention is naturally low.

Proper synchronization demands identifying critical regions, avoiding deadlocks, and striking the exact balance between coarse and fine-grained lock scalability to implement the most optimal kernel synchronization primitives.

Kernel Synchronization Methods

The Linux kernel’s synchronization methods enable developers to write efficient and race-free code. The declarations needed to use the atomic integer operations are in <asm/atomic.h>

The declarations needed to use the atomic integer operations are in <asm/atomic.h>

Atomic Operations

Atomic operations execute instructions without interruption, ensuring that the operation completes in its entirety or not at all. They form the foundation for all other synchronization methods.

Atomic Integer Operations:
- Operate on the atomic_t (32-bit) and atomic64_t (64-bit) data types.
- Using specialized types prevents the compiler from erroneously optimizing access and ensures atomic functions are not used on standard C integers.
- Commonly used to implement lightweight counters and perform test-and-set operations, such as atomic_dec_and_test().
- Provide atomicity (uninterrupted execution) but do not inherently provide ordering (the relative sequence of multiple instructions).
Atomic Bitwise Operations:
n addition to atomic integer operations, the kernel also provides a family of functions that operate at the bit level. Not surprisingly, they are architecture-specific and defined in <asm/bitops.h>
- Operate on generic memory addresses via pointers, specifying a target bit number.
- Guarantee that intermediate states are correctly realized, preventing simultaneous set and clear operations from failing.
- Non-atomic variants (prefixed with __) are available for environments already protected by locks.
- Unlike the atomic integer operations, code typically has no choice whether to use the bitwise operations—they are the only portable way to set a specific bit.The only question is whether to use the atomic or nonatomic variants. If your code is inherently safe from race conditions, you can use the nonatomic versions, which might be faster depending on the architecture.

While atomic operations handle simple variables seamlessly, complex data structures spanning multiple operations require broader synchronization mechanisms like spin locks.

Spin Locks

A spin lock is a lightweight locking mechanism that can be held by at most one thread of execution.

If a thread of execution attempts to ac- quire a spin lock while it is already held, which is called contended, the thread busy loops— spins—waiting for the lock to become available;

he architecture- dependent code is defined in <asm/spinlock.h>.The actual usable interfaces are defined in <linux/spinlock.h>.T

Busy Waiting: If a thread attempts to acquire a contended lock, it loops (spins) while repeatedly checking for lock availability.
Duration Constraints: Because spinning wastes processor cycles, spin locks must be held for durations shorter than two context switches.
Non-Recursive: Linux spin locks are not recursive; attempting to acquire a lock already held by the same thread results in a self-deadlock.
Interrupt Context Usage:

spin_lock(&mr_lock); /* critical region ... */ spin_unlock(&mr_lock);

Spin locks are safe for interrupt handlers because they do not sleep.
When used in an interrupt handler, the local processor’s interrupts must be disabled (via spin_lock_irqsave()) to prevent an interrupt from preempting the lock holder and double-acquiring the lock.
Bottom Half Safety: spin_lock_bh() disables all bottom halves while acquiring the lock to protect data shared between process context and bottom halves.

It is important that each lock is clearly associated with what it is locking. More important, you should protect data and not code. Despite the examples in this chapter explaining the importance of protecting the critical sections, it is the actual data inside that needs protec- tion and not the code. Big Fat Rule: Locks that simply wrap code regions are hard to understand and prone to race conditions. Lock data, not code

Spin locks strictly limit access to a single thread, but certain workloads naturally separate into read-only and write-only patterns that can be optimized with specialized variants.

Reader-Writer Spin Locks

Reader-writer spin locks split lock usage into shared and exclusive access patterns.

Concurrent Readers: One or more readers can concurrently hold the reader variant of the lock (read_lock()).
Exclusive Writers: The writer variant (write_lock()) can be held by at most one writer, requiring zero concurrent readers.
No Upgrades: A read lock cannot be upgraded to a write lock; attempting to acquire a write lock while holding a read lock results in a deadlock.
Writer Starvation: The implementation favors readers over writers. A continuous stream of readers will starve pending writers until all readers release the lock.

Spin locks, including their reader-writer variants, waste processor cycles while spinning, making them unsuitable for code that must sleep or locks held over long durations.

Semaphores

Semaphores are sleeping locks that place contending tasks onto a wait queue and yield the processor.

Performance Characteristics: They provide better processor utilization than spin locks during contention but incur significant context switching overhead.
Usage Constraints:
- Ideal for locks held for long periods.
- Cannot be used in interrupt context because they invoke the scheduler.
- Do not disable kernel preemption, meaning code holding a semaphore can be preempted.
Usage Counts:
- Binary Semaphore (Mutex): Initialized with a count of one, enforcing strict mutual exclusion.
- Counting Semaphore: Initialized with a count greater than one, permitting multiple concurrent threads up to the specified limit.
Operations:
- down_interruptible(): Decrements the count to acquire the lock; enters interruptible sleep if contended, allowing the task to wake upon receiving a signal.
- up(): Increments the count to release the lock and wakes a waiting task.

Semaphores support complex counting and sleeping requirements, but environments demanding strict read/write boundaries benefit from targeted reader-writer semaphores.

Reader-Writer Semaphores

Reader-writer semaphores (struct rw_semaphore) provide sleeping locks divided by access patterns.

Access Rules: Multiple readers can hold the lock concurrently, but writers demand absolute exclusivity.
Uninterruptible Sleep: All reader-writer semaphores use uninterruptible sleep mechanisms.
Downgrading: The downgrade_write() function allows an acquired write lock to be atomically converted to a read lock.

While semaphores are highly flexible, their generic nature lacks strict constraint enforcement, leading the kernel to introduce a simpler, regimented sleeping lock.

Mutexes

The kernel struct mutex provides a simpler, more efficient sleeping lock dedicated exclusively to mutual exclusion.

Strict Constraints:
- The usage count is rigidly fixed to one.
- The context that locked the mutex is the only context permitted to unlock it.
- Recursive locks and unlocks are forbidden.
- Tasks cannot exit while holding a mutex.
- Unusable in interrupt handlers or bottom halves, even via mutex_trylock().
Debugging: Enabling CONFIG_DEBUG_MUTEXES allows the kernel to programmatically verify and enforce these constraints.
Selection: Mutexes are strongly preferred over semaphores for all mutual exclusion needs unless a specific constraint prevents their use.

Mutexes provide robust mutual exclusion for shared data, but occasionally tasks merely need to signal to one another that a specific event has completed.

Completion Variables

Completion variables (struct completion) synchronize two tasks when one task must wait for another to complete an action.

Mechanism: One task calls wait_for_completion() to block until the event occurs.
Signaling: The executing task calls complete() to wake all tasks waiting on the completion variable.
Application: Frequently used dynamically within data structures to pause execution until structure initialization finishes.

Modern synchronization relies on precise, localized locks like mutexes and completions, but early multiprocessor support relied on a single, massive lock for the entire kernel.

BKL: The Big Kernel Lock

The Big Kernel Lock (BKL) is a deprecated, global spin lock implemented to ease Linux’s transition to fine-grained symmetrical multiprocessing.

Properties:
- Allows tasks to safely sleep while held; the lock is transparently dropped upon unscheduling and reacquired upon rescheduling.
- Operates as a recursive lock.
- Restricted exclusively to process context.
Deprecation: Due to severe scalability bottlenecks, new code must never use the BKL.

As coarse locks like the BKL were phased out, highly specialized locks emerged for specific, high-contention access patterns, such as the sequential lock.

Sequential Locks

The sequential lock (seqlock_t) provides lightweight data protection utilizing a sequence counter.

Write Operations: Obtaining the write lock increments the sequence counter, making it odd. Releasing the lock increments it again, making it even.
Read Operations: Readers read the sequence number before and after accessing data. If the numbers match and are even, the read was uninterrupted by a write.
Writer Priority: Seq locks strictly favor writers; readers never block pending writers, but writers force readers to repeat their read loops until no writers hold the lock.
Ideal Workloads: Used when data has many readers, few writers, cannot be made atomic, and writers must never be starved (e.g., the jiffies_64 uptime variable).

Explicit locking protects data across multiple processors, but when data is strictly local to a single processor, it only requires protection from local task preemption.

Preemption Disabling

Kernel preemption allows a task to be involuntarily paused so a higher-priority task can run, creating pseudo-concurrency issues on uniprocessor systems.

Preemption Counters: The kernel tracks held locks and disabled regions via a preemption counter; preemption is disabled if the value is greater than zero.
Disabling/Enabling: preempt_disable() and preempt_enable() establish non-preemptive regions without requiring spin locks.
Per-Processor Data: get_cpu() safely retrieves the current processor number while simultaneously disabling kernel preemption to ensure the task is not moved to another processor mid-operation.

Preventing preemption and securing concurrent access ensures logical data safety, but hardware and compilers can still physically reorder instructions, violating expected memory operation sequences.

Ordering and Barriers

Compilers and modern processors optimize pipelines by dispatching and committing memory loads (reads) and stores (writes) out of order.

Memory Barriers: Enforce strict hardware instruction ordering.
- rmb(): Read memory barrier; prevents loads from being reordered across the barrier.
- wmb(): Write memory barrier; prevents stores from being reordered across the barrier.
- mb(): General memory barrier; prevents both loads and stores from reordering across the barrier.
- read_barrier_depends(): Enforces a read barrier strictly for loads that possess data dependencies.
Compiler Barriers: barrier() prevents the compiler from optimizing or rearranging loads and stores across the call, though it does not dictate hardware processor behavior.
SMP Variants: Macros like smp_mb() resolve to standard memory barriers on SMP kernels but degrade to simple compiler barriers on uniprocessor kernels.

My Knowledge Base

Explorer

06 Kernel Synchronization