Bottom Halves and Deferring Work
Interrupt processing relies on a strict division of labor to manage the inherent hardware constraints of operating systems. The top half, implemented as the interrupt handler, executes asynchronously, immediately acknowledges the hardware, and performs time-critical tasks. Because top halves run with local interrupt lines disabled and execute entirely in interrupt context, they block communication with other hardware and cannot sleep. To minimize system latency and return processor control to interrupted code, all non-time-critical processing must be deferred. The bottom half executes this deferred work at a later, more convenient time, when all hardware interrupts are re-enabled.
To manage this deferred execution efficiently across various system architectures, the kernel provides multiple bottom-half mechanisms governed by distinct performance and context constraints.
Evolution of Bottom-Half Mechanisms
The infrastructure for deferring work has evolved significantly to handle symmetric multiprocessing (SMP) and scalability requirements:
- Original BH Interface: The initial bottom-half implementation relied on a statically defined list of routines. Handlers were globally synchronized, meaning no two BH handlers could execute concurrently anywhere in the system, creating a severe performance bottleneck on SMP machines.
- Task Queues: Designed as an aggregate of linked lists of functions, task queues ran at specific points in the kernel. The interface lacked flexibility and failed to provide the lightweight overhead required by high-performance subsystems such as networking.
- Modern Implementations: Kernel 2.5 deprecated both BHs and task queues, replacing them entirely with softirqs, tasklets, and work queues.
The foundation of the modern deferred execution model relies on the highly scalable softirq subsystem.
Softirqs
Softirqs are statically allocated, high-performance bottom halves designed for heavily threaded subsystems.
- Structure and Allocation: Softirqs are represented by `struct softirq_action`, which contains a single `action` function pointer. They are statically allocated at compile time via an enumeration index, with lower numerical indices (e.g., `HI_SOFTIRQ`, `NET_TX_SOFTIRQ`) dictating higher execution priority. The system enforces a hard limit of 32 registered softirqs.
- Execution Rules: Softirqs execute in interrupt context with hardware interrupts enabled. They cannot sleep or block. A softirq never preempts another softirq; softirqs are preempted only by top-half interrupt handlers.
- Concurrency: Multiple instances of the exact same softirq can execute concurrently on different processors. This necessitates aggressive, highly tuned locking protocols or the strict use of per-processor data.
- Triggering: Handlers are registered using `open_softirq()`. A softirq is marked for execution (raised) via `raise_softirq()`, which saves the interrupt state, disables local interrupts, flags the softirq, and restores the interrupt state.
- Processing: Softirq execution is handled by `__do_softirq()`, which retrieves a 32-bit bitmask of pending softirqs via `local_softirq_pending()`. It clears the pending mask and loops through the bits, invoking the `action` handler for every set bit until the mask evaluates to zero.
While softirqs offer maximum scalability through concurrent execution, managing their strict locking requirements led to the development of a simpler derivative mechanism known as tasklets.
Tasklets
Tasklets are dynamically created bottom halves built directly on top of the softirq infrastructure. They trade extreme scalability for a simpler concurrency model.
- Base Mechanism: Tasklets are multiplexed onto two specific softirqs: `HI_SOFTIRQ` (high priority) and `TASKLET_SOFTIRQ` (normal priority).
- Data Structure: Represented by `struct tasklet_struct`, which tracks the `next` pointer in the list, the tasklet's `state`, an atomic `count` of references, the `func` handler, and its `data` argument.
- State Management: The `state` field uses `TASKLET_STATE_SCHED` to indicate the tasklet is pending and `TASKLET_STATE_RUN` to indicate it is actively executing. The `count` field operates as a reference counter; a tasklet is eligible to execute only if `count` equals zero.
- Concurrency Rules: Two identical tasklets strictly cannot execute concurrently on different processors. However, two entirely different tasklets can run simultaneously on distinct processors.
- Scheduling (`tasklet_schedule()`):
  - Verifies `TASKLET_STATE_SCHED` is not already set.
  - Disables local interrupts to protect list manipulation.
  - Appends the tasklet to the head of the per-processor `tasklet_vec` or `tasklet_hi_vec` linked list.
  - Raises the underlying softirq and restores interrupts.
- Execution (`tasklet_action()`):
  - Clears the local processor's tasklet list.
  - Iterates over the pending tasklets, skipping any whose `TASKLET_STATE_RUN` bit is already set (the tasklet is running on another processor).
  - Sets `TASKLET_STATE_RUN`, executes the handler, and clears the run state upon completion.
- Interfaces: Tasklets are declared statically via `DECLARE_TASKLET()` or initialized dynamically via `tasklet_init()`. They can be disabled synchronously or asynchronously via `tasklet_disable()` and `tasklet_disable_nosync()`, respectively, and removed from their pending queue using `tasklet_kill()`.
Because softirqs and tasklets can continuously reactivate themselves during heavy workloads, the kernel employs dedicated threads to prevent these mechanisms from monopolizing processor time.
ksoftirqd
The ksoftirqd subsystem consists of per-processor kernel threads designed to handle high-volume softirq processing.
- Starvation Prevention: During intense periods (e.g., heavy network traffic), softirqs can continuously reactivate themselves. Processing them indefinitely starves user-space applications, while deferring them strictly to the next hardware interrupt severely degrades throughput.
- Execution Model: The kernel defers reactivated softirqs to the `ksoftirqd/n` threads (where n is the processor number), which run at the lowest possible scheduler priority (nice value 19). This ensures user-space tasks preempt the softirq processing, while guaranteeing that softirqs still execute promptly on otherwise idle processors.
- Thread Loop: Each thread awakens when `do_softirq()` detects self-reactivating softirqs. It runs in a tight loop checking `softirq_pending()`, invoking `do_softirq()`, and yielding the processor via `schedule()` whenever `need_resched()` is flagged.
Softirqs, tasklets, and their daemon threads operate strictly in interrupt context; workloads requiring sleep or blocking I/O must shift to process-context mechanisms.
Work Queues
Work queues defer execution to dedicated kernel threads (worker threads), enabling the bottom half to run entirely in process context.
- Context Capabilities: Because they operate in process context, work queues can block, sleep, allocate large amounts of memory, and perform synchronous disk I/O.
- Thread Topologies: By default, the subsystem uses the generic `events/n` worker threads (one per processor). Drivers with intensive, specialized processing demands can instantiate dedicated custom worker threads via `create_workqueue()`.
- Data Structures:
  - `struct workqueue_struct`: Represents all worker threads of a given type across the system.
  - `struct cpu_workqueue_struct`: Represents a single worker thread bound to a specific processor. Contains the worklist and the wait queue.
  - `struct work_struct`: Represents the specific deferred task. It contains the `func` handler pointer and the `data` payload.
- Worker Thread Loop (`worker_thread()`):
  - The thread adds itself to a wait queue and enters the `TASK_INTERRUPTIBLE` state.
  - If the worklist is empty, it invokes `schedule()` and sleeps.
  - If work is queued, it sets itself `TASK_RUNNING`, iterates over the linked list of `work_struct` objects, clears each entry's pending bit, and executes the associated `func`.
- Interfaces: Tasks are defined via `DECLARE_WORK()` or `INIT_WORK()`. They are dispatched using `schedule_work()` or `schedule_delayed_work()`. Subsystems can force synchronous completion of all queued tasks using `flush_scheduled_work()`.
Choosing the correct mechanism and synchronization schema depends entirely on the context constraints and scalability requirements of the deferred work.
Mechanism Selection and Synchronization
Designing a bottom-half architecture requires mapping subsystem constraints to the appropriate deferral interface and implementing robust concurrency locks.
- Selection Guidelines:
- Work Queues: Mandatory if the deferred work must sleep, block, or allocate significant memory.
- Tasklets: The default choice for most hardware device drivers. They eliminate the need for complex intra-handler locking by guaranteeing identical tasklets never execute concurrently.
- Softirqs: Reserved strictly for highly scalable, timing-critical subsystems (like networking) where instances of the same handler must run concurrently across multiple processors, necessitating careful per-processor data management.
- Concurrency and Locking:
- Intra-Tasklet Safety: Tasklets are inherently serialized against themselves, eliminating intra-tasklet race conditions. However, accessing data shared between two different tasklets requires standard spin locks.
- Softirq Locking: Because softirqs execute concurrently without serialization, any shared global data requires rigorous locking.
- Process Context Sharing: When process context code shares data with a bottom half, the process code must obtain a lock and explicitly disable bottom-half processing to prevent preemption and deadlocks.
- Interrupt Context Sharing: When an interrupt handler shares data with a bottom half, the handler must obtain a lock and disable local interrupts entirely.
- Disabling Bottom Halves:
- `local_bh_disable()`: Disables all softirq and tasklet processing on the local processor by incrementing the task's `preempt_count`.
- `local_bh_enable()`: Decrements the `preempt_count`. If the count evaluates to zero, it actively checks for and executes any bottom halves that became pending during the disabled period.
- Work queues are unaffected by these calls, as they operate asynchronously via the standard process scheduler.