Bottom Halves and Deferring Work

Interrupt processing relies on a strict division of labor to manage the inherent constraints of hardware interrupt handling. The top half, implemented as the interrupt handler, executes asynchronously, immediately acknowledges the hardware, and performs only the time-critical work. Because top halves run with local interrupt lines disabled and execute entirely in interrupt context, they block communication with other hardware and cannot sleep. To minimize system latency and return processor control to the interrupted code, all non-time-critical processing must be deferred. The bottom half performs this deferred work at a later, more convenient time, when all hardware interrupts are re-enabled.

To manage this deferred execution efficiently across various system architectures, the kernel provides multiple bottom-half mechanisms governed by distinct performance and context constraints.

Evolution of Bottom-Half Mechanisms

The infrastructure for deferring work has evolved significantly to handle symmetric multiprocessing (SMP) and scalability requirements:

  • Original BH Interface: The initial bottom-half implementation relied on a statically defined list of routines. Handlers were globally synchronized, meaning no two BH handlers could execute concurrently anywhere in the system, creating a severe performance bottleneck on SMP machines.
  • Task Queues: Implemented as a family of queues, each a linked list of functions run at a specific point in the kernel. The interface lacked flexibility and failed to provide the lightweight footprint required by high-performance subsystems like networking.
  • Modern Implementations: Kernel 2.5 deprecated both BH and task queues, replacing them entirely with softirqs, tasklets, and work queues.

The foundation of the modern deferred execution model relies on the highly scalable softirq subsystem.

Softirqs

Softirqs are statically allocated, high-performance bottom halves designed for heavily threaded subsystems.

  • Structure and Allocation: Softirqs are represented by struct softirq_action, which contains a single action function pointer. They are statically allocated at compile time via an enumeration index, with lower numerical indices (e.g., HI_SOFTIRQ, NET_TX_SOFTIRQ) dictating higher execution priority. The system enforces a hard limit of 32 registered softirqs, matching the 32-bit pending bitmask.
  • Execution Rules: Softirqs execute in interrupt context with hardware interrupts enabled. They cannot sleep or block. Softirqs never preempt other softirqs; they are only preempted by top-half interrupt handlers.
  • Concurrency: Multiple instances of the exact same softirq can execute concurrently on different processors. This necessitates aggressive, highly tuned locking protocols or the strict use of per-processor data.
  • Triggering: Handlers are registered using open_softirq(). A softirq is marked for execution (raised) via raise_softirq(), which saves the interrupt state, disables local interrupts, flags the softirq, and restores the interrupt state.
  • Processing: Softirq execution is processed by __do_softirq(), which retrieves a 32-bit bitmask of pending softirqs via local_softirq_pending(). It clears the pending mask and loops through the bits, invoking the action handler for every set bit until the mask evaluates to zero.
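The register/raise/process cycle above can be approximated in user space. This is a minimal sketch, not kernel code: the _sketch names are invented for illustration, and the real raise_softirq() additionally saves, disables, and restores local interrupts around the bitmask update.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS 32

struct softirq_action {
    void (*action)(struct softirq_action *);
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];
static uint32_t pending_mask;           /* stands in for local_softirq_pending() */

static void open_softirq_sketch(int nr, void (*action)(struct softirq_action *))
{
    softirq_vec[nr].action = action;    /* register the handler at its index */
}

static void raise_softirq_sketch(int nr)
{
    pending_mask |= 1u << nr;           /* mark the softirq pending */
}

static void do_softirq_sketch(void)
{
    uint32_t pending = pending_mask;
    struct softirq_action *h = softirq_vec;

    pending_mask = 0;                   /* clear the live mask before running */
    while (pending) {                   /* loop until the local copy hits zero */
        if (pending & 1)
            h->action(h);               /* invoke the handler for each set bit */
        h++;
        pending >>= 1;
    }
}

/* Illustrative handler standing in for, e.g., the NET_TX action. */
static int net_tx_runs;
static void fake_net_tx(struct softirq_action *a) { (void)a; net_tx_runs++; }
```

Lower indices are visited first by the bit walk, which is how the enumeration order translates into execution priority.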

While softirqs offer maximum scalability through concurrent execution, managing their strict locking requirements led to the development of a simpler derivative mechanism known as tasklets.

Tasklets

Tasklets are dynamically created bottom halves built directly on top of the softirq infrastructure. They trade extreme scalability for a simpler concurrency model.

  • Base Mechanism: Tasklets are multiplexed onto two specific softirqs: HI_SOFTIRQ (high priority) and TASKLET_SOFTIRQ (normal priority).
  • Data Structure: Represented by struct tasklet_struct, which tracks the next pointer in the list, the tasklet’s state, an atomic count for references, the func handler, and its data argument.
  • State Management: The state field utilizes TASKLET_STATE_SCHED to indicate the tasklet is pending, and TASKLET_STATE_RUN to indicate it is actively executing. The count field operates as a reference counter; a tasklet is only eligible to execute if count equals zero.
  • Concurrency Rules: Two identical tasklets strictly cannot execute concurrently on different processors. However, two entirely different tasklets can run simultaneously on distinct processors.
  • Scheduling (tasklet_schedule()):
    • Verifies TASKLET_STATE_SCHED is not already set.
    • Disables local interrupts to protect list manipulation.
    • Adds the tasklet to the head of the per-processor tasklet_vec or tasklet_hi_vec linked list.
    • Raises the underlying softirq and restores interrupts.
  • Execution (tasklet_action()):
    • Clears the local processor’s tasklet list.
    • Iterates over the pending tasklets, skipping any whose TASKLET_STATE_RUN bit is already set, meaning that tasklet is currently running on another processor.
    • Sets TASKLET_STATE_RUN, executes the handler, and clears the run state upon completion.
  • Interfaces: Tasklets are declared statically via DECLARE_TASKLET() or dynamically via tasklet_init(). They can be disabled synchronously or asynchronously via tasklet_disable() and tasklet_disable_nosync(), and removed from the pending queues using tasklet_kill().
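The SCHED/RUN state machine described above can be sketched in user space. This single-threaded sketch is illustrative only: the kernel manipulates these flags with atomic test_and_set_bit() operations, which plain bit arithmetic stands in for here, and the _sketch names are invented.

```c
#include <assert.h>

enum { TASKLET_STATE_SCHED = 1, TASKLET_STATE_RUN = 2 };

struct tasklet_sketch {
    unsigned state;
    int count;                  /* zero means enabled and eligible to run */
    void (*func)(unsigned long);
    unsigned long data;
    int queued;                 /* stands in for tasklet_vec membership */
};

static void tasklet_schedule_sketch(struct tasklet_sketch *t)
{
    if (t->state & TASKLET_STATE_SCHED)
        return;                 /* already pending: schedule only once */
    t->state |= TASKLET_STATE_SCHED;
    t->queued++;                /* add to the per-processor list */
}

static void tasklet_action_sketch(struct tasklet_sketch *t)
{
    if (t->count != 0 || (t->state & TASKLET_STATE_RUN))
        return;                 /* disabled, or running on another CPU */
    t->state |= TASKLET_STATE_RUN;
    t->state &= ~TASKLET_STATE_SCHED;
    t->queued--;
    t->func(t->data);           /* execute the handler */
    t->state &= ~TASKLET_STATE_RUN;
}

/* Illustrative handler. */
static int tasklet_runs;
static void bump_sketch(unsigned long d) { (void)d; tasklet_runs++; }
```

The double-schedule guard is why a tasklet that is raised many times before the bottom half runs still executes only once.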

Because softirqs and tasklets can continuously reactivate themselves during heavy workloads, the kernel employs dedicated threads to prevent these mechanisms from monopolizing processor time.

ksoftirqd

The ksoftirqd subsystem consists of per-processor kernel threads designed to handle high-volume softirq processing.

  • Starvation Prevention: During intense periods (e.g., heavy network traffic), softirqs can continuously reactivate themselves. Processing them indefinitely starves user-space applications, while deferring them strictly to the next hardware interrupt severely degrades throughput.
  • Execution Model: The kernel defers reactivated softirqs to the ksoftirqd/n threads (where n is the processor ID), which run at the lowest possible scheduler priority (nice value 19). This ensures user-space tasks preempt the softirq processing, while guaranteeing that softirqs still execute promptly on idle processors.
  • Thread Loop: The thread awakens when do_softirq() detects self-reactivating softirqs. It runs in a tight loop checking softirq_pending(), invoking do_softirq(), and yielding the processor via schedule() whenever need_resched() is flagged.
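That drain loop can be sketched with plain counters standing in for real softirq state and scheduler calls. Everything here is an invented stand-in: a pending count replaces softirq_pending(), a counter replaces schedule(), and the reactivation pattern is an arbitrary example.

```c
#include <assert.h>

static int pending;        /* stands in for softirq_pending(cpu) */
static int reactivations;  /* how many times the handler re-raises itself */
static int handled, yields;

static void do_softirq_sketch(void)
{
    handled++;
    if (reactivations-- > 0)
        pending++;          /* the softirq raises itself again */
    pending--;              /* this instance has been processed */
}

static int need_resched_sketch(void) { return handled % 2 == 0; }
static void schedule_sketch(void)    { yields++; }  /* yield the processor */

static void ksoftirqd_sketch(void)
{
    /* Tight loop: keep draining, but yield whenever rescheduling
     * is requested, so user-space tasks are never starved. */
    while (pending) {
        do_softirq_sketch();
        if (need_resched_sketch())
            schedule_sketch();
    }
}
```

The point of the structure is visible in the counters: the thread never refuses work, but it also never runs through a resched request without giving the processor back.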

Softirqs, tasklets, and their daemon threads operate strictly in interrupt context; workloads requiring sleep or blocking I/O must shift to process-context mechanisms.

Work Queues

Work queues defer execution to dedicated kernel threads (worker threads), enabling the bottom half to run entirely in process context.

  • Context Capabilities: Because they operate in process context, work queues can block, sleep, allocate heavy memory, and perform synchronous disk I/O.
  • Thread Topologies: By default, the subsystem utilizes the generic events/n worker threads (one per processor). Drivers with intensive, specialized processing demands can instantiate dedicated custom worker threads via create_workqueue().
  • Data Structures:
    • struct workqueue_struct: Represents all worker threads of a specific type across the global system.
    • struct cpu_workqueue_struct: Represents a single worker thread bound to a specific physical processor. Contains the worklist and the wait queue.
    • struct work_struct: Represents the specific deferred task. It contains the func handler pointer and the data payload.
  • Worker Thread Loop (worker_thread()):
    • The thread adds itself to a wait queue and enters a TASK_INTERRUPTIBLE state.
    • If the worklist is empty, it invokes schedule() and sleeps.
    • If work is queued, it updates to TASK_RUNNING, iterates over the linked list of work_struct objects, clears their pending bits, and executes each associated func.
  • Interfaces: Tasks are defined via DECLARE_WORK() or INIT_WORK(). They are dispatched using schedule_work() or schedule_delayed_work(). Subsystems can force synchronous completion of all queued tasks using flush_scheduled_work().
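The queue-and-drain behavior above can be sketched in user space. The struct layout mirrors the text rather than kernel headers, the _sketch names are invented, and a real worker thread would sleep on its wait queue rather than return when the list empties.

```c
#include <assert.h>
#include <stddef.h>

struct work_sketch {
    struct work_sketch *next;   /* singly linked worklist */
    int pending;
    void (*func)(void *);
    void *data;
};

static struct work_sketch *worklist;

static void schedule_work_sketch(struct work_sketch *w)
{
    if (w->pending)
        return;                 /* already queued: queue only once */
    w->pending = 1;
    w->next = worklist;         /* push onto the worklist */
    worklist = w;
}

static void run_worklist_sketch(void)
{
    while (worklist) {
        struct work_sketch *w = worklist;
        worklist = w->next;
        w->pending = 0;         /* clear the pending bit before running */
        w->func(w->data);       /* may sleep: we are in process context */
    }
}

/* Illustrative work handler. */
static int work_done;
static void do_work_sketch(void *d) { (void)d; work_done++; }
```

Clearing the pending bit before calling func is what allows a handler to legally requeue its own work item while it is still running.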

Choosing the correct mechanism and synchronization schema depends entirely on the context constraints and scalability requirements of the deferred work.

Mechanism Selection and Synchronization

Designing a bottom-half architecture requires mapping subsystem constraints to the appropriate deferral interface and implementing robust concurrency locks.

  • Selection Guidelines:
    • Work Queues: Mandatory if the deferred work must sleep, block, or allocate significant memory.
    • Tasklets: The default choice for most hardware device drivers. They eliminate the need for complex intra-handler locking by guaranteeing identical tasklets never execute concurrently.
    • Softirqs: Reserved strictly for highly scalable, timing-critical subsystems (like networking) where instances of the same handler must run concurrently across multiple processors, necessitating careful per-processor data management.
  • Concurrency and Locking:
    • Intra-Tasklet Safety: Tasklets are inherently serialized against themselves, eliminating intra-tasklet race conditions. However, accessing data shared between two different tasklets requires standard spin locks.
    • Softirq Locking: Because softirqs execute concurrently without serialization, any shared global data requires rigorous locking.
    • Process Context Sharing: When process context code shares data with a bottom half, the process code must obtain a lock and explicitly disable bottom-half processing to prevent preemption and deadlocks.
    • Interrupt Context Sharing: When an interrupt handler shares data with a bottom half, the handler must obtain a lock and disable local interrupts entirely.
  • Disabling Bottom Halves:
    • local_bh_disable(): Disables all softirq and tasklet processing on the local processor by incrementing the task’s preempt_count.
    • local_bh_enable(): Decrements the preempt_count. If the count evaluates to zero, it actively checks for and executes any bottom halves that became pending during the disabled period. Work queues are unaffected by these calls, as they run in process context under the standard scheduler.
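The nesting rule above can be sketched with plain counters. This is a simplification: in the kernel the increment is a dedicated softirq field within preempt_count (not a bare integer), and the _sketch names are invented for illustration.

```c
#include <assert.h>

static int preempt_count_sketch;    /* stands in for the task's preempt_count */
static int pending_bh;              /* bottom halves raised while disabled */
static int bh_runs;

static void local_bh_disable_sketch(void)
{
    preempt_count_sketch++;         /* calls nest: each disable adds one */
}

static void local_bh_enable_sketch(void)
{
    preempt_count_sketch--;
    if (preempt_count_sketch == 0 && pending_bh) {
        bh_runs += pending_bh;      /* run bottom halves that became pending */
        pending_bh = 0;
    }
}
```

Because the calls nest, only the final enable in a matched pair actually re-arms bottom-half processing, which lets independent code paths disable and enable without coordinating.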