Timers and Time Management

Hardware Clocks and the System Timer

The kernel tracks time via two primary hardware devices: the Real-Time Clock (RTC) and the System Timer.

  • Real-Time Clock (RTC): A nonvolatile, battery-backed device that keeps time while the system is powered off. The kernel reads it at boot to initialize the absolute wall time, which is its main role.
  • System Timer: An electronic clock or decrementing counter that issues interrupts at a fixed, programmable frequency.
    • On x86 architectures, the primary system timer is the Programmable Interval Timer (PIT). Other x86 time sources include the local APIC timer and the processor's Time Stamp Counter (TSC).
    • The period between two successive system timer interrupts is defined as a tick.

The system timer’s periodic interrupt defines the fundamental unit of kernel time measurement, the tick, establishing the operational cadence for the rest of the system.

The Tick Rate (HZ)

The frequency of the system timer is determined by a static preprocessor define, HZ, which dictates the number of timer interrupts per second.

  • Tick Period: The duration of a single tick is 1/HZ seconds.
  • Architectural Variations: HZ values vary by architecture and machine type.
    • Historically on x86, it was raised from 100 to 1000 in the 2.5 kernel, and is now configurable.
  • Impact of Higher Values:
    • Advantages: Higher resolution and accuracy for timed events, system calls (e.g., poll() and select()), resource usage statistics, and process preemption. A 1000 Hz tick rate reduces average scheduling latency to 0.5 milliseconds, half of the 1 ms tick period.
    • Disadvantages: Increased processor overhead, power consumption, and cache thrashing due to more frequent timer interrupt executions.
  • Tickless Operation: If the kernel is built with the CONFIG_NO_HZ option, it dynamically schedules the timer interrupt based on pending timers instead of a fixed interval, significantly reducing power consumption during system idle periods.
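
The arithmetic behind the tick period can be sketched in a few lines of user-space C. This is only an illustration: the function names are hypothetical, and the "average latency" figure assumes that an event arrives at a uniformly random point within a tick, so it waits half a period on average.

```c
#include <assert.h>
#include <stdint.h>

/* Length of one tick in microseconds for a tick rate of hz (1/HZ seconds). */
static uint32_t tick_period_us(uint32_t hz)
{
    assert(hz > 0);
    return 1000000u / hz;
}

/*
 * Average delay until the next tick, assuming the event arrives at a
 * uniformly random point inside the current tick (an assumption of
 * this sketch, not a kernel guarantee).
 */
static uint32_t avg_tick_latency_us(uint32_t hz)
{
    return tick_period_us(hz) / 2;
}
```

Under these assumptions, a rate of 100 gives a 10,000 µs tick and a 5,000 µs average wait, while a rate of 1000 gives a 1,000 µs tick and a 500 µs average wait.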

The frequency defined by HZ dictates how rapidly the kernel records the passage of time, stored internally as an ever-increasing count of discrete ticks.

Jiffies and Time Accounting

The global variable jiffies tracks the number of ticks that have occurred since the system booted.

  • Data Structure: jiffies is declared as a volatile unsigned long.
  • Uptime Calculation: System uptime is calculated as jiffies / HZ seconds.
  • Internal Representation and Scalability:
    • A 32-bit jiffies variable overflows in approximately 49.7 days at HZ = 1000.
    • To make overflow a non-issue in practice, the primary time management variable is a 64-bit value named jiffies_64.
    • Using the linker script, the jiffies variable is overlaid onto jiffies_64: on 32-bit architectures it covers the lower 32 bits, and on 64-bit architectures the two are the same value. Time management code accesses the full 64-bit value safely via get_jiffies_64(), which uses a seq lock (xtime_lock) to ensure consistent reads on 32-bit architectures, where a 64-bit load is not atomic.
  • Wraparound Safety: Because the counter wraps back to zero after reaching its maximum value, direct comparisons of jiffies values (for example, jiffies > timeout) are unsafe. The kernel provides four macros in <linux/jiffies.h> for safe comparisons:
    • time_after(unknown, known)
    • time_before(unknown, known)
    • time_after_eq(unknown, known)
    • time_before_eq(unknown, known)
  • User-Space Scaling: To prevent breaking user-space applications when HZ is altered, the kernel exports time values scaled to a fixed USER_HZ constant using jiffies_to_clock_t().
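
The wraparound problem and the scaling step are easier to see with concrete numbers. The following user-space sketch mimics the idea behind the comparison macros using signed subtraction on a 32-bit counter. The sketch_ names and the two rate constants are assumptions made for illustration, not the kernel's definitions, and the scaling helper covers only the simple case where the tick rate is an exact multiple of the user-visible rate.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ      1000u  /* assumed kernel tick rate */
#define SKETCH_USER_HZ 100u   /* assumed user-visible tick rate */

/* Nonzero if tick count a is later than b, even if the counter wrapped. */
static int sketch_time_after(uint32_t a, uint32_t b)
{
    /* The signed difference stays correct while a and b are < 2^31 apart. */
    return (int32_t)(b - a) < 0;
}

/* Nonzero if tick count a is earlier than b. */
static int sketch_time_before(uint32_t a, uint32_t b)
{
    return sketch_time_after(b, a);
}

/* Whole days until a 32-bit tick counter wraps at the given rate. */
static uint32_t sketch_days_until_wrap(uint32_t hz)
{
    assert(hz > 0);
    return (uint32_t)((UINT64_C(1) << 32) / hz / 86400u);
}

/* Scale internal ticks to user-visible ticks (rate must divide evenly). */
static uint64_t sketch_ticks_to_user(uint64_t ticks)
{
    return ticks / (SKETCH_HZ / SKETCH_USER_HZ);
}
```

For example, a timeout computed as 0xFFFFFFF0 + 0x20 wraps to 0x10. A naive test such as now > timeout with now = 0xFFFFFFF8 would report the timeout as already expired, whereas the signed-difference comparison correctly reports that it is still in the future.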

While jiffies tracks relative uptime via discrete ticks, the system must also maintain an absolute record of time for user-space applications.

Wall Time (Time of Day)

The absolute time of day, or wall time, is stored in the xtime variable, defined as a struct timespec.

  • Structure:
    • tv_sec: Seconds elapsed since the epoch (January 1, 1970 UTC).
    • tv_nsec: Nanoseconds elapsed in the current second.
  • Synchronization: Reading and writing xtime requires the xtime_lock, a seq lock. Readers must use a read_seqbegin() and read_seqretry() loop to ensure the data is not modified during the read.
  • User-Space Interface: The wall time is primarily retrieved via the gettimeofday() system call (implemented as sys_gettimeofday()), and set via settimeofday(), which requires CAP_SYS_TIME capabilities.
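
The retry loop that seq lock readers perform can be modeled in user space with C11 atomics. This is a simplified sketch, not the kernel's seqlock: all names are hypothetical, it assumes a single writer (the kernel additionally serializes writers with a spinlock), and it uses sequentially consistent atomics throughout for clarity rather than speed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a wall-time value guarded by a sequence count. */
struct sketch_time {
    atomic_uint seq;      /* even: stable, odd: write in progress */
    atomic_long tv_sec;
    atomic_long tv_nsec;
};

static void sketch_time_init(struct sketch_time *t)
{
    atomic_init(&t->seq, 0u);
    atomic_init(&t->tv_sec, 0L);
    atomic_init(&t->tv_nsec, 0L);
}

/* Writer: bump the count to odd, update both fields, bump it back to even. */
static void sketch_write_time(struct sketch_time *t, long sec, long nsec)
{
    atomic_fetch_add(&t->seq, 1u);
    atomic_store(&t->tv_sec, sec);
    atomic_store(&t->tv_nsec, nsec);
    atomic_fetch_add(&t->seq, 1u);
}

/* Reader: retry until both fields were read under one even, unchanged count. */
static void sketch_read_time(struct sketch_time *t, long *sec, long *nsec)
{
    unsigned start;

    do {
        start = atomic_load(&t->seq);
        *sec = atomic_load(&t->tv_sec);
        *nsec = atomic_load(&t->tv_nsec);
    } while ((start & 1u) || atomic_load(&t->seq) != start);
}
```

Readers never block the writer; they simply repeat the read if the sequence count was odd or changed underneath them, which suits data that is read often and written rarely.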

Both the absolute wall time and the relative jiffies counter are updated by the timer interrupt.

The Timer Interrupt Handler

The timer interrupt handler drives all periodic system time functions and is divided into an architecture-dependent routine and an architecture-independent routine.

  • Architecture-Dependent Routine:
    • Obtains the xtime_lock seq lock.
    • Acknowledges or resets the system timer hardware.
    • Calls the architecture-independent tick_periodic().
  • Architecture-Independent Routine (tick_periodic()):
    • Increments the 64-bit jiffies_64 count and updates the wall time (xtime) via do_timer().
    • Calculates global load averages via calc_global_load().
    • Accounts for user or system CPU time consumed by the current process via update_process_times().
    • Decrements the running process’s timeslice and marks need_resched if required via scheduler_tick().
    • Marks the TIMER_SOFTIRQ softirq to execute any expired dynamic timers via run_local_timers().
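
The per-tick bookkeeping listed above can be modeled as a small user-space simulation. This sketch is an illustration only: the structure, the sketch_tick() function, and the fixed tick rate are hypothetical, and the real handler also takes locks, computes load averages, and raises the timer softirq, none of which is modeled here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ           1000L        /* assumed tick rate */
#define SKETCH_NSEC_PER_SEC 1000000000L
#define SKETCH_TICK_NSEC    (SKETCH_NSEC_PER_SEC / SKETCH_HZ)

/* Hypothetical bundle of the state one timer tick touches. */
struct sketch_clock {
    uint64_t jiffies_64;    /* ticks since boot */
    long tv_sec;            /* wall time, seconds */
    long tv_nsec;           /* wall time, nanoseconds within the second */
    uint64_t user_ticks;    /* ticks charged to user mode */
    uint64_t system_ticks;  /* ticks charged to kernel mode */
    unsigned timeslice;     /* ticks left for the running task */
    int need_resched;       /* set when the timeslice runs out */
};

/* One simulated tick: advance time, charge the tick, age the timeslice. */
static void sketch_tick(struct sketch_clock *c, int in_user_mode)
{
    c->jiffies_64++;

    c->tv_nsec += SKETCH_TICK_NSEC;
    if (c->tv_nsec >= SKETCH_NSEC_PER_SEC) {
        c->tv_nsec -= SKETCH_NSEC_PER_SEC;
        c->tv_sec++;
    }

    if (in_user_mode)
        c->user_ticks++;
    else
        c->system_ticks++;

    if (c->timeslice > 0 && --c->timeslice == 0)
        c->need_resched = 1;
}
```

Note that the whole tick is charged to whichever mode the processor was in when the interrupt arrived, so this style of accounting is only as fine-grained as the tick itself.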

Beyond updating basic time variables and process scheduling statistics, the timer interrupt handler triggers the evaluation of dynamically scheduled future events.

Dynamic Kernel Timers

Dynamic timers delay the execution of a function until a specified point in the future, usually expressed relative to the current jiffies value. They are created dynamically, run once when they expire, and are then deactivated; they are not periodic.

  • Structure (struct timer_list):
    • expires: Absolute timeout value in jiffies.
    • function: Handler function to execute upon expiration.
    • data: Argument passed to the handler function.
  • Lifecycle Management:
    • Initialization: Defined via struct timer_list, initialized with init_timer(), and activated with add_timer().
    • Modification: The expiration of an active or inactive timer is altered using mod_timer(), which also activates the timer if it is inactive.
    • Deletion: Timers are deactivated before expiration using del_timer() or del_timer_sync().
      • del_timer_sync() prevents race conditions on SMP machines by waiting for any currently executing timer handlers on other processors to exit. It cannot be called from interrupt context.
  • Execution Mechanism:
    • Timers are evaluated in bottom-half context via the TIMER_SOFTIRQ softirq, specifically within the run_timer_softirq() function.
    • To optimize traversal, the kernel partitions timers into five groups based on their expiration values, avoiding the overhead of sorting or searching a single global list.
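
The timer lifecycle (add, modify, delete, fire once) can be illustrated with a toy user-space model. Everything here is hypothetical and named with a sketch_ prefix: it is not the kernel's struct timer_list or its API, and it scans a plain array on every tick instead of using the kernel's grouping by expiration. The return values follow the convention that a nonzero result means the timer was pending beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy one-shot timer; field names mirror the ones described above. */
struct sketch_timer {
    uint32_t expires;                 /* absolute expiry, in ticks */
    void (*function)(unsigned long);  /* handler to run on expiry */
    unsigned long data;               /* argument passed to the handler */
    int pending;                      /* nonzero while the timer is active */
};

static unsigned long sketch_fired_sum;  /* updated by the demo handler */

/* Demo handler: accumulate the data argument of each timer that fires. */
static void sketch_record(unsigned long data)
{
    sketch_fired_sum += data;
}

static void sketch_add_timer(struct sketch_timer *t, uint32_t expires,
                             void (*fn)(unsigned long), unsigned long data)
{
    t->expires = expires;
    t->function = fn;
    t->data = data;
    t->pending = 1;
}

/* Change the expiry and (re)activate; returns whether it was pending. */
static int sketch_mod_timer(struct sketch_timer *t, uint32_t expires)
{
    int was_pending = t->pending;

    t->expires = expires;
    t->pending = 1;
    return was_pending;
}

/* Deactivate; returns whether the timer was pending. */
static int sketch_del_timer(struct sketch_timer *t)
{
    int was_pending = t->pending;

    t->pending = 0;
    return was_pending;
}

/* Run every pending timer whose expiry is at or before now (wrap-safe). */
static unsigned sketch_run_timers(struct sketch_timer *timers, size_t n,
                                  uint32_t now)
{
    unsigned fired = 0;

    for (size_t i = 0; i < n; i++) {
        struct sketch_timer *t = &timers[i];

        if (t->pending && (int32_t)(now - t->expires) >= 0) {
            t->pending = 0;  /* one-shot: deactivate before calling */
            t->function(t->data);
            fired++;
        }
    }
    return fired;
}
```

The linear scan keeps the sketch short but costs time proportional to the number of timers on every tick, which is exactly the overhead that grouping timers by expiration is meant to avoid.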

Dynamic timers provide event deferral down to the granularity of a single tick, but hardware interactions often require sub-tick precision without invoking the softirq infrastructure.

Delaying Execution

When code requires delays independent of the TIMER_SOFTIRQ infrastructure or requires sub-tick precision, specific delay mechanisms are utilized.

  • Busy Looping:
    • Spins the processor in a while loop until jiffies reaches a target timeout.
    • Highly inefficient as it hogs the processor.
    • Can be improved in process context by calling cond_resched() within the loop, which reschedules only if need_resched is set and so lets more important tasks run while the loop waits.
  • Small, Precise Delays:
    • Used for hardware synchronization requiring short, precise delays, often shorter than a single tick.
    • Provided via udelay(), ndelay(), and mdelay().
    • Implemented via busy loops calibrated during boot using the BogoMIPS (loops_per_jiffy) value to execute a precise number of empty iterations.
    • udelay() should not exceed 1 millisecond to prevent integer overflow on fast machines; mdelay() is used for longer busy waits.
  • schedule_timeout():
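    • The caller first sets the task state to a sleeping state (e.g., TASK_INTERRUPTIBLE), then calls schedule_timeout(), which yields the processor until the specified number of jiffies has elapsed. A task sleeping in TASK_INTERRUPTIBLE can be awakened earlier by a signal.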
    • Internally creates a dynamic timer (struct timer_list) that awakens the sleeping process upon expiration.
    • Requires process context and cannot be called while holding a spinlock.
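
The relationship between the calibrated loops_per_jiffy value and a requested delay can be sketched as plain arithmetic. This is a simplified model, not the kernel's per-architecture implementation: the function names are hypothetical, the loop count is computed in 64-bit arithmetic to sidestep overflow, and longer delays are modeled as a series of 1000-microsecond chunks so that each individual request stays small.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Busy-loop iterations needed for a delay of usecs microseconds, given
 * the calibrated loops_per_jiffy value and the tick rate hz.
 * loops per second = loops_per_jiffy * hz.
 */
static uint64_t sketch_udelay_loops(uint32_t usecs, uint32_t loops_per_jiffy,
                                    uint32_t hz)
{
    assert(usecs <= 1000);  /* keep each request at or below 1 ms */
    return (uint64_t)usecs * loops_per_jiffy * hz / 1000000u;
}

/* Longer delays, modeled as repeated 1000-microsecond chunks. */
static uint64_t sketch_mdelay_loops(uint32_t msecs, uint32_t loops_per_jiffy,
                                    uint32_t hz)
{
    uint64_t total = 0;

    while (msecs--)
        total += sketch_udelay_loops(1000, loops_per_jiffy, hz);
    return total;
}
```

With an assumed calibration of 4,000,000 loops per jiffy at a tick rate of 1000, one microsecond corresponds to 4,000 iterations and five milliseconds to 20,000,000 iterations.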