Timers and Time Management
Hardware Clocks and the System Timer
The kernel tracks time via two primary hardware devices: the Real-Time Clock (RTC) and the System Timer.
- Real-Time Clock (RTC): A nonvolatile, battery-backed device that keeps time while the system is powered off. The kernel reads it during boot to initialize the absolute wall time.
- System Timer: An electronic clock or decrementing counter that issues interrupts at a fixed, programmable frequency.
- On x86 architectures, the primary system timer is the Programmable Interval Timer (PIT); the local APIC timer and the Time Stamp Counter (TSC) serve as additional time sources.
- The period between two successive system timer interrupts is defined as a tick.
The system timer’s periodic interrupt defines the fundamental unit of kernel time measurement, the tick, establishing the operational cadence for the rest of the system.
The Tick Rate (HZ)
The frequency of the system timer is determined by a static preprocessor define, HZ, which dictates the number of timer interrupts per second.
- Tick Period: The duration of a single tick is calculated as $1/\text{HZ}$ seconds.
- Architectural Variations: HZ values vary by architecture and machine type.
- Historically on x86, HZ was 100; it was raised to 1000 in the 2.5 kernel and is now configurable.
- Impact of Higher Values:
  - Advantages: Higher resolution and accuracy for timed events, system calls that take a timeout (e.g., poll() and select()), resource usage statistics, and process preemption. At HZ = 1000 the tick period is 1 ms, so a tick-driven event such as a preemption is late by half a tick on average, about 0.5 ms, compared with about 5 ms at HZ = 100.
  - Disadvantages: Increased processor overhead, power consumption, and cache thrashing, because the timer interrupt handler runs more often.
- Tickless Operation: If the kernel is built with the CONFIG_NO_HZ option, it schedules the timer interrupt dynamically based on pending timers instead of at a fixed interval, significantly reducing power consumption while the system is idle.
The frequency defined by HZ dictates how rapidly the kernel records the passage of time, stored internally as an ever-increasing count of discrete ticks.
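The arithmetic behind the tick period can be sketched in a few lines of user-space C. The helper names below are hypothetical (the kernel exports nothing like them), and the "average error" figure assumes an event is requested at a uniformly random point within a tick, so it waits half a tick on average.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration only; not kernel APIs. */

/* Length of one tick in microseconds for a given tick rate (HZ). */
static unsigned long demo_tick_period_us(unsigned long hz)
{
    return 1000000UL / hz;
}

/* Average lateness of a tick-driven event: half a tick, in microseconds.
 * Assumes requests arrive uniformly within a tick. */
static unsigned long demo_avg_error_us(unsigned long hz)
{
    return demo_tick_period_us(hz) / 2;
}

/* Print the figures for a few commonly discussed tick rates. */
static void demo_print_tick_table(void)
{
    static const unsigned long rates[] = { 100, 250, 1000 };
    size_t i;

    for (i = 0; i < sizeof rates / sizeof rates[0]; i++)
        printf("HZ=%lu tick=%lu us avg_error=%lu us\n",
               rates[i],
               demo_tick_period_us(rates[i]),
               demo_avg_error_us(rates[i]));
}
```

Calling demo_print_tick_table() from a main() lists the tick period and average error for each rate, which makes the resolution versus overhead trade-off easy to compare.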
Jiffies and Time Accounting
The global variable jiffies tracks the number of ticks that have occurred since the system booted.
- Data Structure: jiffies is declared as a volatile unsigned long.
- Uptime Calculation: System uptime is calculated as $\text{jiffies}/\text{HZ}$ seconds.
- Internal Representation and Scalability:
  - A 32-bit jiffies variable overflows in approximately 49.7 days at HZ = 1000.
  - To put overflow out of practical reach, the primary time management variable is a 64-bit value named jiffies_64.
  - Using linker configuration, the 32-bit jiffies variable is overlaid onto the lower 32 bits of jiffies_64. Time management code accesses the full 64-bit value safely via get_jiffies_64(), which uses a seq lock (xtime_lock) so that both halves are read consistently on 32-bit architectures.
- Wraparound Safety: Because jiffies wraps around to zero after reaching its maximum value, direct comparisons such as timeout > jiffies give the wrong answer across a wrap. The kernel provides four macros in <linux/jiffies.h> for safe comparisons, where unknown is typically jiffies and known is the value to compare against:
  - time_after(unknown, known)
  - time_before(unknown, known)
  - time_after_eq(unknown, known)
  - time_before_eq(unknown, known)
- User-Space Scaling: To avoid breaking user-space applications when HZ is altered, the kernel exports time values scaled to a fixed USER_HZ constant using jiffies_to_clock_t().
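The jiffies arithmetic above can be approximated in user space. This is a minimal sketch, not the kernel's code: every demo_ name is hypothetical, the tick counter is modeled as a fixed 32-bit value, and DEMO_HZ = 1000 and DEMO_USER_HZ = 100 are assumed values chosen so that one divides the other exactly. The comparison mirrors the idea behind time_after(): subtract with unsigned wraparound, then read the sign of the result.

```c
#include <stdint.h>

#define DEMO_HZ      1000u  /* assumed tick rate */
#define DEMO_USER_HZ 100u   /* assumed rate exported to user space */

/* True if tick value a is later than b, even across a wrap.
 * The unsigned subtraction wraps modulo 2^32; the cast to a signed
 * type (two's complement on mainstream compilers) exposes the sign. */
static int demo_time_after(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(b - a) < 0;
}

/* True if tick value a is earlier than b. */
static int demo_time_before(uint32_t a, uint32_t b)
{
    return demo_time_after(b, a);
}

/* Days until a 32-bit tick counter wraps at the given rate. */
static double demo_wrap_days(uint32_t hz)
{
    return 4294967296.0 / hz / 86400.0;
}

/* Uptime in whole seconds: ticks since boot divided by the tick rate. */
static uint64_t demo_uptime_seconds(uint64_t jiffies64, uint32_t hz)
{
    return jiffies64 / hz;
}

/* Scale ticks to USER_HZ units.
 * Assumes DEMO_HZ is an exact multiple of DEMO_USER_HZ. */
static uint64_t demo_jiffies_to_clock_t(uint64_t ticks)
{
    return ticks / (DEMO_HZ / DEMO_USER_HZ);
}
```

For example, with the counter at 0xFFFFFFF8 and a timeout 24 ticks ahead at 0x10 (already wrapped), a naive `now > timeout` test reports the timeout as expired, while demo_time_after(now, timeout) correctly reports that it has not yet passed.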
While jiffies tracks relative uptime via discrete ticks, the system must also maintain an absolute record of time for user-space applications.
Wall Time (Time of Day)
The absolute time of day, or wall time, is stored in the xtime variable, defined as a struct timespec.
- Structure:
  - tv_sec: Seconds elapsed since the epoch (January 1, 1970 UTC).
  - tv_nsec: Nanoseconds elapsed in the current second.
- Synchronization: Reading and writing xtime requires the xtime_lock, a seq lock. Readers use a read_seqbegin() and read_seqretry() loop and repeat the read if a writer modified the data in the meantime.
- User-Space Interface: The wall time is primarily retrieved via the gettimeofday() system call (implemented as sys_gettimeofday()) and set via settimeofday(), which requires the CAP_SYS_TIME capability.
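The reader retry loop can be illustrated with a small user-space seq lock built on C11 atomics. This is only a sketch of the pattern: the demo_ types and functions are hypothetical stand-ins rather than the kernel's seqlock_t, read_seqbegin(), or xtime, it assumes a single writer (the kernel's write side also takes a spinlock), and it uses sequentially consistent atomics throughout for simplicity rather than speed.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-ins, not kernel types. */
struct demo_seqlock  { atomic_uint sequence; };
struct demo_timespec { atomic_long tv_sec; atomic_long tv_nsec; };

static struct demo_seqlock  demo_xtime_lock; /* starts at 0 (even) */
static struct demo_timespec demo_xtime;

/* Begin a read: wait until no write is in progress (even count). */
static unsigned demo_read_seqbegin(struct demo_seqlock *sl)
{
    unsigned seq;

    while ((seq = atomic_load(&sl->sequence)) & 1u)
        ; /* odd count: a writer is mid-update, so spin */
    return seq;
}

/* End a read: nonzero means a writer intervened and we must retry. */
static int demo_read_seqretry(struct demo_seqlock *sl, unsigned start)
{
    return atomic_load(&sl->sequence) != start;
}

/* Single-writer update: count goes odd, data changes, count goes even. */
static void demo_write_xtime(long sec, long nsec)
{
    atomic_fetch_add(&demo_xtime_lock.sequence, 1u);
    atomic_store(&demo_xtime.tv_sec, sec);
    atomic_store(&demo_xtime.tv_nsec, nsec);
    atomic_fetch_add(&demo_xtime_lock.sequence, 1u);
}

/* Reader: copy both fields, repeating until the copy is consistent. */
static void demo_read_xtime(long *sec, long *nsec)
{
    unsigned seq;

    do {
        seq = demo_read_seqbegin(&demo_xtime_lock);
        *sec = atomic_load(&demo_xtime.tv_sec);
        *nsec = atomic_load(&demo_xtime.tv_nsec);
    } while (demo_read_seqretry(&demo_xtime_lock, seq));
}
```

Readers never block the writer; they only pay for a retry when an update overlaps their read, which suits data that is read far more often than it is written.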
The absolute wall time and relative jiffies counter are strictly maintained by a dedicated periodic interrupt mechanism.
The Timer Interrupt Handler
The timer interrupt handler drives all periodic system time functions and is divided into an architecture-dependent routine and an architecture-independent routine.
- Architecture-Dependent Routine:
  - Obtains the xtime_lock seq lock.
  - Acknowledges or resets the system timer hardware.
  - Calls the architecture-independent tick_periodic().
- Architecture-Independent Routine (tick_periodic()):
  - Increments the 64-bit jiffies_64 count and updates the wall time (xtime) via do_timer().
  - Calculates global load averages via calc_global_load().
  - Accounts for user or system CPU time consumed by the current process via update_process_times().
  - Decrements the running process's timeslice and marks need_resched if required via scheduler_tick().
  - Marks the TIMER_SOFTIRQ softirq to execute any expired dynamic timers via run_local_timers().
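The per-tick bookkeeping listed above can be modeled as a toy user-space simulation. This is a sketch under loud assumptions, not kernel code: all demo_ names are hypothetical, DEMO_HZ = 1000 is assumed, load-average calculation is omitted, and the timeslice and softirq handling are reduced to a counter and a flag.

```c
#include <stdint.h>

#define DEMO_HZ        1000L                   /* assumed tick rate */
#define DEMO_TICK_NSEC (1000000000L / DEMO_HZ) /* nanoseconds per tick */

/* Hypothetical stand-in for the currently running process. */
struct demo_task {
    unsigned timeslice;   /* ticks left before preemption */
    int need_resched;     /* set when the timeslice runs out */
    uint64_t utime_ticks; /* ticks charged to user mode */
    uint64_t stime_ticks; /* ticks charged to kernel mode */
};

static uint64_t demo_jiffies_64;
static struct { int64_t tv_sec; long tv_nsec; } demo_xtime;
static int demo_timer_softirq_pending;

/* Stand-in for do_timer(): advance the tick count and the wall time. */
static void demo_do_timer(void)
{
    demo_jiffies_64++;
    demo_xtime.tv_nsec += DEMO_TICK_NSEC;
    if (demo_xtime.tv_nsec >= 1000000000L) {
        demo_xtime.tv_nsec -= 1000000000L;
        demo_xtime.tv_sec++;
    }
}

/* Stand-in for the per-process steps: charge the tick, shrink the
 * timeslice, and flag the timer softirq so expired timers get run. */
static void demo_update_process_times(struct demo_task *cur, int user_mode)
{
    if (user_mode)
        cur->utime_ticks++;
    else
        cur->stime_ticks++;

    if (cur->timeslice > 0 && --cur->timeslice == 0)
        cur->need_resched = 1;

    demo_timer_softirq_pending = 1;
}

/* One simulated timer interrupt. */
static void demo_tick_periodic(struct demo_task *cur, int user_mode)
{
    demo_do_timer();
    demo_update_process_times(cur, user_mode);
}
```

Driving demo_tick_periodic() DEMO_HZ times advances the simulated wall time by exactly one second, which is the relationship between HZ, jiffies, and xtime that the handler maintains.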
Beyond updating basic time variables and process scheduling statistics, the timer interrupt handler triggers the evaluation of dynamically scheduled future events.
Dynamic Kernel Timers
Dynamic timers are used to delay the execution of a function until a relative point in the future. They are dynamically created, execute once upon expiration, and are destroyed.
- Structure (struct timer_list):
  - expires: Absolute timeout value in jiffies.
  - function: Handler function to execute upon expiration.
  - data: Argument passed to the handler function.
- Lifecycle Management:
  - Initialization: Declared as a struct timer_list, initialized with init_timer(), and activated with add_timer().
  - Modification: The expiration of an active or inactive timer is altered using mod_timer(), which also activates the timer if it is inactive.
  - Deletion: Timers are deactivated before expiration using del_timer() or del_timer_sync(). del_timer_sync() prevents race conditions on SMP machines by waiting for any currently executing timer handlers on other processors to exit. It cannot be called from interrupt context.
- Execution Mechanism:
  - Timers are evaluated in bottom-half context via the TIMER_SOFTIRQ softirq, specifically within the run_timer_softirq() function.
  - To optimize traversal, the kernel partitions timers into five groups of linked lists based on their expiration values, which avoids the cost of sorting or searching a single global list.
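The timer lifecycle can be mimicked in user space with a deliberately naive model. This sketch is not the kernel implementation: the demo_ names are hypothetical, it keeps one unsorted list instead of the kernel's partitioned groups, it has no locking, and the "current time" is simply passed in by the caller. The return values follow the convention that a nonzero result means the timer was active.

```c
#include <stddef.h>

/* Hypothetical one-shot timer, modeled on the fields described above. */
struct demo_timer {
    unsigned long expires;           /* absolute expiry, in ticks */
    void (*function)(unsigned long); /* handler run on expiry */
    unsigned long data;              /* argument for the handler */
    struct demo_timer *next;
    int pending;                     /* nonzero while on the list */
};

static struct demo_timer *demo_timer_head;
static unsigned long demo_fired_sum; /* updated by the sample handler */

/* Sample handler: record that it ran by accumulating its argument. */
static void demo_handler(unsigned long data)
{
    demo_fired_sum += data;
}

/* Activate a timer that is not currently pending. */
static void demo_add_timer(struct demo_timer *t)
{
    t->pending = 1;
    t->next = demo_timer_head;
    demo_timer_head = t;
}

/* Deactivate a timer; returns 1 if it was active, 0 otherwise. */
static int demo_del_timer(struct demo_timer *t)
{
    struct demo_timer **pp;

    if (!t->pending)
        return 0;
    for (pp = &demo_timer_head; *pp; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            break;
        }
    }
    t->pending = 0;
    t->next = NULL;
    return 1;
}

/* Change the expiry and (re)activate; returns 1 if it was active. */
static int demo_mod_timer(struct demo_timer *t, unsigned long expires)
{
    int was_active = demo_del_timer(t);

    t->expires = expires;
    demo_add_timer(t);
    return was_active;
}

/* Run and remove every timer whose expiry is at or before `now`. */
static void demo_run_timers(unsigned long now)
{
    struct demo_timer **pp = &demo_timer_head;

    while (*pp) {
        struct demo_timer *t = *pp;

        if ((long)(now - t->expires) >= 0) { /* wrap-safe comparison */
            *pp = t->next;
            t->pending = 0;
            t->next = NULL;
            t->function(t->data);
            pp = &demo_timer_head; /* handler may have changed the list */
        } else {
            pp = &t->next;
        }
    }
}
```

A timer added with expires = 5 does nothing when demo_run_timers(4) is called, fires once at demo_run_timers(5), and stays inactive afterwards unless it is re-armed with demo_mod_timer().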
Dynamic timers provide event deferral down to the granularity of a single tick, but hardware interactions often require sub-tick precision without invoking the softirq infrastructure.
Delaying Execution
When code requires delays independent of the TIMER_SOFTIRQ infrastructure or requires sub-tick precision, specific delay mechanisms are utilized.
- Busy Looping:
  - Spins the processor in a while loop until jiffies reaches a target timeout.
  - Highly inefficient, as it hogs the processor.
  - Can be improved in process context by calling cond_resched() within the loop to allow higher-priority tasks to run.
- Small, Precise Delays:
  - Used for hardware synchronization requiring sub-millisecond precision.
  - Provided via udelay(), ndelay(), and mdelay().
  - Implemented via busy loops calibrated during boot using the BogoMIPS (loops_per_jiffy) value to execute a precise number of empty iterations.
  - udelay() should not be used for delays longer than 1 millisecond, to prevent integer overflow on fast machines; mdelay() is used for longer busy waits.
- schedule_timeout():
  - Places the task in a sleeping state (e.g., TASK_INTERRUPTIBLE) and yields the processor until at least the specified number of jiffies has elapsed.
  - Internally creates a dynamic timer (struct timer_list) that awakens the sleeping process upon expiration.
  - Requires process context and cannot be called while holding a spinlock.
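The link between loops_per_jiffy and a microsecond delay, and the reason large arguments risk overflow, can be shown with plain arithmetic. This is an illustrative sketch only: the demo_ functions are hypothetical, the real udelay() uses architecture-specific fixed-point code rather than this formula, and the loops_per_jiffy value in the usage note is an invented example figure.

```c
#include <stdint.h>

/* Busy-loop iterations needed for `usecs` microseconds, given the
 * calibrated loops per tick (lpj) and the tick rate (hz).
 * loops per second = lpj * hz, so divide by 10^6 for microseconds.
 * The product is computed in 64 bits to avoid overflow. */
static uint64_t demo_udelay_loops(uint32_t usecs, uint32_t lpj, uint32_t hz)
{
    return (uint64_t)usecs * lpj * hz / 1000000u;
}

/* The same formula with 32-bit intermediates, to show the hazard:
 * on a fast machine the product wraps and the delay comes out short. */
static uint32_t demo_udelay_loops_32(uint32_t usecs, uint32_t lpj, uint32_t hz)
{
    uint32_t product = (uint32_t)(usecs * lpj);

    product = (uint32_t)(product * hz);
    return product / 1000000u;
}
```

With lpj = 4000000 and hz = 1000 (four billion loops per second), a 10-microsecond delay needs 40000 iterations. The 64-bit version returns exactly that, while the 32-bit version wraps and returns a much smaller count, which is the kind of failure the 1-millisecond guideline for udelay() is meant to avoid.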