06 Advanced Process Management

Process Scheduling

The process scheduler divides finite processor time among a system’s processes, selecting which non-blocked (runnable) process to execute next. Multitasking operating systems interleave execution to provide the illusion of concurrency.

Multitasking Variants
- Cooperative: Processes run until they voluntarily suspend themselves (yield).
- Preemptive: The scheduler dictates when a process is suspended (preempted) in favor of another, allocating each a specific execution duration known as a timeslice. Linux utilizes preemptive multitasking.
Process Categorization
- Processor-Bound: Processes that tend to consume their entire timeslice, seeking as much CPU time as possible, typically for heavy computation (the simplest example is an infinite loop). They benefit from larger timeslices, which maximize cache hit rates through temporal locality.
- I/O-Bound: Processes that frequently block waiting for resources like user input or file I/O. They benefit from fast, prioritized scheduling to quickly dispatch subsequent I/O requests and maintain interactive responsiveness.

Traditional Unix scheduling struggles to balance these competing needs because it relies on fixed timeslices and simple priorities. To solve this, the Linux kernel employs a fundamentally different scheduling algorithm built around proportional fairness.

The Completely Fair Scheduler (CFS)

The Completely Fair Scheduler (CFS) replaces fixed timeslices with a dynamic fair queuing algorithm.

Processor Proportions: CFS assigns each runnable process a fraction of the processor's time proportional to its weight, which is derived from its priority (nice value): process $i$ receives $w_i / \sum_j w_j$ of the processor. When all $N$ runnable processes share the same nice value, this reduces to $1/N$.
Target Latency: The fixed scheduling period over which CFS divides these proportions. For example, a 20 millisecond target latency with two equal-priority processes yields 10 milliseconds of execution time for each.
Minimum Granularity: A hard floor on the shortest allowed execution time for any process. This threshold prevents context switching costs from destroying system throughput when hundreds of processes are runnable, selectively overriding fairness to maintain performance.

While CFS optimally determines when a process should stop running based on these proportional targets, applications can also explicitly surrender their execution time.

Yielding the Processor

Linux provides the sched_yield() system call to allow a process to voluntarily suspend its execution and prompt the scheduler to select a new process.

Behavior: The calling process is suspended; if no other runnable processes exist, the yielding process immediately resumes.
Legitimate Uses: Extremely rare in modern preemptive systems, as the kernel is better equipped to optimize global scheduling decisions.
Historical Context: Prior to kernel support for fast user-space mutexes (futexes), yielding was the most efficient way to implement user-space thread locking when waiting for a contended lock.
Modern Alternatives: Programs should use event-driven designs that block on file descriptors (sleeping until data is ready) rather than busy-looping and yielding the processor.

Instead of manually yielding the processor, applications can more effectively influence their scheduling behavior by altering their assigned priority levels.

Process Priorities

In Linux, non-real-time processes are scheduled based on priority values known as “nice” values.

Nice Values: Legal values range from -20 to 19, with a default of 0.
- The scale is numerically inverted: lower values represent higher priority (and thus a larger processor weight in CFS), while higher values represent lower priority.
- Increasing a nice value is considered “nice” to other processes on the system.
Priority System Calls:
- nice(): Increments the calling process’s current nice value by a relative amount.
- setpriority() / getpriority(): Sets or retrieves the absolute priority for a specific process, process group, or user.
Capability Constraints: Only processes holding the CAP_SYS_NICE capability (typically root) can specify a negative increment or otherwise lower a nice value to increase priority. Unprivileged processes can only lower their priority.
I/O Priorities: By default, the I/O scheduler derives a process's I/O priority from its nice value. The CFQ I/O scheduler additionally supports explicit I/O priorities, set and queried with the ioprio_set() and ioprio_get() system calls (glibc provides no wrappers, so they are invoked through syscall()) or with the ionice utility.

Beyond manipulating CPU proportions and I/O handling, maximizing performance requires controlling exactly which physical processor executes the process.

Processor Affinity

Processor affinity dictates the likelihood of a process being consistently scheduled on the same CPU in a multiprocessing system.

Cache Effects: Maintaining affinity minimizes the performance penalties of migrating between processors. A migrated process loses the warm caches (including TLB entries) it built up on the original processor: the new processor must repopulate them, and the now-stale copies in the original processor's cache must be invalidated to prevent data corruption.
Soft Affinity: The scheduler’s natural, default behavior to keep a process on the same processor unless extreme load imbalances require migration.
Hard Affinity: A strict, user-enforced bond between a process and a specific CPU or set of CPUs.
- Managed via sched_getaffinity() and sched_setaffinity(), which utilize a cpu_set_t bitmask to explicitly enable or disable execution on individual processors.

Enforcing strict processor affinity eliminates the unpredictable latency spikes caused by CPU migration, establishing the determinism required for applications operating under strict deadlines.

Real-Time Systems

Real-time systems are defined by their subjection to operational deadlines: mandatory bounds on how long the system may take to respond after a stimulus.

Real-Time Classifications:
- Hard Real-Time: Exceeding an operational deadline constitutes a critical system failure (e.g., anti-lock brakes, weapon systems).
- Soft Real-Time: Exceeding an operational deadline degrades performance or user experience but is not fatal (e.g., video playback).
Performance Metrics:
- Latency: The elapsed time between a stimulus and the corresponding response.
- Jitter: The variance in timing between successive responses. Hard real-time systems demand near-zero jitter, responding after an exact duration rather than merely within a window.

Linux provides soft real-time support via POSIX-standardized scheduling policies that strictly respect fixed priorities.

Linux Scheduling Policies

A process’s scheduling policy (or class) dictates how the scheduler treats it. Normal processes have a static priority of 0, while real-time processes have static priorities ranging from 1 to 99. A runnable real-time process will always preempt a normal process.

SCHED_FIFO (First In, First Out): A real-time policy lacking timeslices. A runnable FIFO process executes continuously until it blocks, voluntarily yields, or is preempted by a higher-priority real-time process.
SCHED_RR (Round-Robin): Identical to SCHED_FIFO, but implements timeslices among processes of the same static priority. When a SCHED_RR process exhausts its timeslice, it is moved to the end of the run list for that priority level. The timeslice interval can be retrieved using sched_rr_get_interval().
SCHED_OTHER: The default, non-real-time policy (priority 0) that utilizes nice values under the CFS algorithm.
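SCHED_BATCH: A non-real-time policy for CPU-intensive, non-interactive work. Like SCHED_OTHER it honors nice values, but the scheduler always treats the process as CPU-bound and mildly disfavors it in wakeup decisions. For very low-priority background work, Linux provides SCHED_IDLE, whose processes are weighted even below a nice value of 19.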
System Calls: Policies and priorities are configured using sched_setscheduler() and sched_setparam(). The valid priority range for a specific policy must be retrieved programmatically using sched_get_priority_min() and sched_get_priority_max().

Achieving Determinism in Real-Time Processes

Real-time processing requires eliminating the unpredictable delays inherent in modern hardware architectures.

Preventing Page Faults (Prefaulting): Demand paging introduces severe, non-deterministic latency spikes when a page fault triggers disk I/O. Real-time applications prevent this by locking their pages into physical RAM with mlock() or mlockall(): locking faults the pages in up front and guarantees the kernel will not page them out to swap.
Preventing Multitasking Interference: Normal scheduler preemptions introduce jitter. A common technique is to reserve a core: because child processes inherit their parent's CPU affinity mask, removing a core from the init process's mask early (before it spawns other processes) keeps ordinary processes off that core. The real-time application then pins itself to the reserved core using sched_setaffinity(), gaining near-exclusive use of that processor (interrupt handlers and per-CPU kernel threads can still run there).

While real-time processes utilize memory locking to aggressively guarantee performance, the kernel ultimately restricts this and other resource consumption to maintain global system stability.

Resource Limits

The Linux kernel enforces strict, hard ceilings on the resources a single process can consume to prevent system exhaustion.

Limit Structures: Each resource has a soft limit and a hard limit, queried and set via getrlimit() and setrlimit().
- Soft Limit: The currently enforced threshold. Any process can freely raise its soft limit up to the value of the hard limit.
- Hard Limit: The absolute ceiling. Unprivileged processes can only lower their hard limit (an irreversible action), while processes with CAP_SYS_RESOURCE can raise it.
- Special Values: A limit of 0 disallows any use of the resource, while RLIM_INFINITY (-1) removes the limit entirely.
Critical Limits:
- RLIMIT_AS & RLIMIT_DATA: Restricts the maximum size of the virtual address space and data segment (heap), respectively.
- RLIMIT_CORE: Defines the maximum size of a generated core dump file; setting to 0 disables core dumps.
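- RLIMIT_CPU: The maximum CPU time (in seconds) a process may consume. When the soft limit is exceeded, the kernel sends SIGXCPU; upon reaching the hard limit, it sends SIGKILL.
- RLIMIT_FSIZE: The maximum size of a file the process may create. Attempting to extend a file beyond this limit triggers a SIGXFSZ signal (which terminates the process by default); if the signal is caught or ignored, the write fails with EFBIG.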
- RLIMIT_MEMLOCK: Caps the amount of memory an unprivileged process can lock into RAM (directly impacting the real-time prefaulting strategies discussed earlier).
- RLIMIT_NOFILE: The absolute maximum number of open file descriptors permitted for the process.
- RLIMIT_NPROC: The maximum number of concurrent processes the user may own.
- RLIMIT_RTPRIO: The highest real-time priority level a non-root process may request.
