Process Management
The Process Abstraction
- Process Definition: A process is a program in the midst of execution. It encompasses not only the object code but also resources such as open files, pending signals, internal kernel data, processor state, a memory address space, and one or more threads of execution.
- Threads of Execution: The objects of activity within a process; each thread has its own program counter, process stack, and set of processor registers.
- Virtualizations:
  - Virtualized Processor: Gives the process the illusion that it alone monopolizes the system’s processor.
  - Virtual Memory: Lets the process allocate and manage memory as if it alone owned all the memory in the system.
- Lifecycle Primitives:
  - Creation: Achieved via the fork() system call, which duplicates an existing parent process to create a new child process.
  - Execution: Handled by the exec() family of function calls, which creates a new address space and loads a new program into it.
  - Termination: Handled by the exit() system call, which terminates the process, frees its resources, and places it into a zombie state until the parent queries its status using wait4().
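The three primitives can be observed together from user space. The following is a minimal sketch using the POSIX wrappers fork(), execv(), and waitpid() (waitpid() serves here as the portable POSIX counterpart of wait4()). The helper name sk_spawn_and_wait() is hypothetical, and the example assumes /bin/sh is present.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, exec a program in it, then reap it.
 * Returns the child's exit status, or -1 on error or abnormal termination. */
int sk_spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();              /* creation: duplicate the caller */
    if (pid < 0)
        return -1;                   /* fork() failed */

    if (pid == 0) {
        execv(path, argv);           /* execution: load a new program */
        _exit(127);                  /* reached only if execv() failed */
    }

    /* After it exits, the child stays a zombie until this call reaps it. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with char *args[] = {"sh", "-c", "exit 3", NULL};, calling sk_spawn_and_wait("/bin/sh", args) should return 3. Between the child’s exit and the waitpid() call, the child exists only as a zombie.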
To manage these abstractions and track lifecycle events, the system requires a comprehensive data structure.
Process Descriptor and Task Structure
- Task List: A circular doubly linked list that stores all processes in the system.
- task_struct: The process descriptor forming each element of the task list, defined in <linux/sched.h>.
  - Contains all kernel-required information about a process, including open files, address space, pending signals, and process state.
  - Consumes approximately 1.7 kilobytes on a 32-bit architecture.
  - Allocation: Dynamically created via the slab allocator to facilitate object reuse and cache coloring.
  - A secondary structure, thread_info, is allocated at the bottom or top of the kernel stack (depending on stack growth direction). thread_info contains a direct pointer (task) to the actual task_struct.
- Process Identification (PID): A numerical value of the opaque type pid_t that identifies each task.
  - The default maximum is 32,768 (the range of a short int) for backward compatibility, but it can optionally be raised (up to 4,194,304 on 64-bit systems) via /proc/sys/kernel/pid_max.
- The current Macro: Quickly looks up the process descriptor of the currently executing task.
  - On register-impaired architectures (x86), it is calculated by masking out the least-significant bits of the stack pointer (the 13 least-significant bits with an 8 KB stack) to locate thread_info, then dereferencing its task pointer.
  - On register-rich architectures (PowerPC), the pointer to the current task_struct is stored directly in a dedicated processor register.
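To make the x86 lookup concrete, the sketch below reproduces the masking arithmetic in user space. It assumes, as for 32-bit x86 with 8 KB kernel stacks, that each stack is aligned to its own size and that thread_info sits at the stack’s lowest address. The sk_ names and SK_THREAD_SIZE are hypothetical, and the address is passed in explicitly instead of being read from the live stack pointer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed stack size: two 4 KB pages, as with 8 KB kernel stacks on 32-bit x86. */
#define SK_THREAD_SIZE 8192u

/* Hypothetical, heavily reduced process descriptor. */
struct sk_task {
    int pid;
};

/* Hypothetical reduced thread_info: only the back-pointer to the descriptor. */
struct sk_thread_info {
    struct sk_task *task;
};

/* Allocate a stack aligned to its own size and place thread_info at its
 * lowest address, where a downward-growing stack ends. */
void *sk_alloc_kernel_stack(struct sk_task *task)
{
    void *stack = aligned_alloc(SK_THREAD_SIZE, SK_THREAD_SIZE);
    if (stack != NULL)
        ((struct sk_thread_info *)stack)->task = task;
    return stack;
}

/* Analogue of current_thread_info(): clear the low 13 bits (8192 = 2^13)
 * of any address inside the stack to reach the stack's base. */
struct sk_thread_info *sk_current_thread_info(uintptr_t sp)
{
    return (struct sk_thread_info *)(sp & ~(uintptr_t)(SK_THREAD_SIZE - 1));
}

/* Analogue of current: dereference thread_info's task pointer. */
struct sk_task *sk_current(uintptr_t sp)
{
    return sk_current_thread_info(sp)->task;
}
```

Any address within the 8 KB region, including one near the top where a fresh stack would begin, masks down to the same thread_info, so the lookup costs only an AND and a load.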
A critical component of this descriptor is the current condition of the process, represented by its state.
Process State and Context
- State Flags: The state field of the process descriptor describes the current condition of the process. Each process is in exactly one of five states:
  - TASK_RUNNING: The process is runnable; it is either currently executing or on a runqueue waiting to run.
  - TASK_INTERRUPTIBLE: The process is sleeping (blocked), waiting for some condition to occur; it also wakes up prematurely and becomes runnable if it receives a signal.
  - TASK_UNINTERRUPTIBLE: Identical to TASK_INTERRUPTIBLE except that it does not wake up when it receives a signal; used when the awaited event is expected to occur quickly or when the process must wait without interruption.
  - __TASK_TRACED: The process is being traced by another process, such as a debugger.
  - __TASK_STOPPED: Execution has stopped; the task is neither running nor eligible to run, typically after receiving SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU.
- State Manipulation: Performed via set_task_state(task, state), which sets the task’s state and, on SMP systems, also provides a memory barrier to force ordering on other processors.
- Process Context:
  - Entered when a program executes a system call or triggers an exception, moving execution from user-space to kernel-space.
  - The kernel is said to be “executing on behalf of the process,” and the current macro is valid.
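The five-state model and the store-then-barrier behavior of set_task_state() can be modeled in user space with C11 atomics. In this sketch, every sk_ name is hypothetical and the numeric state values are illustrative; the full fence after the store stands in for the kernel’s memory barrier, and a signal wakes the task only when it sleeps in the interruptible state.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mirrors of the five state flags (values are illustrative). */
enum sk_state {
    SK_TASK_RUNNING         = 0,
    SK_TASK_INTERRUPTIBLE   = 1,
    SK_TASK_UNINTERRUPTIBLE = 2,
    SK_TASK_STOPPED         = 4,
    SK_TASK_TRACED          = 8,
};

/* Hypothetical reduced descriptor: only the state field. */
struct sk_task {
    _Atomic long state;
};

/* Analogue of set_task_state(): store the new state, then issue a full
 * fence, mirroring the store-plus-barrier performed on SMP systems. */
void sk_set_task_state(struct sk_task *t, long state)
{
    atomic_store_explicit(&t->state, state, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Signal delivery in this model: only a TASK_INTERRUPTIBLE sleeper is
 * moved to TASK_RUNNING; every other state is left untouched.
 * The compare-and-swap avoids overwriting a concurrent state change.
 * Returns true if the task was woken. */
bool sk_signal_wake(struct sk_task *t)
{
    long expected = SK_TASK_INTERRUPTIBLE;
    return atomic_compare_exchange_strong(&t->state, &expected,
                                          (long)SK_TASK_RUNNING);
}
```

A task parked in SK_TASK_UNINTERRUPTIBLE ignores sk_signal_wake(), while one in SK_TASK_INTERRUPTIBLE becomes runnable, matching the distinction between the two sleeping states.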
These executing contexts do not exist in isolation, but instead form a strict hierarchy rooted at a single initial process.
The Process Family Tree
- Hierarchy: All processes are descendants of the init process (PID 1), which is spawned by the kernel during the final boot steps.
- Relationships: Stored directly in task_struct:
  - parent: A pointer to the parent’s task_struct.
  - children: A list of child processes.
  - sibling: A list element connecting direct children of the same parent.
- Iteration:
  - The init_task process descriptor is statically allocated and acts as an anchor for hierarchy traversal.
  - The next_task(task) and prev_task(task) macros navigate the circular task list.
  - The for_each_process(task) macro iterates over every process in the system.
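The traversal macros depend on every descriptor being linked into one circular doubly linked list, with parent pointers layered on top. The user-space sketch below models both. Every sk_ name and the SK_CONTAINER_OF macro are hypothetical (loosely patterned on the kernel’s list_head and container_of()), the root descriptor plays the role of the statically allocated init_task, and the children and sibling lists are omitted for brevity.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct sk_list {
    struct sk_list *next, *prev;
};

/* Hypothetical, heavily reduced process descriptor. */
struct sk_task {
    int pid;
    struct sk_task *parent;     /* NULL only for the root descriptor */
    struct sk_list tasks;       /* links every descriptor into the task list */
};

/* Recover the enclosing descriptor from its embedded list node. */
#define SK_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The root starts as a one-element circular list pointing at itself. */
void sk_task_list_init(struct sk_task *root)
{
    root->parent = NULL;
    root->tasks.next = &root->tasks;
    root->tasks.prev = &root->tasks;
}

/* Record the parent and append the child at the tail of the circular list. */
void sk_task_add(struct sk_task *root, struct sk_task *parent, struct sk_task *child)
{
    struct sk_list *head = &root->tasks;
    child->parent = parent;
    child->tasks.prev = head->prev;
    child->tasks.next = head;
    head->prev->next = &child->tasks;
    head->prev = &child->tasks;
}

/* Analogue of next_task(): follow the link, then recover the descriptor. */
struct sk_task *sk_next_task(struct sk_task *t)
{
    return SK_CONTAINER_OF(t->tasks.next, struct sk_task, tasks);
}

/* Analogue of for_each_process(): visit every descriptor except the root. */
int sk_count_processes(struct sk_task *root)
{
    int n = 0;
    for (struct sk_task *t = sk_next_task(root); t != root; t = sk_next_task(t))
        n++;
    return n;
}

/* Follow parent pointers up to the root, counting generations. */
int sk_generations_to_root(const struct sk_task *t)
{
    int depth = 0;
    while (t->parent != NULL) {
        t = t->parent;
        depth++;
    }
    return depth;
}
```

As with for_each_process(), the loop in sk_count_processes() starts after the anchor and stops when it wraps back around, so the anchor itself is not counted.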
Growing this family tree relies on specialized creation mechanisms designed to minimize overhead.
Process Creation and Copy-On-Write
- Separation of Operations: Process creation is split into two steps: fork() (copying the current task) and exec() (loading a new executable).
- Copy-on-Write (COW): A technique that delays or prevents copying the address space upon fork().
  - Parent and child share a single copy of the address space, marked read-only.
  - Data is duplicated only when one of them writes to it.
  - This reduces the overhead of fork() to duplicating the parent’s page tables and creating a unique process descriptor for the child.
- Forking Implementation: Linux implements fork() via the clone() system call, which invokes do_fork(), which in turn calls copy_process().
  - Steps in copy_process():
    - dup_task_struct() creates a new kernel stack, thread_info, and task_struct for the child, identical to the parent’s.
    - It verifies that the new child will not exceed the user’s resource limits.
    - It clears or initializes the descriptor members that are not inherited, primarily statistical information.
    - It sets the child’s state to TASK_UNINTERRUPTIBLE so that the child does not run yet.
    - copy_flags() clears PF_SUPERPRIV and sets PF_FORKNOEXEC.
    - alloc_pid() assigns a unique PID.
    - Depending on the flags passed to clone(), it duplicates or shares open files, filesystem information, signal handlers, and the address space.
    - It returns a pointer to the new child.
  - The kernel deliberately runs the child first, so that a child that immediately calls exec() avoids the copy-on-write overhead that would arise if the parent ran first and wrote to the shared address space.
- vfork(): An optimization in which the parent’s page tables are not copied. The child executes within the parent’s address space, and the parent blocks until the child calls exec() or exits.
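Real copy-on-write operates on pages, using read-only page-table entries and page faults to detect the first write. The user-space sketch below models only the bookkeeping: a reference-counted buffer that a fork shares and that the first write duplicates. The sk_ names and the 64-byte page size are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define SK_PAGE_SIZE 64   /* hypothetical, tiny "page" for illustration */

/* A reference-counted page that several address spaces may share. */
struct sk_page {
    int refcount;
    char data[SK_PAGE_SIZE];
};

/* In this model an address space maps exactly one page. */
struct sk_mm {
    struct sk_page *page;
};

/* fork()-style duplication: share the page and bump its refcount.
 * Nothing is copied at this point. */
void sk_cow_fork(struct sk_mm *child, const struct sk_mm *parent)
{
    child->page = parent->page;
    child->page->refcount++;
}

/* Write one byte. A shared page is first copied so the writer gets a
 * private version; an unshared page is written in place.
 * Returns 0 on success, -1 on a bad offset or allocation failure. */
int sk_cow_write(struct sk_mm *mm, size_t off, char byte)
{
    if (off >= SK_PAGE_SIZE)
        return -1;
    struct sk_page *p = mm->page;
    if (p->refcount > 1) {
        struct sk_page *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return -1;
        memcpy(copy->data, p->data, sizeof copy->data);
        copy->refcount = 1;
        p->refcount--;          /* this address space stops using the shared page */
        mm->page = copy;
        p = copy;
    }
    p->data[off] = byte;
    return 0;
}
```

If the child never writes (for example, because it calls exec() right away), the copy in sk_cow_write() never happens, which is the saving COW is designed to deliver.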
Because clone() can create processes that share resources with their parent, the same infrastructure provides the basis for threads.
Threads of Execution
- Linux Thread Architecture: The kernel has no distinct concept of a thread; threads are standard processes that share certain resources, such as an address space and open files, with other processes.
  - Each thread receives its own standard task_struct.
- Creation: Threads are created like normal tasks, except that clone() is passed flags specifying which resources to share:
  - CLONE_VM: Parent and child share the address space.
  - CLONE_FS: Parent and child share filesystem information.
  - CLONE_FILES: Parent and child share open files.
  - CLONE_SIGHAND: Parent and child share signal handlers (each still keeps its own signal mask).
- Kernel Threads: Standard processes that exist solely in kernel-space to perform operations in the background.
  - They have no address space (their mm pointer is NULL).
  - They are schedulable and preemptable, the same as normal tasks.
  - They can be created only by other kernel threads; the kernel forks all new kernel threads off of the kthreadd process.
  - Spawning interface:
    - kthread_create() creates the thread in an unrunnable state; it does not run until it is explicitly woken with wake_up_process().
    - kthread_run() is a macro that combines creation and waking.
  - Termination: A kernel thread runs until it calls do_exit() or until another part of the kernel calls kthread_stop() on it.
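The flag-driven sharing used to create threads can be modeled as "share the parent’s object if the flag is set, otherwise give the child its own". The sketch below is a user-space model under that assumption. The SK_CLONE_* values match the kernel’s CLONE_VM, CLONE_FS, CLONE_FILES, and CLONE_SIGHAND bits, but every sk_ name is hypothetical, and the private objects are empty placeholders rather than duplicated contents.

```c
#include <stdlib.h>

/* Bit values matching the kernel's CLONE_VM, CLONE_FS, CLONE_FILES, CLONE_SIGHAND. */
#define SK_CLONE_VM      0x00000100UL
#define SK_CLONE_FS      0x00000200UL
#define SK_CLONE_FILES   0x00000400UL
#define SK_CLONE_SIGHAND 0x00000800UL

/* Placeholder for a shareable, reference-counted kernel object
 * (address space, filesystem info, open-file table, signal handlers). */
struct sk_res {
    int refcount;
};

/* Hypothetical reduced task holding one pointer per shareable resource. */
struct sk_task {
    struct sk_res *mm, *fs, *files, *sighand;
};

/* Share the parent's object when 'flag' is set in 'flags'; otherwise
 * allocate a private placeholder (real code would duplicate the contents). */
static struct sk_res *sk_share_or_copy(struct sk_res *parent_res,
                                       unsigned long flags, unsigned long flag)
{
    if (flags & flag) {
        parent_res->refcount++;
        return parent_res;
    }
    struct sk_res *copy = malloc(sizeof *copy);
    if (copy != NULL)
        copy->refcount = 1;
    return copy;
}

/* Model of the share-or-duplicate step of copy_process().
 * Returns 0 on success, -1 if any allocation failed
 * (unwinding partial work is omitted to keep the sketch short). */
int sk_copy_resources(struct sk_task *child, struct sk_task *parent,
                      unsigned long flags)
{
    child->mm      = sk_share_or_copy(parent->mm, flags, SK_CLONE_VM);
    child->fs      = sk_share_or_copy(parent->fs, flags, SK_CLONE_FS);
    child->files   = sk_share_or_copy(parent->files, flags, SK_CLONE_FILES);
    child->sighand = sk_share_or_copy(parent->sighand, flags, SK_CLONE_SIGHAND);
    return (child->mm && child->fs && child->files && child->sighand) ? 0 : -1;
}
```

Passing all four flags makes the child share every object with its parent, which is how a thread is built; passing none gives the child private objects, as with a plain fork().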
Whether a user thread, kernel thread, or standard process, all tasks eventually reach the end of their lifecycle and must be dismantled.
Process Termination
- Destruction Triggers: Termination occurs voluntarily when a process calls the exit() system call, or involuntarily when it receives a signal or exception that it can neither handle nor ignore.
- do_exit() Execution:
  - Sets the PF_EXITING flag in the task_struct.
  - Removes any kernel timers via del_timer_sync().
  - If BSD process accounting is enabled, writes out accounting information via acct_update_integrals().
  - Calls exit_mm() to release the address space; the address space is destroyed if no other process shares it.
  - Calls exit_sem() to dequeue the process if it is waiting on an IPC semaphore.
  - Decrements the usage counts of objects related to file descriptors (exit_files()) and filesystem data (exit_fs()); any object whose count reaches zero is destroyed.
  - Stores the exit code in the exit_code member of the task_struct for optional retrieval by the parent.
  - Invokes exit_notify(), which sends signals to the parent, reparents any children, and sets the task’s exit state to EXIT_ZOMBIE.
  - Calls schedule() to switch to a new process; because the task is no longer schedulable, do_exit() never returns.
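- Removing the Process Descriptor: Deallocating the process descriptor is a separate step from cleanup, so the system can still obtain information about a child after it terminates; the parent retrieves this via the wait4() system call.
  - release_task() Execution (once the descriptor is no longer needed):
    - __exit_signal() triggers detach_pid() to remove the process from the pidhash and the task list.
    - __exit_signal() also releases any remaining resources and finalizes statistics.
    - put_task_struct() frees the pages containing the kernel stack and thread_info, and frees the task_struct from its slab cache.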
- The Dilemma of the Parentless Task: If a parent exits before its children, the children must be reparented; otherwise, once they terminate, they would remain zombies forever.
  - do_exit() calls exit_notify(), which invokes forget_original_parent(); that in turn calls find_new_reaper().
  - The kernel first tries to reparent the children to another process in the exiting task’s thread group.
  - If no other process in the thread group is available, the children are reparented to the init process.
  - The init process routinely calls wait() on its children, cleaning up any zombies assigned to it.
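The zombie state, and the gap between do_exit() and the final release of the descriptor, can be observed from user space on Linux. This minimal sketch forks a child that exits immediately, uses waitid() with WNOWAIT to wait for the exit without reaping, reads the child’s state letter from /proc/<pid>/stat (where Z denotes a zombie), and then reaps it with waitpid(). It assumes procfs is mounted at /proc; the sk_ helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state ('R', 'S', 'Z', ...)
 * of pid from /proc/<pid>/stat, or '?' if it cannot be read. */
char sk_proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* The line reads "pid (comm) state ..."; comm may itself contain
     * spaces or parentheses, so locate the last ')' in the buffer. */
    char *rparen = strrchr(buf, ')');
    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '?';
    return rparen[2];
}

/* Hypothetical helper: fork a child that exits with 'code', record its
 * state while it is a zombie, then reap it. Returns the exit status
 * collected by waitpid(), or -1 on error. */
int sk_exit_and_reap(int code, char *state_before_reap)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                        /* child terminates at once */

    /* Wait for the exit but leave the child waitable (WNOWAIT):
     * it remains a zombie holding its exit code. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;
    *state_before_reap = sk_proc_state(pid);

    /* Reaping lets the kernel release the child's process descriptor. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, sk_exit_and_reap(7, &state) is expected to return 7 with state set to 'Z'; after the reap, the child’s /proc/<pid> entry disappears.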