Process Management

The Process Abstraction

  • Process Definition: A process is a program in the midst of execution. It comprises the object code plus its associated resources: open files, pending signals, internal kernel data, processor state, a memory address space, and one or more threads of execution.
  • Threads of Execution: The objects of activity within a process. Each thread has its own program counter, process stack, and set of processor registers.
  • Virtualizations:
    • Virtualized Processor: Grants the process the illusion of monopolizing the system CPU.
    • Virtual Memory: Permits the process to allocate and manage memory as if it alone owned all system memory.
  • Lifecycle Primitives:
    • Creation: Achieved via the fork() system call, duplicating an existing parent process into a new child process.
    • Execution: Handled by the exec() family of function calls, which create a new address space and load a new program into it.
    • Termination: Executed via the exit() system call, terminating the process, freeing its resources, and placing it into a zombie state until the parent queries its status using wait4().
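
The three lifecycle primitives can be seen together from user space. The sketch below is a minimal illustration, not kernel code: spawn_and_reap() is a hypothetical helper name, /bin/sh is assumed to exist, and the POSIX waitpid() wrapper stands in for wait4().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` through /bin/sh and return its exit status,
 * or -1 if the child could not be created or did not exit normally. */
int spawn_and_reap(const char *cmd)
{
    assert(cmd != NULL);

    pid_t pid = fork();              /* creation: duplicate the caller */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the copied image with a new program. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                  /* reached only if exec failed */
    }

    /* Parent: the exited child remains a zombie until it is waited on. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_reap("exit 7") from a small main() should return 7, the status the parent collects once the child has terminated.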

To manage these abstractions and track lifecycle events, the system requires a comprehensive data structure.

Process Descriptor and Task Structure

  • Task List: A circular doubly linked list that stores all processes in the system.
  • task_struct: The process descriptor forming each element of the task list, defined in <linux/sched.h>.
    • Contains all kernel-required information about a process, including open files, address space, pending signals, and process state.
    • Consumes approximately 1.7 kilobytes on a 32-bit architecture.
  • Allocation: Dynamically created via the slab allocator to facilitate object reuse and cache coloring.
    • A secondary structure, thread_info, is allocated at the bottom or top of the kernel stack (depending on stack growth direction).
    • thread_info contains a direct pointer (task) to the actual task_struct.
  • Process Identification (PID): A numerical value of opaque type pid_t identifying tasks.
    • Default maximum is 32,768 (the limit of a short int) for backward compatibility; it can be raised via /proc/sys/kernel/pid_max.
  • The current Macro: Quickly looks up the process descriptor of the actively executing task.
    • On register-impaired architectures (x86), calculated by masking the least-significant bits of the stack pointer to locate thread_info, then dereferencing its task pointer.
    • On register-rich architectures (PowerPC), stored directly in a dedicated processor register.
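
The stack-masking lookup can be imitated in user space. The sketch below is only a simulation under stated assumptions: an 8 KB stack size, a downward-growing stack with thread_info at its lowest address, and stand-in struct definitions that carry a single field each. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192u            /* assumed 8 KB kernel stack */

struct task { int pid; };            /* stand-in for task_struct */
struct thread_info { struct task *task; };

/* Mask off the low bits of a stack address to find the base of the block. */
struct thread_info *thread_info_from_sp(uintptr_t sp)
{
    return (struct thread_info *)(sp & ~(uintptr_t)(THREAD_SIZE - 1));
}

/* Returns the pid recovered through the masked pointer, or -1 on failure. */
int demo_current_lookup(int pid)
{
    /* The block must be aligned to its own size for the mask to work. */
    unsigned char *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
    if (stack == NULL)
        return -1;

    struct task t = { .pid = pid };
    struct thread_info *ti = (struct thread_info *)stack;  /* lowest address */
    ti->task = &t;

    /* Pretend the stack pointer sits somewhere in the middle of the block. */
    uintptr_t sp = (uintptr_t)(stack + THREAD_SIZE / 2);
    struct thread_info *found = thread_info_from_sp(sp);
    assert(found == ti);

    int result = found->task->pid;   /* the simulated "current" lookup */
    free(stack);
    return result;
}
```

The trick depends entirely on the stack block being aligned to its size, which is why the simulation allocates it with aligned_alloc().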

A critical component of this descriptor is the current condition of the process, represented by its state.

Process State and Context

  • State Flags: The state field of the process descriptor dictates the current condition via exactly one of five flags:
    • TASK_RUNNING: The process is runnable; actively executing or waiting on a runqueue.
    • TASK_INTERRUPTIBLE: The process is sleeping (blocked) awaiting a specific condition, waking prematurely if it receives a signal.
    • TASK_UNINTERRUPTIBLE: Identical to TASK_INTERRUPTIBLE except that the process does not wake on receiving a signal; used when the wait must not be interrupted or when the event is expected to occur quickly.
    • __TASK_TRACED: The process is being traced by another process, such as a debugger.
    • __TASK_STOPPED: Execution has stopped entirely and the task is ineligible to run, typically following SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU.
  • State Manipulation: Performed via set_task_state(task, state), which sets the task's state and, on SMP systems, also issues a memory barrier to enforce ordering with other processors.
  • Process Context:
    • Entered when a program executes a system call or triggers an exception, moving execution from user-space to kernel-space.
    • The kernel acts “on behalf of the process,” making the current macro valid.
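
The difference between the two sleeping states can be modeled as a tiny state machine. This is a toy sketch: the DEMO_ names and both functions are hypothetical stand-ins, not the kernel's definitions or wake-up paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the five states; not the kernel's definitions. */
enum demo_state {
    DEMO_RUNNING,
    DEMO_INTERRUPTIBLE,
    DEMO_UNINTERRUPTIBLE,
    DEMO_TRACED,
    DEMO_STOPPED
};

struct demo_task { enum demo_state state; };

/* A signal wakes only a task that is sleeping interruptibly. */
bool demo_deliver_signal(struct demo_task *t)
{
    assert(t != NULL);
    if (t->state != DEMO_INTERRUPTIBLE)
        return false;
    t->state = DEMO_RUNNING;
    return true;
}

/* The awaited condition wakes either kind of sleeper. */
bool demo_condition_met(struct demo_task *t)
{
    assert(t != NULL);
    if (t->state != DEMO_INTERRUPTIBLE && t->state != DEMO_UNINTERRUPTIBLE)
        return false;
    t->state = DEMO_RUNNING;
    return true;
}
```

In this model a task in DEMO_UNINTERRUPTIBLE is unaffected by demo_deliver_signal() and becomes runnable only through demo_condition_met().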

These executing contexts do not exist in isolation, but instead form a strict hierarchy rooted at a single initial process.

The Process Family Tree

  • Hierarchy: All processes are descendants of the init process (PID 1), which is spawned by the kernel during the final boot steps.
  • Relationships: Stored directly in task_struct:
    • parent: A pointer to the parent’s task_struct.
    • children: A list of child processes.
    • sibling: A list element connecting direct children of the same parent.
  • Iteration:
    • The init_task process descriptor is statically allocated and acts as an anchor for hierarchy traversal.
    • next_task(task) and prev_task(task) macros navigate the circular task list.
    • for_each_process(task) macro iterates completely over all processes in the system.
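
Iteration over a circular list anchored at a fixed element can be sketched in a few lines. The code below is a user-space illustration with a simplified struct task and hypothetical helper names; it mirrors the loop shape of walking from the anchor until the walk returns to it, and is not the kernel's list implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a process descriptor on a circular list. */
struct task {
    int pid;
    struct task *next, *prev;
};

/* An empty circular list is an anchor that points at itself. */
void task_list_init(struct task *anchor)
{
    anchor->next = anchor->prev = anchor;
}

/* Link `t` in just before the anchor, i.e. at the tail of the list. */
void task_list_add_tail(struct task *anchor, struct task *t)
{
    t->prev = anchor->prev;
    t->next = anchor;
    anchor->prev->next = t;
    anchor->prev = t;
}

/* Visit every task after the anchor, stopping when the walk wraps around. */
#define for_each_task(p, anchor) \
    for ((p) = (anchor); ((p) = (p)->next) != (anchor); )

/* Counts the tasks on the list, not including the anchor itself. */
int count_tasks(struct task *anchor)
{
    assert(anchor != NULL);
    int n = 0;
    struct task *p;
    for_each_task(p, anchor)
        n++;
    return n;
}
```

Because the list is circular, no NULL terminator is needed: the walk ends when it arrives back at the anchor.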

Growing this family tree relies on specialized creation mechanisms designed to minimize overhead.

Process Creation and Copy-On-Write

  • Separation of Operations: Process creation is split into two steps: fork() copies the current task, and exec() loads a new executable into the resulting process.
  • Copy-on-Write (COW): A technique delaying or preventing the copying of data upon fork().
    • Parent and child share a single, read-only copy of the address space.
    • A page is duplicated only when it is written to, at which point the writer receives its own copy.
    • Reduces fork() overhead to only duplicating the parent’s page tables and generating a unique process descriptor.
  • Forking Implementation: Managed via the clone() system call, which invokes do_fork() and ultimately copy_process().
    • Steps in copy_process():
      1. dup_task_struct() establishes a new kernel stack, thread_info, and task_struct identical to the parent.
      2. Verifies the new child will not breach user resource limits.
      3. Clears non-inherited statistical parameters.
      4. Forces state to TASK_UNINTERRUPTIBLE to prevent premature execution.
      5. copy_flags() strips PF_SUPERPRIV and sets PF_FORKNOEXEC.
      6. alloc_pid() assigns a unique PID.
      7. Duplicates or shares open files, filesystem data, signal handlers, and address spaces based on provided clone() flags.
      8. Returns a pointer to the new child.
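    • The kernel deliberately runs the child first. In the common case where the child immediately calls exec(), this avoids the COW copies that would occur if the parent ran first and began writing to the shared address space.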
  • vfork(): An optimization where parent page tables are not copied. The child executes within the parent’s address space while the parent blocks until the child calls exec() or exits.
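
Copy-on-write itself is invisible to programs, but the guarantee it preserves is easy to observe: a write made after fork() affects only the writer's copy. The sketch below shows that guarantee from user space; child_write_is_private() is a hypothetical name, and the code does not measure when the kernel actually copies a page.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child's write stayed private to the child,
 * 0 if the parent saw it, and -1 on error. */
int child_write_is_private(void)
{
    volatile int value = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        value = 2;       /* write after fork(): lands in the child's own copy */
        _exit(value);    /* report what the child sees via its exit status */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    int child_saw = WEXITSTATUS(status);
    assert(child_saw == 2);

    return value == 1;   /* the parent's copy is unchanged */
}
```

The child observes 2 while the parent still observes 1, which is the behavior that lazy page duplication must maintain.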

The clone() infrastructure natively supports creating processes that aggressively share resources, bridging the conceptual gap to threads.

Threads of Execution

  • Linux Thread Architecture: The kernel recognizes no distinct thread concept; threads are standard processes sharing resources like address spaces and open files.
    • Each thread receives a standard task_struct.
  • Creation: Threads are created the same way as normal tasks, except that clone() is passed flags specifying which resources are shared:
    • CLONE_VM: Parent and child share address space.
    • CLONE_FS: Parent and child share filesystem information.
    • CLONE_FILES: Parent and child share open files.
    • CLONE_SIGHAND: Parent and child share the table of signal handlers (each task keeps its own signal mask).
  • Kernel Threads: Standard processes existing solely in kernel-space to handle background operations.
    • Possess no address space (the mm pointer is NULL).
    • Schedulable and preemptable, just like normal tasks.
    • Can be created only by another kernel thread; the kernel arranges this by forking all new kernel threads from the kthreadd kernel process.
    • Spawning interface: kthread_create() instantiates the thread in an unrunnable state, requiring wake_up_process().
    • kthread_run() acts as a macro combining creation and waking.
    • Runs until it calls do_exit() itself or until another part of the kernel calls kthread_stop() on it.
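
The effect of the sharing flags can be observed from user space through the glibc clone() wrapper. The sketch below assumes Linux with glibc and a downward-growing stack; the helper names and the 64 KB stack size are illustrative choices. SIGCHLD is added to the flags only so that the parent can reap the child with waitpid().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size for the sketch */

/* Runs in the child; with CLONE_VM it writes to the parent's memory. */
static int bump_counter(void *arg)
{
    assert(arg != NULL);
    int *counter = arg;
    (*counter)++;
    return 0;                          /* becomes the child's exit status */
}

/* Returns the counter value the parent sees after the child ran, or -1. */
int run_thread_like_child(int start)
{
    int counter = start;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;

    /* Pass the highest address, assuming the stack grows downward. */
    pid_t pid = clone(bump_counter, stack + CHILD_STACK_SIZE, flags, &counter);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    free(stack);
    if (reaped != pid)
        return -1;

    return counter;                    /* shared address space: child's write is visible */
}
```

Because the address space is shared, the parent sees the child's increment. Without CLONE_VM the child would modify only its own copy and the parent's value would stay unchanged.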

Whether a user thread, kernel thread, or standard process, all tasks eventually reach the end of their lifecycle and must be dismantled.

Process Termination

  • Destruction Triggers: Occurs intentionally via the exit() system call, or involuntarily upon receiving an unhandled signal or exception.
  • do_exit() Execution:
    1. Sets the PF_EXITING flag in the flags member of the task_struct.
    2. Removes any kernel timers via del_timer_sync().
    3. If BSD process accounting is enabled, writes out accounting information via acct_update_integrals().
    4. Calls exit_mm() to release the address space (destroyed if not shared).
    5. Dequeues IPC semaphores via exit_sem().
    6. Decrements usage counts of file descriptors (exit_files()) and filesystem data (exit_fs()).
    7. Records the exit code in exit_code for the parent’s optional retrieval.
    8. Invokes exit_notify(), which signals the parent, reparents any children, and sets the task's exit_state to EXIT_ZOMBIE.
    9. Calls schedule() to permanently switch to a new process.
  • Removing the Process Descriptor: Deallocation is distinct from cleanup, allowing the parent to query child status via the wait4() system call.
    • release_task() Execution:
      1. __exit_signal() triggers detach_pid() to remove the process from the pidhash and task list.
      2. Finalizes statistics and releases remaining resources.
      3. put_task_struct() frees the pages holding the kernel stack and thread_info, and returns the task_struct to its slab cache.
  • The Dilemma of the Parentless Task: If a parent exits prematurely, its children must be reparented to prevent permanent zombie states.
    • do_exit() invokes forget_original_parent() and find_new_reaper().
    • The kernel first attempts to reparent the children to another thread in the exiting task's thread group.
    • If no thread group process is available, children are reparented to the init process.
    • The init process routinely executes wait() on its children, systematically cleaning up zombies.