System Calls
System Call Architecture and Purpose
System calls provide a critical abstraction layer between physical hardware and user-space processes. They serve four core architectural purposes:
- Hardware Abstraction: Isolates user-space applications from physical hardware details, medium types, and specific filesystem implementations.
- Security and Stability: Acts as an access arbitrator to prevent applications from incorrectly manipulating hardware or stealing resources.
- System Virtualization: Facilitates multitasking and virtual memory by ensuring the kernel has absolute awareness of resource requests.
- Exclusive Access Entry: Forms the only legal entry point into the kernel outside of exceptions and traps.
Applications rarely invoke system calls directly; they interface with user-space APIs, such as the POSIX standard implemented by the C library. The kernel provides the operational mechanism without dictating the usage policy.
To provide these abstract hardware interfaces, the kernel implements specific C definitions and strict numeric identifiers for each system call.
System Call Definitions and Numbers
System calls are accessed via function calls defined in the C library, producing zero or more side effects and returning a long value indicating success or failure.
- Return Values: Typically, a negative return value maps to an error code (translated to errno by the C library), whereas zero indicates success.
- Macro Definition: System calls are defined using the SYSCALL_DEFINEx macro, where x represents the number of input parameters.
- asmlinkage Modifier: A required directive instructing the compiler to look for all function arguments exclusively on the stack.
- Naming Convention: A system call bar() is implemented inside the kernel as sys_bar().
- System Call Numbers:
  - Each system call is mapped to a unique, immutable syscall number.
  - Syscall numbers cannot be reassigned or recycled if a system call is removed.
  - Removed or unimplemented system calls are replaced with sys_ni_syscall(), which does nothing except return -ENOSYS, so the vacated number fails cleanly instead of reaching an unrelated handler.
- System Call Table: The sys_call_table stores the mapping of syscall numbers to their respective handlers (e.g., in arch/x86/kernel/syscall_64.c for x86-64).
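The return-value convention above can be observed from user space. The following is a minimal sketch, assuming Linux with glibc, where syscall() invokes a call by number and applies the same errno translation as the regular wrappers. The helper names and the constant UNASSIGNED_SYSCALL_NR are hypothetical, and the value 100000 is only assumed to be unassigned on the running kernel.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical: a number assumed to have no handler on this kernel. */
#define UNASSIGNED_SYSCALL_NR 100000L

/* Invoke close(2) by number on an invalid descriptor.
 * The kernel returns a negative error code; the C library turns that
 * into a -1 return value and stores the positive code in errno. */
int errno_from_bad_close(void)
{
    errno = 0;
    long ret = syscall(SYS_close, -1);
    return (ret == -1) ? errno : 0;
}

/* Invoke a number that is assumed to be unassigned and report errno. */
int errno_from_unassigned_number(void)
{
    errno = 0;
    long ret = syscall(UNASSIGNED_SYSCALL_NR);
    return (ret == -1) ? errno : 0;
}
```

errno_from_bad_close() should report EBADF. On a typical system errno_from_unassigned_number() reports ENOSYS, although a sandbox that filters system calls may substitute a different error.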
Once uniquely numbered and defined, these functions require a dedicated hardware mechanism to transition execution from user-space to kernel-space.
System Call Handler and Invocation
User-space processes cannot execute kernel code directly and must trigger an exception to switch the processor into kernel mode.
- Software Interrupts: Invocation occurs via a software interrupt, mapping to an exception handler known as the system call handler.
- x86 Implementation: Historically executed via interrupt vector 128 (int $0x80), and more recently via the optimized sysenter instruction.
- The system_call() Handler: The architecture-specific entry point (e.g., entry_64.S) that catches the trap.
  - Verifies the requested syscall number against NR_syscalls.
  - Returns -ENOSYS if the requested number is greater than or equal to NR_syscalls.
  - Executes the target function by multiplying the syscall number by the size of a table entry (the architecture’s word size: 8 bytes on x86-64, 4 bytes on x86-32) and indexing into sys_call_table.
- Parameter Passing (x86-32):
  - The syscall number is loaded into the eax register before the trap.
  - The first five arguments are passed sequentially via ebx, ecx, edx, esi, and edi.
  - For six or more arguments, a single register provides a pointer to a user-space memory block containing the parameters.
  - The kernel returns the execution result to user-space via the eax register.
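The bounds check and table lookup performed by the handler can be modeled in ordinary C. This is a user-space sketch only, not kernel code: every name here (syscall_fn, model_call_table, model_dispatch, and the model_* handlers) is hypothetical, and the handlers take no arguments to keep the model small.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type for this model: no arguments, long result. */
typedef long (*syscall_fn)(void);

/* Placeholder handler returning a fixed value for illustration. */
static long model_getpid(void) { return 1234; }

/* Stand-in for sys_ni_syscall(): a retired slot that only fails. */
static long model_ni_syscall(void) { return -ENOSYS; }

/* The index in this array plays the role of the syscall number. */
static const syscall_fn model_call_table[] = {
    model_getpid,     /* 0 */
    model_ni_syscall, /* 1: removed call, number never reused */
};

#define MODEL_NR_SYSCALLS \
    (sizeof(model_call_table) / sizeof(model_call_table[0]))

/* Reject out-of-range numbers, then index the table. Array indexing
 * scales nr by sizeof(syscall_fn), which is the multiply-by-entry-size
 * step the real handler performs in assembly. */
long model_dispatch(unsigned long nr)
{
    if (nr >= MODEL_NR_SYSCALLS)
        return -ENOSYS;
    return model_call_table[nr]();
}
```

In this model an out-of-range number and a retired slot produce the same -ENOSYS result, which is why user space cannot tell a removed call from one that never existed.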
Once the handler has caught the trap and collected the arguments from the registers, control passes to the syscall implementation, which must validate its inputs rigorously.
Implementing System Calls
New system calls must adhere to strict design principles and security checks to maintain kernel integrity.
- Design Rules:
  - Implement exactly one specific purpose.
  - Avoid multiplexing multiple behaviors over a single call based on arguments (e.g., the ioctl() anti-pattern).
  - Provide forward compatibility by supporting a flags parameter.
  - Maintain strict architecture and endianness portability without making assumptions regarding word or page sizes.
- Parameter Verification: The kernel must validate all inputs to prevent arbitrary memory access violations.
  - Pointers must target valid regions within the user-space process address space.
  - Pointers must respect mapping permissions (readable, writable, or executable).
- Memory Copying Methods: Kernel code must use specialized functions to move data across the user/kernel boundary.
  - copy_to_user(dst, src, len): Safely writes data from kernel-space to user-space.
  - copy_from_user(dst, src, len): Safely reads data from user-space into kernel-space.
  - Both functions return the number of bytes that could not be copied, so zero means complete success; on a nonzero result the syscall conventionally returns -EFAULT.
  - Both functions may sleep (block) if the memory is paged out and must be swapped back in from disk.
- Capability Checks: Fine-grained authorization is enforced via the capable() function, passing specific flags like CAP_SYS_REBOOT or CAP_SYS_NICE, rather than relying on legacy root checks.
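The effect of pointer verification is visible from user space: a system call handed a buffer the process may not read fails with EFAULT instead of touching the memory. The sketch below assumes Linux and uses a pipe plus a page mapped with PROT_NONE. The function name errno_from_unreadable_buffer is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask write(2) to copy from a page the process has no permission to
 * read. Returns the resulting errno, 0 if the write unexpectedly
 * succeeds, or -1 if the setup itself fails. */
int errno_from_unreadable_buffer(void)
{
    int fds[2];
    int result = -1;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || pipe(fds) != 0)
        return -1;

    /* One anonymous page with no access rights at all. */
    void *buf = mmap(NULL, (size_t)page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf != MAP_FAILED) {
        errno = 0;
        /* The kernel must read 16 bytes from buf to fill the pipe. */
        ssize_t n = write(fds[1], buf, 16);
        result = (n == -1) ? errno : 0;
        munmap(buf, (size_t)page);
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The call is expected to report EFAULT: the kernel's copy from user space detects that the page is not readable and the system call returns the error to the caller.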
Validation, user-space copies that may sleep, and capability checks are all possible because system calls run in process context on behalf of a specific task.
Execution Context and Registration
System calls execute strictly in process context, altering how the kernel manages concurrency and blocking.
- Context Properties:
  - The current macro points to the task that issued the syscall.
  - The kernel can safely sleep (block) during syscall execution to wait for resources.
  - The execution is fully preemptible, meaning another task may be scheduled during execution.
  - Code must be reentrant to support symmetric multiprocessing (SMP) and kernel preemption.
- Registration Pipeline: Integrating a completed syscall requires three steps:
  - Append the sys_foo pointer to the end of the architecture’s sys_call_table.
  - Define the syscall number macro (e.g., #define __NR_foo) in <asm/unistd.h>.
  - Compile the syscall implementation permanently into the core kernel image (e.g., inside kernel/sys.c).
- Direct User-Space Access: Without standard C library wrappers, user-space applications invoke syscalls using the _syscalln() macro suite.
  - n denotes the argument count (0 through 6).
  - The macro requires 2 + 2n parameters: the return type, the syscall name, and the type/name pair for each argument.
  - The macro expands into inline assembly that configures the registers and executes the trap.
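Current kernel headers generally no longer ship the _syscalln() macros to user space, so the sketch below swaps in a different technique: the C library's syscall() function, which takes the syscall number followed by the arguments. It assumes Linux with glibc. The wrapper names raw_getpid, raw_write, and bytes_sent_through_pipe are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* getpid(2) invoked by number rather than through the libc wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* write(2) invoked by number; arguments are passed in order. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}

/* Write msg into a freshly created pipe with raw_write() and return
 * the byte count reported by the kernel, or -1 if pipe() fails. */
long bytes_sent_through_pipe(const char *msg, size_t len)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;

    long written = raw_write(fds[1], msg, len);

    close(fds[0]);
    close(fds[1]);
    return written;
}
```

A call such as bytes_sent_through_pipe("hello", 5) should return 5, and raw_getpid() should match the value returned by getpid().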
Despite the relative ease of registering a new system call, the rigid constraints of the system call table heavily favor exploring alternative kernel interfaces.
Alternatives to System Calls
Adding new system calls is generally discouraged by kernel developers due to interface rigidity and namespace pollution.
- System Call Disadvantages:
  - Requires an officially assigned syscall number.
  - The interface becomes immutable once it ships in a stable kernel release.
  - Requires separate registration on every supported architecture.
  - Cannot be used easily from shell scripts or accessed through the filesystem.
  - Difficult to maintain in external modules outside the mainline kernel tree.
- Recommended Alternatives:
  - Device Nodes: Implement a device with read(), write(), and ioctl() handlers.
  - Sysfs: Export kernel data or configurations as files in the appropriate /sys tree location.
  - File Descriptors: Represent obscure or complex interfaces (like semaphores) as file descriptors that can be manipulated with ordinary file operations.
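As one illustration of the file-descriptor approach, Linux exposes a semaphore-like counter through eventfd() with the EFD_SEMAPHORE flag, so acquiring and releasing become plain read() and write() calls. The notes above do not name this interface; it is used here as an assumed example, and the sem_fd_* and count_acquires names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create a non-blocking counter that behaves like a semaphore. */
int sem_fd_create(unsigned int initial)
{
    return eventfd(initial, EFD_SEMAPHORE | EFD_NONBLOCK);
}

/* Take one unit. Returns 0 on success, otherwise an errno value
 * (EAGAIN when the counter is already zero). */
int sem_fd_try_acquire(int fd)
{
    uint64_t value;
    ssize_t n = read(fd, &value, sizeof value);

    if (n == (ssize_t)sizeof value)
        return 0;
    return (n == -1) ? errno : EIO;
}

/* Give one unit back. Returns 0 on success, otherwise an errno value. */
int sem_fd_release(int fd)
{
    uint64_t one = 1;
    ssize_t n = write(fd, &one, sizeof one);

    if (n == (ssize_t)sizeof one)
        return 0;
    return (n == -1) ? errno : EIO;
}

/* Count how many units can be taken from a fresh counter. */
int count_acquires(unsigned int initial)
{
    int fd = sem_fd_create(initial);
    int acquired = 0;

    if (fd < 0)
        return -1;
    while (sem_fd_try_acquire(fd) == 0)
        acquired++;
    close(fd);
    return acquired;
}
```

Because the counter is an ordinary descriptor, it can also be passed to poll() or select(), closed with close(), and inherited across fork(), none of which would need a dedicated system call per operation.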