System Calls

### System Call Architecture and Purpose

System calls provide a critical abstraction layer between physical hardware and user-space processes. They serve the following core architectural purposes:

  • Hardware Abstraction: Isolates user-space applications from hardware details such as the type of disk, the storage medium, and the specific filesystem in use.
  • Security and Stability: Acts as an access arbitrator that prevents applications from using hardware incorrectly or stealing other processes' resources.
  • System Virtualization: Makes multitasking and virtual memory possible by ensuring the kernel knows about every request for system resources.
  • Exclusive Access Entry: In Linux, system calls are the only means user-space has of interfacing with the kernel; they are the only legal entry point into the kernel other than exceptions and traps.

Applications are typically programmed against a user-space API rather than directly against system calls; one of the most common such APIs is the POSIX standard, implemented by the C library. The kernel provides the mechanism, not the policy: it defines what each system call does, not how applications use it.
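As a minimal sketch of that split, assuming Linux with glibc, the functions below obtain the process ID twice: once through the POSIX `getpid()` wrapper and once through glibc's generic `syscall()` function, which issues a system call by number (`SYS_getpid`). The function names are illustrative only.

```c
#define _GNU_SOURCE          /* exposes syscall() in glibc's <unistd.h> */
#include <sys/syscall.h>     /* SYS_getpid and the other syscall numbers */
#include <sys/types.h>       /* pid_t */
#include <unistd.h>          /* getpid(), syscall() */

/* Through the API: the portable POSIX wrapper from the C library. */
pid_t pid_via_api(void)
{
    return getpid();
}

/* Around the API: request system call number SYS_getpid directly. */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t) syscall(SYS_getpid);
}
```

Both paths should reach the same kernel handler and return the same value; the wrapper merely hides the syscall number and the register setup.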

### Syscalls

System calls are typically accessed via function calls defined in the C library. They take zero or more arguments, may produce side effects (such as changing a file's contents), and return a value of type `long` that signals success or failure.

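  • Return Values: Usually, though not always, a negative return value denotes an error and zero denotes success; some calls, such as `getpid()`, return meaningful positive values instead. On error, the C library stores the corresponding error code in the global `errno` variable, which library functions such as `perror()` can translate into a human-readable message.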
  • Macro Definition: System calls are defined using the `SYSCALL_DEFINEx` macro, where `x` is the number of input parameters. For example, `getpid()` is defined as:

```c
SYSCALL_DEFINE0(getpid)
{
	return task_tgid_vnr(current); /* the caller's thread-group ID */
}
```

  • The macro expands the definition's header to:

```c
asmlinkage long sys_getpid(void)
```

- **`asmlinkage` Modifier:** A directive, required on every system call, telling the compiler to look only on the stack for the function's arguments.
- **Naming Convention:** A system call `bar()` is implemented inside the kernel as `sys_bar()`.
- **System Call Numbers:**
    - Each system call is mapped to a unique, immutable syscall number.
    - Processes refer to system calls by number, not by name.
    - Once assigned, a syscall number cannot change, and the number of a removed system call is never recycled.
    - The slot of a removed system call is plugged with `sys_ni_syscall()`, the "not implemented" system call, which does nothing except return `-ENOSYS`; no application still using the old number ends up invoking a different call.
- **System Call Table:** The per-architecture `sys_call_table` maps each syscall number to its handler (for x86-64, older kernels define it in `arch/x86/kernel/syscall_64.c`).
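The return-value convention can be observed from user space. This sketch, assuming a Linux/glibc system, closes an invalid descriptor: the kernel handler reports `-EBADF`, and the C library wrapper converts that into a return value of `-1` with `errno` set. The helper name is hypothetical.

```c
#include <errno.h>   /* errno, EBADF */
#include <stdio.h>   /* perror() */
#include <unistd.h>  /* close() */

/* Returns the errno value left by a failing close(), or 0 if it succeeded. */
int errno_after_bad_close(void)
{
    errno = 0;
    if (close(-1) == -1) {
        int err = errno;   /* save it: later library calls may overwrite errno */
        perror("close");   /* glibc prints "close: Bad file descriptor" */
        return err;
    }
    return 0;
}
```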

### System Call Handler

User-space processes cannot execute kernel code directly and must trigger an exception to switch the processor into kernel mode.

- **Software Interrupts:** Invocation occurs via a software interrupt, mapping to an exception handler known as the system call handler.
- **x86 Implementation:** Historically executed via the software interrupt `int $0x80` (interrupt vector 128); newer x86 processors added the faster `sysenter` instruction, and x86-64 uses the `syscall` instruction.
- **The `system_call()` Handler:** The architecture-specific entry point (e.g., `entry_64.S`) that catches the trap.
    - Verifies the requested syscall number against `NR_syscalls`.
    - Returns `-ENOSYS` if the requested number is greater than or equal to `NR_syscalls`.
    - Otherwise calls the target handler through `sys_call_table`, locating its entry by multiplying the syscall number by the size of a table entry (8 bytes on x86-64).

![[Pasted image 20260521135339.png]]
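The dispatch steps of `system_call()` can be modeled in ordinary user-space C. This is a toy sketch, not kernel code: `toy_call_table`, the `toy_sys_*` handlers, and `TOY_NR_SYSCALLS` are hypothetical stand-ins for `sys_call_table`, the `sys_*` handlers, and `NR_syscalls`, and slot 1 plays the role of a removed call plugged by `sys_ni_syscall()`.

```c
#include <errno.h>   /* ENOSYS */

/* Every toy handler shares one signature and returns long, mirroring the
 * uniform entries of sys_call_table. */
typedef long (*syscall_fn)(long a1, long a2);

static long toy_sys_add(long a1, long a2) { return a1 + a2; }
static long toy_sys_neg(long a1, long a2) { (void) a2; return -a1; }

/* Plugs the slot of a removed call: it only reports "not implemented". */
static long toy_sys_ni_syscall(long a1, long a2)
{
    (void) a1;
    (void) a2;
    return -ENOSYS;
}

/* Slot 1 belonged to a "removed" call; its number is never reused. */
static const syscall_fn toy_call_table[] = {
    toy_sys_add,         /* 0 */
    toy_sys_ni_syscall,  /* 1: removed, hole plugged */
    toy_sys_neg,         /* 2 */
};

#define TOY_NR_SYSCALLS (sizeof toy_call_table / sizeof toy_call_table[0])

/* Bounds-check the number, then index the table. The compiler performs the
 * "number times entry size" address arithmetic implicitly. */
long toy_dispatch(unsigned long nr, long a1, long a2)
{
    if (nr >= TOY_NR_SYSCALLS)
        return -ENOSYS;
    return toy_call_table[nr](a1, a2);
}
```

In this model `nr` is unsigned, so a negative number wraps to a huge value and is rejected by the same single `>=` comparison.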

- **Parameter Passing (x86-32):**
    - The syscall number is loaded into the `eax` register before the trap.
    - The first five arguments are passed sequentially via `ebx`, `ecx`, `edx`, `esi`, and `edi`.
    - For six or more arguments, a single register provides a pointer to a user-space memory block containing the parameters.
    - The kernel returns the execution result to user-space via the `eax` register.
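To make the register convention concrete, here is a sketch that issues `write()` by loading registers by hand with GCC/Clang inline assembly. The `__i386__` branch follows the x86-32 convention described above (`int $0x80`, number in `eax`, arguments in `ebx`, `ecx`, `edx`). Because most machines are now 64-bit, it also includes an x86-64 branch, which the text above does not cover: there the `syscall` instruction takes the number in `rax` and the first arguments in `rdi`, `rsi`, and `rdx`, and it clobbers `rcx` and `r11`. Other architectures fall back to glibc's `syscall()`. `raw_write()` is a hypothetical name.

```c
#define _GNU_SOURCE          /* exposes syscall() in glibc's <unistd.h> */
#include <errno.h>
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>

/* Issue write(fd, buf, len) with hand-loaded registers. Returns the raw
 * kernel result: a byte count, or a negative errno value on failure. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
#if defined(__i386__)
    /* x86-32: number in eax, arguments in ebx, ecx, edx; result in eax. */
    __asm__ volatile ("int $0x80"
                      : "=a" (ret)
                      : "a" (SYS_write), "b" (fd), "c" (buf), "d" (len)
                      : "memory");
#elif defined(__x86_64__)
    /* x86-64: number in rax, arguments in rdi, rsi, rdx; result in rax.
     * The syscall instruction overwrites rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" ((long) SYS_write), "D" ((long) fd), "S" (buf), "d" (len)
                      : "rcx", "r11", "memory");
#else
    /* Other architectures: use the C library's generic entry point and
     * convert its -1/errno result back into the raw negative form. */
    ret = syscall(SYS_write, fd, buf, len);
    if (ret == -1)
        ret = -errno;
#endif
    return ret;
}
```

Unlike the C library wrapper, `raw_write()` returns the kernel's raw result, so an error arrives as a negative value such as `-EBADF` rather than as `-1` with `errno` set.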

### System Calls Implementation

New system calls must adhere to strict design principles and security checks to maintain kernel integrity.

- **Design Rules:**
    - Implement exactly one specific purpose.
    - Avoid multiplexing multiple behaviors over a single call based on arguments (e.g., `ioctl()` anti-pattern).
    - Define the arguments, return value, and error codes carefully. The interface should be clean and simple, with the smallest possible number of arguments. Its semantics and behavior must never change, because existing applications will come to rely on them.
    - Plan for extension: ask whether new functionality can be added, and bugs fixed, without breaking backward compatibility or requiring an entirely new function. Many system calls provide a flags argument for forward compatibility. The flags are not used to multiplex different behaviors across a single call (which, as noted, is unacceptable) but to enable new functionality and options without breaking backward compatibility or adding a new system call.
    - Design the system call to be as general as possible. Do not assume its use today will be its use tomorrow: the purpose of the system call remains constant, but its uses may change.
    - Keep it portable and robust, not just today but in the future: make no assumptions about an architecture's word size, page size, or endianness.
- **Parameter Verification:** System calls run in kernel-space, so if users could pass invalid input into the kernel without restraint, the system's security and stability would suffer. The kernel must therefore validate every parameter:
    - File I/O syscalls must check whether the file descriptor is valid.
    - Process-related syscalls must check whether the provided PID is valid.
    - Pointers supplied by the user need the most careful checking, to prevent arbitrary memory accesses:
        - A pointer must target a valid region of the calling process's user-space address space, never kernel memory or another process's memory.
        - A pointer must respect the region's mapping permissions: readable for reads, writable for writes, executable for execution.
- **Memory Copying Methods:** Kernel code must use specialized functions to move data across the user/kernel boundary.
    - `copy_to_user(dst, src, len)`: Safely writes data from kernel-space to user-space.
    - `copy_from_user(dst, src, len)`: Safely reads data from user-space into kernel-space.
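    - Both functions return the number of bytes that could not be copied, so zero means success; on failure, system calls conventionally return `-EFAULT`.
    - Both functions may sleep (block), for example when the page holding the user data has been swapped out to disk and must be read back in.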
- **Capability Checks:** Rather than checking whether the caller is root, system calls enforce fine-grained authorization with `capable()`, passing a specific capability such as `CAP_SYS_REBOOT` or `CAP_SYS_NICE`; `capable()` returns nonzero if the caller holds that capability and zero otherwise.
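The validation and copying rules can be illustrated with a user-space model. In this sketch every name is hypothetical: `struct toy_uspace` stands in for a process address space with read/write permissions, and `toy_copy_from_user()`/`toy_copy_to_user()` mimic the return convention of `copy_from_user()`/`copy_to_user()` (bytes not copied, zero on success). Unlike the real functions, the model rejects a bad range outright instead of copying part of it, and it never sleeps.

```c
#include <errno.h>    /* EFAULT */
#include <stddef.h>   /* size_t */
#include <string.h>   /* memcpy() */

/* Hypothetical model of one process's user address space. */
struct toy_uspace {
    unsigned char *mem;   /* backing bytes of the "user" region */
    size_t size;          /* valid addresses are 0 .. size-1 */
    int readable;         /* mapping permissions */
    int writable;
};

/* Is [uaddr, uaddr + len) inside the region? Written to avoid overflow. */
static int toy_range_ok(const struct toy_uspace *us, size_t uaddr, size_t len)
{
    return uaddr <= us->size && len <= us->size - uaddr;
}

/* Mimics copy_from_user(): returns bytes NOT copied (0 on success). */
static size_t toy_copy_from_user(void *dst, const struct toy_uspace *us,
                                 size_t uaddr, size_t len)
{
    if (!us->readable || !toy_range_ok(us, uaddr, len))
        return len;
    memcpy(dst, us->mem + uaddr, len);
    return 0;
}

/* Mimics copy_to_user(): returns bytes NOT copied (0 on success). */
static size_t toy_copy_to_user(struct toy_uspace *us, size_t uaddr,
                               const void *src, size_t len)
{
    if (!us->writable || !toy_range_ok(us, uaddr, len))
        return len;
    memcpy(us->mem + uaddr, src, len);
    return 0;
}

/* Hypothetical handler: copy one unsigned long from user address 'src'
 * to user address 'dst', validating both through the copy helpers. */
long toy_sys_copy_word(struct toy_uspace *us, size_t src, size_t dst)
{
    unsigned long word;   /* fixed-size kernel-side buffer */

    if (toy_copy_from_user(&word, us, src, sizeof word))
        return -EFAULT;
    if (toy_copy_to_user(us, dst, &word, sizeof word))
        return -EFAULT;
    return 0;
}
```

The handler copies a fixed `sizeof word` bytes rather than a caller-supplied length, so user input can never overflow the kernel-side buffer, and it maps any copy failure to `-EFAULT`.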

### Execution Context

System calls execute strictly in process context, altering how the kernel manages concurrency and blocking.

- **Context Properties:**
    - The `current` macro points to the user-space task that issued the syscall.
    - The kernel can safely sleep (block) during syscall execution to wait for resources.
    - The execution is fully preemptible, meaning another task may be scheduled during execution.
    - Because the newly scheduled task may then execute the same system call, syscall code must be fully reentrant to remain safe under both kernel preemption and symmetric multiprocessing (SMP).
    - When the system call returns, control continues in `system_call()`, which ultimately switches back to user-space and resumes execution of the user process.
- **Registration Pipeline:** Integrating a completed syscall requires three steps:
    1. Append a `sys_foo` entry to the end of the architecture's `sys_call_table`. This must be done for every architecture that supports the system call (for most calls, that is all of them).
    2. Define the syscall number macro (e.g., `#define __NR_foo`) in `<asm/unistd.h>`.
    3. Compile the syscall implementation into the core kernel image (e.g., inside `kernel/sys.c`), not as a loadable module.
- **Direct User-Space Access:** Without standard C library wrappers, user-space applications invoke syscalls using the `_syscalln()` macro suite.
    - $n$ denotes the argument count (0 through 6).
    - The macro requires $2 + 2 \times n$ parameters: the return type, the syscall name, and the type/name pair for each argument.
    - The macro expands into inline assembly that configures the registers and executes the trap.
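Current kernel headers no longer export the `_syscalln()` macros to user space; glibc's `syscall()` function fills the same role. The sketch below imitates their calling convention with hypothetical `MY_SYSCALL0`/`MY_SYSCALL1` macros: each takes 2 + 2n parameters and defines a function `my_<name>()`. As a simplifying assumption, they delegate to `syscall()` instead of emitting inline assembly.

```c
#define _GNU_SOURCE          /* exposes syscall() in glibc's <unistd.h> */
#include <sys/syscall.h>     /* SYS_<name> numbers */
#include <unistd.h>

/* Zero arguments: 2 parameters (return type, call name). */
#define MY_SYSCALL0(type, name)                   \
    static type my_##name(void)                   \
    {                                             \
        return (type) syscall(SYS_##name);        \
    }

/* One argument: 4 parameters (type, name, then one type/name pair). */
#define MY_SYSCALL1(type, name, t1, a1)           \
    static type my_##name(t1 a1)                  \
    {                                             \
        return (type) syscall(SYS_##name, a1);    \
    }

MY_SYSCALL0(long, getpid)
MY_SYSCALL1(long, close, int, fd)
```

`MY_SYSCALL1(long, close, int, fd)` passes four parameters for a one-argument call, matching the $2 + 2 \times n$ rule.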

Despite the relative ease of registering a new system call, the rigid constraints of the system call table heavily favor exploring alternative kernel interfaces.

### Alternatives to System Calls

Adding new system calls is generally discouraged by kernel developers due to interface rigidity and namespace pollution.

- **System Call Advantages:**
    - Simple to implement and easy to use.
    - System call performance on Linux is fast.

- **System Call Disadvantages:**
    - Requires an officially assigned syscall number.
    - The interface becomes immutable immediately upon release.
    - Requires duplicated registration work across every supported architecture.
    - Inaccessible directly from shell scripts or file utilities.
    - Difficult to maintain in external modules outside the master kernel tree.
- **Recommended Alternatives:**
    - **Device Nodes:** Implement a device with `read()`, `write()`, and `ioctl()` handlers.
    - **Sysfs:** Export kernel data or configurations as files in the appropriate `/sys` tree location.
    - **File Descriptors:** Represent certain interfaces, such as semaphores, as file descriptors and manipulate them through ordinary file operations.
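As one concrete example of a file-descriptor interface (chosen here for illustration; the text does not name it), Linux's `eventfd()` can act as a counting semaphore when created with `EFD_SEMAPHORE`: each `read()` decrements the counter by one, and each `write()` adds to it. The `fd_sem_*` helper names are hypothetical, and the sketch assumes Linux with glibc's `<sys/eventfd.h>`.

```c
#include <stdint.h>        /* uint64_t */
#include <sys/eventfd.h>   /* eventfd(), EFD_SEMAPHORE, EFD_NONBLOCK */
#include <unistd.h>        /* read(), write(), ssize_t */

/* Create a semaphore with initial count 'count', represented as an fd.
 * Returns the descriptor, or -1 on error. */
int fd_sem_create(unsigned int count)
{
    return eventfd(count, EFD_SEMAPHORE | EFD_NONBLOCK);
}

/* "Down": in semaphore mode a successful read yields 1 and decrements the
 * counter by one. Returns 0 on success, -1 if the count is zero (EAGAIN,
 * since the fd is non-blocking) or on any other error. */
int fd_sem_trywait(int fd)
{
    uint64_t value;
    return read(fd, &value, sizeof value) == (ssize_t) sizeof value ? 0 : -1;
}

/* "Up": writing 1 adds one to the counter. */
int fd_sem_post(int fd)
{
    uint64_t one = 1;
    return write(fd, &one, sizeof one) == (ssize_t) sizeof one ? 0 : -1;
}
```

Because the semaphore is an ordinary descriptor, it can be waited on with `poll()` or `select()` and released with `close()`, without any semaphore-specific calls beyond `eventfd()` itself.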