Getting Started with the Linux Kernel

Obtaining and Managing the Source

  • Source code is distributed as complete tarballs or incremental patches from the official kernel website, kernel.org.
  • Git, the distributed version control system Linus Torvalds created for kernel development, is the primary tool for downloading and managing the kernel source.
  • Repositories are cloned using git clone and updated to the latest revision via git pull.
  • Tarballs are distributed in both bzip2 and gzip formats; bzip2 is the default and preferred format because it generally compresses better than gzip.
  • Bzip2 archives are extracted using tar xvjf linux-x.y.z.tar.bz2, while gzip archives use tar xvzf linux-x.y.z.tar.gz.
  • Development must occur in a home directory rather than /usr/src/linux, because that tree often holds the kernel version against which the system's C library is compiled.
  • Incremental patches move the source tree from one release to the next; from within the tree, apply one with patch -p1 < ../patch-x.y.z.
  • Once the source code is obtained and correctly staged in a user directory, understanding the structural layout of these files is necessary before configuration.

Kernel Source Tree Structure

  • The root directory divides the kernel into logical subsystems:
    • arch: Architecture-specific source code.
    • block: Block I/O layer.
    • crypto: Cryptographic API.
    • drivers: Device drivers.
    • firmware: Device firmware required by certain drivers.
    • fs: The Virtual Filesystem (VFS) and individual filesystems.
    • include: Kernel headers.
    • init: Kernel boot and initialization code.
    • ipc: Interprocess communication code.
    • kernel: Core subsystems, including the scheduler.
    • lib: Helper routines.
    • mm: Memory management subsystem and the virtual memory (VM) layer.
    • net: Networking subsystem.
    • security: Linux Security Module.
    • sound: Sound subsystem.
    • usr: Early user-space code utilized for initramfs.
  • Essential files in the root directory govern project metadata and licensing:
    • COPYING: The GNU GPL v2 license.
    • CREDITS: List of developers with significant code contributions.
    • MAINTAINERS: Individuals responsible for specific subsystems and drivers.
    • Makefile: The base kernel build script.
  • Familiarity with this directory structure informs the configuration process, where specific subsystems and drivers are selected for the compilation phase.

Configuring and Building the Kernel

  • Kernel features and drivers are toggled via configuration options prefixed with CONFIG_.
  • Configuration data types dictate the build outcome:
    • Booleans: Accept yes or no; yes compiles the feature directly into the kernel image.
    • Tristates: Accept yes, no, or module; module compiles the code as a separate, dynamically loadable object.
    • Strings/Integers: Do not control the build; they specify values that kernel source accesses as preprocessor macros (for example, the size of a statically allocated array).
  • Configuration states are stored in a .config file at the root of the source tree.
  • Multiple utilities facilitate the configuration process:
    • make config: Interactive text-based command-line utility.
    • make menuconfig: ncurses-based, menu-driven text utility.
    • make gconfig: gtk+-based graphical utility.
    • make defconfig: Generates a configuration based on architecture defaults.
    • make oldconfig: Validates and updates existing configuration files.
  • If the running kernel was built with CONFIG_IKCONFIG_PROC, its configuration can be copied via zcat /proc/config.gz > .config and then updated for the new tree with make oldconfig.
  • The make command automatically maintains the dependency tree and executes the build.
  • Build noise is minimized by redirecting standard output via make > /dev/null; warnings and errors, written to standard error, remain visible.
  • Parallel execution accelerates the build by spawning concurrent jobs with make -jn, where n is the number of jobs (commonly one or two per processor).
  • A successful build produces a kernel image and loadable modules, which must then be installed on the system; the build also emits a System.map symbol table.

Installing the Kernel

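  • Kernel image installation depends on the architecture and boot loader; on x86 with GRUB, for example, the built image (arch/x86/boot/bzImage) is copied into /boot and a boot loader entry is added for it.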
  • Module installation is architecture-independent and automated via make modules_install, which places compiled modules into /lib/modules.
  • The build process also generates System.map, a symbol lookup table used during debugging to translate memory addresses to function and variable names.
  • With the compiled kernel installed, modifying or developing new kernel code requires navigating the strict execution environment in which this compiled code runs.

Kernel Execution Environment Constraints

  • The kernel executes without access to the standard C library or standard C headers.
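  • Many familiar library functions are reimplemented inside the kernel; for example, the string manipulation functions live in lib/string.c and are declared in <linux/string.h>.
  • printk() replaces printf(). It accepts an optional priority flag, such as KERN_ERR, which is a string literal joined to the format string with no comma between them; syslogd uses this flag to decide where to display the message.
  • The kernel includes only headers from its own source tree: base headers live in include/ (for example, <linux/string.h> maps to include/linux/string.h), and architecture-specific headers live in arch/<architecture>/include/asm/ and are included with the asm/ prefix, for example <asm/ioctl.h>.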
  • The kernel lacks the memory protection afforded to user space:
    • Illegal memory access triggers an oops, a major kernel error.
    • Kernel memory is not pageable, meaning every consumed byte directly reduces available physical memory.
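  • Floating-point operations are effectively avoided: unlike user space, where the kernel manages floating-point state transitions, kernel code must manually save and restore the floating-point registers, so except in rare cases the kernel contains no floating-point code.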
  • Processes in the kernel execute on a small, fixed-size stack.
    • The exact size varies by architecture: historically two pages, which generally means 8KB on 32-bit and 16KB on 64-bit architectures, while x86 offers a compile-time choice of 4KB or 8KB.
  • Within these tight resource limits, kernel code also relies on specific compiler extensions to keep hot execution paths efficient.

GNU C Compiler Extensions

  • The kernel utilizes ISO C99 and GNU C extensions, necessitating gcc version 3.2 or later, with 4.4 recommended.
  • Inline functions eliminate invocation overhead by inserting the function contents directly at the call site.
    • Declared static inline, typically in header files; static prevents an exported function from being created, and inline functions are preferred over complex macros for type safety and readability.
    • Increased memory consumption and instruction cache footprint limit their viable use to small, time-critical functions.
  • The asm() compiler directive embeds assembly instructions directly within C functions; its use is confined to low-level, architecture-specific code and performance-critical fast paths.
  • Branch annotation optimizes conditional branches through a gcc built-in directive that the kernel wraps in macros.
    • likely() and unlikely() mark a branch as very likely or very unlikely to be taken, and should be used only when the direction is known a priori to be overwhelmingly one way.
    • Mismarking branches incurs a performance penalty.
  • While compiler optimizations handle instruction-level speed, system-level design demands rigorous concurrency control to handle simultaneous code execution.

Synchronization, Concurrency, and Portability

  • The kernel is highly susceptible to race conditions, requiring synchronization primitives like spinlocks and semaphores to protect shared resources.
  • Concurrent access stems from four primary vectors:
    • Preemptive multitasking allows the scheduler to arbitrarily suspend and resume processes.
    • Symmetrical multiprocessing (SMP) enables multiple processors to execute kernel code simultaneously.
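    • Asynchronous interrupts can arrive while code is in the middle of accessing a resource, and the interrupt handler may then access the same resource.
    • Kernel preemption allows running kernel code to be preempted in favor of other code that may then access the same resource.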
  • Portability requires that architecture-independent C code compile and run correctly on a wide range of systems.
  • Code must keep architecture-dependent logic in system-specific directories, remain endian-neutral, be 64-bit clean, and never assume a particular word or page size.
  • These synchronization and portability requirements establish the foundational rules for interacting with individual kernel subsystems.