Getting Started with the Linux Kernel

Source code is distributed as complete tarballs or incremental patches from the official kernel repository.
Git, the distributed version control system created by Linus Torvalds, is the primary tool for downloading and managing the kernel source.
Repositories are cloned using git clone and updated to the latest revision via git pull.
Tarballs are distributed in both gzip and bzip2 formats; bzip2 is the default and preferred format because it generally compresses better.
Bzip2 archives are extracted using tar xvjf linux-x.y.z.tar.bz2, while gzip archives utilize tar xvzf linux-x.y.z.tar.gz.
Development should take place in a home directory rather than /usr/src/linux, because the kernel version against which the C library was compiled is often linked to that tree.
Incremental patches move the source tree from one version to the next and are applied from inside the tree with patch -p1 < ../patch-x.y.z.
Once the source code is obtained and correctly staged in a user directory, understanding the structural layout of these files is necessary before configuration.

The root directory divides the kernel into logical subsystems:
- arch: Architecture-specific source code.
- block: Block I/O layer.
- crypto: Cryptographic API.
- drivers: Device drivers.
- firmware: Device firmware required by certain drivers.
- fs: The Virtual Filesystem (VFS) and individual filesystems.
- include: Kernel headers.
- init: Kernel boot and initialization code.
- ipc: Interprocess communication code.
- kernel: Core subsystems, including the scheduler.
- lib: Helper routines.
- mm: Memory management subsystem and the virtual memory (VM) layer.
- net: Networking subsystem.
- security: Linux Security Module.
- sound: Sound subsystem.
- usr: Early user-space code utilized for initramfs.
Essential files in the root directory govern project metadata and licensing:
- COPYING: The GNU GPL v2 license.
- CREDITS: List of developers with significant code contributions.
- MAINTAINERS: Individuals responsible for specific subsystems and drivers.
- Makefile: The base kernel build script.
Familiarity with this directory structure informs the configuration process, where specific subsystems and drivers are selected for the compilation phase.

Kernel features and drivers are toggled via configuration options prefixed with CONFIG_.
Configuration data types dictate the build outcome:
- Booleans: Accept yes or no; yes compiles the feature into the kernel image.
- Tristates: Accept yes, no, or module; yes compiles the code into the kernel image, while module compiles it as a separate, dynamically loadable object.
- Strings/Integers: Specify values accessed internally as preprocessor macros without dictating build inclusions.
Configuration states are stored in a .config file at the root of the source tree.
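The option types above correspond to a few line shapes in .config: CONFIG_FOO=y, CONFIG_FOO=m, CONFIG_FOO=<value>, and the comment form "# CONFIG_FOO is not set". As a rough user-space sketch (not part of the kernel build system), the C function below classifies a single line. The function name, the enum, and the option names used afterwards are hypothetical and chosen only for illustration.

```c
#include <string.h>

/* Possible states of one option line in a .config file (illustrative). */
enum cfg_state { CFG_UNSET, CFG_BUILTIN, CFG_MODULE, CFG_VALUE };

/* Copy at most dst_len - 1 bytes of src into dst and terminate it. */
static void copy_name(char *dst, size_t dst_len, const char *src, size_t n)
{
    if (n >= dst_len)
        n = dst_len - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* True if v is exactly the character c, optionally followed by a newline. */
static int is_single(const char *v, char c)
{
    return v[0] == c && (v[1] == '\0' || v[1] == '\n');
}

/*
 * Classify one .config line and copy the option name into name[].
 * Blank lines and ordinary comments yield CFG_UNSET with an empty name.
 */
enum cfg_state classify_config_line(const char *line, char *name, size_t name_len)
{
    const char *p;

    if (name_len == 0)
        return CFG_UNSET;
    name[0] = '\0';

    /* Disabled options are recorded as "# CONFIG_FOO is not set". */
    if (strncmp(line, "# CONFIG_", 9) == 0) {
        p = strstr(line, " is not set");
        if (p != NULL)
            copy_name(name, name_len, line + 2, (size_t)(p - (line + 2)));
        return CFG_UNSET;
    }

    p = strchr(line, '=');
    if (strncmp(line, "CONFIG_", 7) != 0 || p == NULL)
        return CFG_UNSET;
    copy_name(name, name_len, line, (size_t)(p - line));

    if (is_single(p + 1, 'y'))
        return CFG_BUILTIN;   /* compiled into the kernel image */
    if (is_single(p + 1, 'm'))
        return CFG_MODULE;    /* compiled as a loadable module */
    return CFG_VALUE;         /* string or integer value */
}
```

Given the hypothetical line CONFIG_EXAMPLE_DRIVER=m, the function reports CFG_MODULE and stores CONFIG_EXAMPLE_DRIVER in name; a line such as CONFIG_EXAMPLE_LOG_SHIFT=17 is reported as CFG_VALUE.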
Multiple utilities facilitate the configuration process:
- make config: Interactive text-based command-line utility.
- make menuconfig: ncurses-based graphical utility.
- make gconfig: gtk+-based graphical utility.
- make defconfig: Generates a configuration based on architecture defaults.
- make oldconfig: Validates and updates existing configuration files.
Configurations can be cloned from a running kernel via zcat /proc/config.gz > .config if CONFIG_IKCONFIG_PROC is enabled.
The make command automatically maintains the dependency tree and executes the build.
Build noise is minimized by redirecting standard output via make > /dev/null; warnings and errors are written to standard error and remain visible.
Parallel execution accelerates the build by spawning concurrent jobs with make -jn, where n is the number of jobs, commonly one or two per processor.
A successfully configured and built kernel produces an executable image and loadable modules that must be installed on the system alongside the System.map symbol table.

Kernel image installation methodologies depend entirely on the architecture and boot loader.
Module installation is architecture-independent and automated via make modules_install, which places compiled modules into /lib/modules.
The build process generates a System.map file containing a symbol lookup table that translates memory addresses to function and variable names.
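To make the address-to-name translation concrete, here is a minimal user-space sketch in C. It assumes each System.map line has the form "address type name" with a hexadecimal address, and that entries are sorted by ascending address. The struct, the function names, and the symbol names in the usage note are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* One parsed System.map entry (illustrative layout). */
struct map_entry {
    unsigned long long addr;  /* start address of the symbol */
    char type;                /* symbol type letter, e.g. 'T' or 't' */
    char name[64];            /* symbol name, truncated to 63 characters */
};

/* Parse one "address type name" line; returns 1 on success, 0 otherwise. */
int parse_map_line(const char *line, struct map_entry *e)
{
    return sscanf(line, "%llx %c %63s", &e->addr, &e->type, e->name) == 3;
}

/*
 * Find the symbol that covers addr: the entry with the largest start
 * address that is <= addr. The table must be sorted by ascending address.
 * Returns NULL if addr lies below the first entry.
 */
const struct map_entry *lookup_symbol(const struct map_entry *tab, size_t n,
                                      unsigned long long addr)
{
    size_t lo = 0, hi = n;

    /* Binary search for the first entry whose address is > addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

With entries for example_start at c0100000 and example_init at c0100400, looking up address c0100420 returns the example_init entry, which is the kind of translation a debugger performs on an address taken from an oops.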
With the compiled kernel installed, modifying or developing new kernel code requires navigating the strict execution environment in which this compiled code runs.

The kernel executes without access to the standard C library or standard C headers.
In-kernel implementations replace standard library functions, utilizing files like lib/string.c for string manipulation via <linux/string.h>.
printk() replaces printf() and accepts a priority flag (e.g., KERN_ERR), a string literal placed directly before the format string with no comma, which syslogd uses to decide where to display kernel messages.
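Because the priority flag is a string literal, the compiler concatenates it with the format string. The user-space sketch below imitates that mechanism. It assumes the "<n>" prefix form used by older kernels (newer kernels encode the level differently), and the DEMO_ macros, the default level of 4, and the function name are hypothetical stand-ins rather than the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for priority flags, modeled on the "<n>" form. */
#define DEMO_KERN_ERR      "<3>"
#define DEMO_KERN_INFO     "<6>"
#define DEMO_DEFAULT_LEVEL 4     /* assumed level when no prefix is present */

/*
 * Split a message into its priority level and its text.
 * Returns the level (0-7) and points *text at the first character
 * after the prefix; messages without a prefix get DEMO_DEFAULT_LEVEL.
 */
int demo_split_level(const char *msg, const char **text)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>') {
        *text = msg + 3;
        return msg[1] - '0';
    }
    *text = msg;
    return DEMO_DEFAULT_LEVEL;
}
```

A call such as demo_split_level(DEMO_KERN_ERR "disk failure\n", &text) receives the single string "<3>disk failure\n", which is why no comma appears between the flag and the format string.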
Header files belong exclusively to the kernel source tree, partitioned into base headers at include/ and architecture-specific headers at arch/<architecture>/include/asm/.
Memory protection is absent in the kernel:
- Illegal memory access triggers an oops, a major kernel error.
- Kernel memory is not pageable, meaning every consumed byte directly reduces available physical memory.
Floating-point operations are severely restricted due to the manual overhead required to save and restore floating-point registers.
Processes in the kernel execute on a small, fixed-size stack.
- The stack size is fixed at build time: historically two pages, which generally means 8KB on 32-bit architectures and 16KB on 64-bit architectures, with x86 configurable at compile time to 4KB or 8KB.
These severe memory and stack constraints dictate the usage of specific compiler extensions to optimize execution paths.

The kernel is written in ISO C99 with GNU C extensions, requiring gcc version 3.2 or later, with 4.4 or later recommended.
Inline functions eliminate invocation overhead by inserting the function contents directly at the call site.
- Declared static inline in header files, so no exported function is created; preferred over complicated macros for type safety and readability.
- Increased memory consumption and instruction cache footprint limit their viable use to small, time-critical functions.
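The difference between a static inline function and a macro can be sketched in plain C as follows. Both names are hypothetical; the point is only that the inline function has typed parameters and evaluates each argument exactly once, while the macro may evaluate an argument twice.

```c
/* Macro version: no type checking, and an argument may be evaluated twice. */
#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Inline version, as it might appear in a header: parameters are typed,
 * each argument is evaluated once, and static keeps the symbol local to
 * every file that includes the header.
 */
static inline int demo_max(int a, int b)
{
    return a > b ? a : b;
}
```

Calling demo_max(i++, 0) increments i once, whereas DEMO_MAX(i++, 0) expands to an expression that increments i twice whenever the first comparison is true.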
Inline assembly utilizes the asm() compiler directive to embed assembly instructions directly within C functions for low-level architecture fast paths.
Branch annotation optimizes conditional branches through compiler directives.
- likely() and unlikely() macros tell the compiler which outcome of a condition to optimize for, and are meant for branches whose direction is overwhelmingly predictable in advance.
- Mismarking branches incurs a performance penalty.
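Branch annotation can be tried outside the kernel with gcc or clang, since the kernel's macros wrap the __builtin_expect() builtin. The sketch below defines the two macros in that style and marks an error check as unlikely; the function demo_sum() is a hypothetical example, not kernel code.

```c
#include <stddef.h>

/* User-space definitions in the style of the kernel's macros (gcc/clang). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Sum n integers. A NULL pointer is treated as a rare error case, so the
 * check is marked unlikely and the compiler can lay out the common path
 * as the fall-through. Returns -1 on error.
 */
long demo_sum(const int *v, size_t n)
{
    long total = 0;
    size_t i;

    if (unlikely(v == NULL))
        return -1;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

The annotation changes only code layout and not results, so a mismarked branch still computes the right answer but on a slower path.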
While compiler optimizations handle instruction-level speed, system-level design demands rigorous concurrency control to handle simultaneous code execution.

The kernel is highly susceptible to race conditions, requiring synchronization primitives like spinlocks and semaphores to protect shared resources.
Concurrent access stems from four primary vectors:
- Preemptive multitasking allows the scheduler to arbitrarily suspend and resume processes.
- Symmetric multiprocessing (SMP) enables multiple processors to execute kernel code simultaneously.
- Asynchronous interrupts force execution flow changes amidst active resource access.
- Kernel preemption permits higher-priority tasks to interrupt active kernel code.
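The idea behind a spinlock can be sketched in user space with C11 atomics. This is not the kernel's spinlock implementation or API: the demo_ names are hypothetical, and the sketch shows only the core test-and-set loop that keeps a second execution context out of a critical region.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A minimal test-and-set spinlock (illustrative, not the kernel's type). */
struct demo_spinlock {
    atomic_flag locked;
};

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try to take the lock once; returns true if it was acquired. */
static inline bool demo_spin_trylock(struct demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
static inline void demo_spin_lock(struct demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        ;  /* spin */
}

/* Release the lock so another context can acquire it. */
static inline void demo_spin_unlock(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* A shared resource protected by the lock. */
static struct demo_spinlock counter_lock = DEMO_SPINLOCK_INIT;
static long shared_counter;

/* Increment the shared counter inside the critical region. */
long demo_increment(void)
{
    long value;

    demo_spin_lock(&counter_lock);
    value = ++shared_counter;
    demo_spin_unlock(&counter_lock);
    return value;
}
```

Without the lock, two contexts could both read the old counter value and write back the same result, losing an update; with it, the read-modify-write sequence runs in one context at a time.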
Portability requires that architecture-independent C code compile and run correctly across a wide range of systems.
Code must segregate architecture-dependent logic, remain endian-neutral, maintain 64-bit cleanliness, and avoid hardcoded word or page sizes.
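Two of these rules can be illustrated with a short C sketch. The first function decodes a 32-bit little-endian value byte by byte, so it gives the same result regardless of the host's byte order; the second derives the width of long from the compiler instead of hardcoding 32 or 64. Both function names are hypothetical, and the kernel provides its own helpers for these tasks.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Read a 32-bit little-endian value from a byte buffer. Assembling the
 * value with shifts avoids any dependence on the host's endianness.
 */
uint32_t demo_read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Width of long in bits, taken from the compiler rather than assumed. */
unsigned int demo_bits_per_long(void)
{
    return (unsigned int)(sizeof(long) * CHAR_BIT);
}
```

Casting the buffer to a uint32_t pointer instead would return different values on big-endian and little-endian machines and could fault on architectures that require aligned access.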
These synchronization and portability requirements establish the foundational rules for interacting with individual kernel subsystems.
