Linux Kernel Portability

An operating system’s portability is governed by the tradeoff between abstract, machine-independent interfaces and highly customized, architecture-specific optimizations. The Linux kernel strikes a balance by maintaining architecture-independent C code for core interfaces and delegating performance-critical, low-level routines to architecture-specific assembly (located in the arch/ hierarchy).

Hardware architecture diversity requires the kernel to strictly abstract physical constraints, beginning with the fundamental unit of processor data: the word.

Word Size and Data Types

A word is the amount of data a processor can process in a single operation. It dictates the size of general-purpose registers, the width of the memory bus, and the virtual memory address space.

  • Standard C Type Sizes:
    • char is strictly one byte (8 bits).
    • short is strictly 16 bits.
    • int is typically 32 bits, but this is not guaranteed by the C standard.
    • long matches the system word size (defined by the BITS_PER_LONG macro).
    • Pointers strictly match the system word size.
  • Operating System Data Models:
    • LP64: long and pointer types are 64-bit; int remains 32-bit. This is the standard data model for 64-bit Linux architectures.
    • ILP32: int, long, and pointer types are all 32-bit. This is the standard data model for 32-bit Linux architectures.
    • LLP64: int and long are 32-bit; pointers are 64-bit (used by 64-bit Windows, but not Linux).
  • Development Rules:
    • Never assume sizeof(int) == sizeof(long).
    • Never assume pointer size equals int size.

Standard C types vary by architecture, necessitating specialized kernel types to enforce explicit sizes and restrict direct access to complex data structures.

Opaque and Explicit Types

The kernel uses specialized types to mask internal architecture variations and ensure cross-platform compatibility.

  • Opaque Types:
    • Hide internal structure and size formats to prevent improper casting or direct manipulation.
    • Examples include pid_t (process IDs), atomic_t (atomic integers), dev_t, uid_t, and gid_t.
    • Usage requires strict adherence to designated interfaces rather than standard C operators.
  • Explicitly Sized Types:
    • Ensure exact bit widths for hardware, networking, and binary file interactions.
    • Kernel-space definitions: s8, u8, s16, u16, s32, u32, s64, u64 (e.g., u32 is an unsigned 32-bit integer).
    • User-space exported definitions: Prefixed with __ to protect namespaces (e.g., __u32).
  • Signedness of Characters:
    • The char type is signed by default on most architectures (range −128 to 127), but unsigned by default on others such as ARM (range 0 to 255).
    • Variables storing explicit numeric values must be explicitly declared as signed char or unsigned char.

Defining exact data sizes ensures structural integrity, but mapping these structures into physical memory introduces strict boundary constraints.

Data Alignment and Structure Padding

Data alignment refers to placing data at memory addresses that are multiples of the data’s size. A data type of size 2^n bytes is naturally aligned when it resides at an address whose n least significant bits are zero.

  • Alignment Rules:
    • Base Types: Naturally aligned by the compiler. Accessing misaligned data triggers processor traps or severe performance degradation.
    • Arrays: Inherit the alignment of their base type.
    • Unions: Inherit the alignment of their largest included type.
    • Structures: Aligned such that arrays of the structure maintain the natural alignment of every internal element.
  • Structure Padding:
    • The compiler injects padding bytes between structure members to satisfy alignment constraints.
    • Padding increases the memory footprint calculated by sizeof().
    • ANSI C prohibits the compiler from automatically reordering structure members.
    • Developers must manually reorder members (usually descending by size) to minimize padding waste, unless a specific hardware or binary layout is strictly required.

While alignment dictates the address boundaries of data, the internal arrangement of bytes within those boundaries relies entirely on processor byte ordering.

Byte Order (Endianness)

Byte ordering determines how multi-byte words are stored in physical memory.

  • Big-Endian:
    • The most significant byte is stored at the lowest memory address.
    • Standard for most RISC architectures.
  • Little-Endian:
    • The least significant byte is stored at the lowest memory address.
    • Standard for x86 architectures.
  • Kernel Byte Order Macros:
    • The kernel defines __BIG_ENDIAN or __LITTLE_ENDIAN in <asm/byteorder.h>.
    • Conversion macros safely transition data between processor-native ordering and specific target orderings: __cpu_to_be32(), __cpu_to_le32(), __be32_to_cpu(), and __le32_to_cpu().
    • If the native byte order matches the target byte order, these macros compile to no-ops.

Beyond physical memory layout, architectural differences also mandate strict abstractions for spatial and temporal hardware configurations.

Time and Page Size

Hardcoded assumptions about system timing and memory paging immediately break code when ported across architectures.

  • Time:
    • Timer interrupt frequencies vary wildly across architectures and configurations (e.g., from 100 Hz to 1000 Hz or more).
    • Code must never hardcode interrupt frequencies.
    • Time intervals must be scaled using the HZ macro (e.g., half a second is represented as HZ/2).
  • Page Size:
    • Physical page sizes vary (e.g., 4 KB on x86-32, 8 KB on Alpha, and larger sizes on certain configurations).
    • Memory sizes must be designated using the PAGE_SIZE macro.
    • Address shifts must be calculated using the PAGE_SHIFT macro.

Unifying these spatial and temporal abstractions across hardware requires mitigating unpredictable processor-level optimizations that execute concurrently.

Processor Ordering and Concurrency Assumptions

Writing portable code requires designing for the most pessimistic operational parameters across all supported architectures.

  • Processor Ordering:
    • Architectures utilize varying degrees of processor ordering; some execute instructions strictly sequentially, while others aggressively reorder loads and stores for performance.
    • Dependencies must be enforced using memory barriers (rmb(), wmb(), mb()) to ensure instruction commitment aligns with the code’s logical flow on all processors.
  • Universal Concurrency Assumptions:
    • SMP Safety: Code must always assume it runs on a symmetric multiprocessing (SMP) system and utilize appropriate spinlocks or mutexes.
    • Preempt Safety: Code must assume kernel preemption is enabled and utilize preemption disabling macros when handling localized processor data.
    • High Memory Safety: Code must assume the presence of high memory (physical memory not permanently mapped into the kernel address space) and dynamically map pages using kmap() when required.