Instruction Set Principles

Instruction set architecture (ISA) serves as the definitive interface between software and hardware, defining the visible structures and operations of a processor.

Classifying Instruction Set Architectures

Internal storage dictates the core structure of an instruction set and defines how operands are accessed and manipulated.

  • Internal Storage Classes:
    • Stack: Implicit operands located at the top of the stack. Hardware evaluates expressions in a strict, unalterable order.
    • Accumulator: One implicit operand acts as both a source and the destination for the arithmetic-logical unit (ALU).
    • General-Purpose Register (GPR): Uses explicit operands, mapping variables directly to fast internal registers.
  • GPR Sub-classifications:
    • Register-Memory: Instructions can access memory directly as part of an ALU operation.
    • Load-Store (Register-Register): Memory is accessed exclusively through dedicated load and store instructions; all ALU operations use registers.
    • Memory-Memory: All operands are kept in memory (obsolete in modern architectures).
  • Advantages of GPR Architectures:
    • Registers are faster to access than memory.
    • Allocating variables to registers reduces total memory traffic.
    • Code density improves because register addresses require fewer bits than memory addresses.
    • Compilers can efficiently reorder instruction execution when operands reside in independent registers.

The classification of internal storage defines where data resides, establishing the rules for how that data must be addressed in memory.

Memory Addressing

Memory addressing defines how memory locations are calculated, interpreted, and restricted by the architecture.

  • Interpretation and Alignment:
    • Most architectures are byte-addressed, operating on 8-bit bytes, 16-bit half words, 32-bit words, and 64-bit double words.
    • Byte Ordering: Little Endian stores the least-significant byte at the lowest address; Big Endian stores the most-significant byte at the lowest address.
    • Alignment Restrictions: An object of size s bytes at byte address A is aligned if A mod s = 0.
    • Misaligned memory accesses force the hardware to execute multiple aligned memory references, increasing complexity and degrading performance.
  • Addressing Modes:
    • Displacement: Adds a constant offset to a base register. Used extensively for accessing local variables.
    • Immediate: Embeds a constant value directly within the instruction stream.
    • Register Indirect: Uses the value in a register as the memory address.
    • Other Modes: Indexed, direct/absolute, memory indirect, autoincrement, autodecrement, and scaled.
  • Usage Statistics:
    • Displacement and immediate addressing dominate, accounting for 75% to 99% of all memory accesses.
    • Displacement offsets of 12 to 16 bits successfully capture 75% to 99% of displacement requirements.
    • Immediate fields of 8 to 16 bits capture 50% to 80% of immediate requirements.

Once memory addresses are reliably calculated, the architecture must define the exact structure and meaning of the data fetched.

Type and Size of Operands

Operand typing establishes the numerical interpretation and hardware representation of data fetched from storage.

  • Operand Representation:
    • An operand’s type is almost always designated by the instruction opcode rather than by hardware tags attached to the data.
  • Standard Data Types:
    • Integers: Represented as two’s complement binary numbers.
    • Characters: Encoded using 8-bit ASCII or 16-bit Unicode.
    • Floating Point: Follows the IEEE 754 standard for single-precision (32-bit) and double-precision (64-bit) representations.
    • Decimal: Binary-coded decimal (BCD) packs two decimal digits per byte to eliminate binary approximation errors in financial calculations.

With operand sizes and types formally defined, the ISA provides operators to compute and manipulate these structures.

Operations in the Instruction Set

Instruction operators represent the fundamental computational capabilities provided by the processor.

  • Operator Categories:
    • Arithmetic and Logical: Integer math, bitwise logic, and shifts.
    • Data Transfer: Memory loads and stores.
    • Control: Branches, jumps, and procedure calls.
    • System: OS calls and virtual memory management.
    • Domain-Specific: Floating-point, string manipulation, and graphics/media instructions.
  • Instruction Usage:
    • The most frequently executed instructions are simple operations.
    • Ten basic instructions (loads, conditional branches, compares, stores, adds, logical ANDs, subtracts, register moves, calls, and returns) account for 96% of all executed instructions in typical integer programs.

Beyond simple sequential operations, architectures must dynamically alter execution paths using control flow instructions.

Instructions for Control Flow

Control flow instructions disrupt linear execution, requiring specialized mechanisms for evaluating conditions and calculating target addresses.

  • Classes of Control Flow:
    • Conditional branches, jumps, procedure calls, and procedure returns. Conditional branches are the most frequent.
  • Destination Addressing:
    • PC-Relative Addressing: Adds a displacement to the Program Counter (PC) to specify the target. Ensures position independence and reduces encoding size.
    • Register Indirect Jumps: Uses a register to hold the target address. Necessary for targets unknown at compile time, such as procedure returns, switch statements, virtual functions, and dynamically shared libraries.
  • Branch Condition Evaluation:
    • Condition Codes (CC): Special bits set implicitly by ALU operations. Limits instruction reordering.
    • Condition Register: Tests an arbitrary register for equality or zero.
    • Compare and Branch: Bundles the comparison and branch into a single instruction, though it may lengthen the critical path.
  • Procedure Invocation:
    • Caller Saving: The calling procedure preserves the registers it needs across the call.
    • Callee Saving: The called procedure saves and restores any registers it overwrites.
    • Application Binary Interfaces (ABIs) dictate standard conventions combining caller-saved and callee-saved registers to minimize memory traffic.

Translating these operations, addressing modes, and control flows into hardware-executable commands requires a strategic instruction encoding scheme.

Encoding an Instruction Set

Instruction encoding maps the ISA components into a binary format, balancing code density against processor decoding complexity.

  • Encoding Strategies:
    • Variable Length: Instructions vary from 1 to 17 bytes (e.g., 80x86). Yields the highest code density but significantly complicates pipeline decoding.
    • Fixed Length: All instructions are a single size (e.g., 32-bit in RISC-V, ARM). Simplifies decoding and pipelining but increases total code size.
    • Hybrid: Offers multiple fixed formats (e.g., 16-bit and 32-bit). Achieves high density without extreme decoding complexity.
  • Compression Techniques:
    • Extensions like RV32IC (RISC-V Compressed) encode common operations and small immediates into 16-bit instructions, acting as if the instruction cache is 25% larger.

Because instruction encoding formats act as the direct target for software compilation, hardware design is inextricably linked to compiler behavior.

Cross-Cutting Issues: The Role of Compilers

Modern ISAs are fundamentally compiler targets, meaning architectural design must align with compiler optimization strategies.

  • Compiler Structure:
    • Front End: Language-dependent parsing.
    • High-Level Optimizer: Procedure inlining and loop transformations.
    • Global Optimizer: Global and local optimizations, plus register allocation.
    • Code Generator: Machine-dependent instruction selection.
  • Register Allocation:
    • Graph coloring maps an unlimited number of virtual registers to a limited set of physical registers.
    • Graph coloring is highly effective, but it works best when the register set is uniform: general-purpose, unreserved, and orthogonal to the instruction set.
  • Architectural Guidelines for Compilers:
    • Provide Regularity (Orthogonality): Operations, data types, and addressing modes should be independent and universally applicable to avoid limiting compiler choices.
    • Provide Primitives, Not Solutions: Avoid highly complex instructions tailored for a single high-level language feature, as compilers struggle to match exact use cases.
    • Simplify Trade-offs: Make the performance costs of alternative code sequences obvious to the compiler.
    • Do Not Bind Constants at Runtime: Do not force the hardware to interpret values dynamically if the compiler can resolve them statically.

Applying these compiler-friendly guidelines and foundational principles results in streamlined, highly efficient architectures like RISC-V.

Putting It All Together: The RISC-V Architecture

RISC-V is an open-standard load-store architecture that epitomizes minimalist, compiler-optimized design.

  • Instruction Set Organization:
    • Base sets: RV32I, RV32E, RV64I, RV128I.
    • Standard Extensions: M (multiply/divide), A (atomic), F (single-precision FP), D (double-precision FP).
    • RV64G refers to the combined RV64IMAFD instruction set.
  • Registers:
    • Provides 32 64-bit General-Purpose Registers (x0–x31), where x0 is hardwired to 0.
    • Provides 32 Floating-Point Registers (f0–f31).
  • Memory Addressing:
    • Byte-addressable, Little Endian load-store architecture.
    • Uses only two addressing modes: immediate and displacement (with 12-bit fields). Register indirect is achieved using a 0 displacement.
  • Instruction Formats:
    • Fixed 32-bit encoding across 4 primary formats (R-type, I-type, S-type, U-type).
    • Register specifiers (rs1, rs2, rd) are located in identical bit positions across all formats to simplify decoding.
  • Control Flow:
    • All conditional branches are PC-relative and utilize full comparisons between two registers (e.g., equal, less than).
    • Unconditional jumps utilize jump and link (PC-relative) or jump and link register (register indirect) for dynamic targets.

Despite the clear benefits of robust principles like those implemented in RISC-V, the history of ISA design is fraught with predictable missteps.

Fallacies and Pitfalls

Flawed architectural decisions frequently stem from misunderstanding the hardware-software interface or over-optimizing for the wrong metrics.

  • Pitfall: Semantic Clash:
    • Designing complex instructions specifically for high-level language structures often results in overkill. For instance, the VAX CALLS instruction mandates heavy overhead (stack alignment, mask interpretation, clearing condition codes) that the vast majority of simple procedure calls do not need.
  • Fallacy: A “Typical” Program Exists:
    • Designing an ISA around a single synthetic benchmark is flawed; dynamic instruction distribution varies wildly across different programs.
  • Pitfall: Hardware Optimization Without Compiler Context:
    • Innovating at the ISA level to reduce code size is futile if the compiler’s own optimization strategies alter code size by much larger factors.
  • Fallacy: Architectures With Flaws Cannot Be Successful:
    • The 80x86 architecture succeeded massively despite severe architectural flaws (e.g., execution stacks, segmented memory). Its success was driven by immense commercial binary compatibility, high-volume production, and hardware that translates the messy 80x86 instructions into efficient internal RISC micro-operations.