Instruction Set Principles
Instruction set architecture (ISA) serves as the definitive interface between software and hardware, defining the visible structures and operations of a processor.
Classifying Instruction Set Architectures
Internal storage dictates the core structure of an instruction set and defines how operands are accessed and manipulated.
Internal Storage Classes:
- Stack: Implicit operands located at the top of the stack. Hardware evaluates expressions in a strict, unalterable order.
- Accumulator: One implicit operand acts as both a source and the destination for the arithmetic-logical unit (ALU).
- General-Purpose Register (GPR): Uses explicit operands, mapping variables directly to fast internal registers.
GPR Sub-classifications:
- Register-Memory: Instructions can access memory directly as part of an ALU operation.
- Load-Store (Register-Register): Memory is accessed exclusively through dedicated load and store instructions; all ALU operations use registers.
- Memory-Memory: All operands are kept in memory (obsolete in modern architectures).
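The classic way to contrast these classes is the code for C = A + B. The sequences below are illustrative assembly-style pseudocode in the document's own notation, not any specific ISA:

```
; Stack
Push A
Push B
Add               ; pops A and B, pushes the sum
Pop  C

; Accumulator
Load  A           ; accumulator <- A
Add   B           ; accumulator <- accumulator + B
Store C

; Register-Memory
Load  R1, A
Add   R3, R1, B   ; one ALU operand comes straight from memory
Store R3, C

; Load-Store (Register-Register)
Load  R1, A
Load  R2, B
Add   R3, R1, R2  ; the ALU touches registers only
Store R3, C
```

Note how the load-store sequence is the longest but makes every memory access explicit, which is exactly what simplifies pipelining.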
Advantages of GPR Architectures:
- Registers are faster to access than memory.
- Allocating variables to registers reduces total memory traffic.
- Code density improves because register addresses require fewer bits than memory addresses.
- Compilers can efficiently reorder instruction execution when operands reside in independent registers.
The classification of internal storage defines where data resides; the ISA must next define the types and sizes of the operands held there.
Type and Size of Operands
Operand typing establishes the numerical interpretation and hardware representation of data fetched from storage.
- Operand Representation: Type is overwhelmingly encoded in the instruction opcode rather than in hardware data tags.
- Standard Data Types:
  - Integers: Represented as two’s complement binary numbers.
  - Characters: Encoded using 8-bit ASCII or 16-bit Unicode.
  - Floating Point: Follows the IEEE 754 standard for single-precision (32-bit) and double-precision (64-bit) representations.
  - Decimal: Binary-coded decimal (BCD) packs two decimal digits per byte to eliminate binary approximation errors in financial calculations.
- Sign Extension: Loading a narrower type (byte, halfword) into a full-width register requires filling the additional upper bits. Signed loads replicate the sign bit into all upper bits, preserving the two’s-complement value; unsigned loads zero-fill the upper bits.
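The two load flavors can be modeled in a few lines of Python (a sketch; the function names are ours, not from any particular ISA):

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

def zero_extend(value, bits):
    """Interpret the low `bits` bits of value as an unsigned number."""
    return value & ((1 << bits) - 1)

# A signed byte load of 0xFF yields -1; an unsigned byte load yields 255.
print(sign_extend(0xFF, 8))   # -1
print(zero_extend(0xFF, 8))   # 255
```

The subtraction trick works because the sign bit of a two's-complement number carries weight -2^(bits-1).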
With operand sizes and types formally defined, the ISA provides operators to compute and manipulate these structures.
Operations in the Instruction Set
Instruction operators represent the fundamental computational capabilities provided by the processor.
- Operator Categories:
  - Arithmetic and Logical: Integer math, bitwise logic, and shifts.
  - Data Transfer: Memory loads and stores.
  - Control: Branches, jumps, and procedure calls.
  - System: OS calls and virtual memory management.
  - Domain-Specific: Floating-point, string manipulation, and graphics/media instructions.
- Instruction Usage:
  - The most frequently executed instructions are simple operations.
  - Ten basic instructions (loads, conditional branches, compares, stores, adds, logical ANDs, subtracts, register moves, calls, and returns) account for 96% of all executed instructions in typical integer programs.
Beyond naming operations and operands, the architecture must also define how locations in memory are specified and accessed.
Memory Addressing
Memory addressing defines how memory locations are calculated, interpreted, and restricted by the architecture.
Interpretation and Alignment:
- Most architectures are byte-addressed, operating on 8-bit bytes, 16-bit half words, 32-bit words, and 64-bit double words.
- Byte Ordering:
  - Little Endian stores the least-significant byte at the lowest address.
  - Big Endian stores the most-significant byte at the lowest address.
- An object of size *s* bytes at byte address *A* is aligned if *A* mod *s* = 0.
- Misaligned memory accesses force the hardware to execute multiple aligned memory references, increasing complexity and degrading performance.
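Both byte orderings, and the alignment rule, are easy to demonstrate with Python's struct module (a sketch; addresses here are plain integers):

```python
import struct

def is_aligned(addr, size):
    """An object of size s bytes at byte address A is aligned if A mod s == 0."""
    return addr % size == 0

word = 0x0A0B0C0D
print(struct.pack('<I', word).hex())  # little endian: 0d0c0b0a
print(struct.pack('>I', word).hex())  # big endian:    0a0b0c0d

print(is_aligned(0x1004, 4))  # True:  4-byte word on a 4-byte boundary
print(is_aligned(0x1006, 4))  # False: misaligned word access
```

The same 32-bit value produces opposite byte sequences in memory, which is why binary data exchanged between machines must fix a byte order.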
Addressing Modes:
| Mode | Example instruction | When used |
|---|---|---|
| Register | Add Rd, Rs | Values already in registers |
| Immediate | Add Rd, 1 | Small constant values |
| Displacement | Add Rd, 100(Rs) | Data at an offset from a base |
| Register indirect | Add Rd, (Rs) | Memory address stored in register |
| Indexed | Add Rd, (Rs + Ri) | Array access with index register |
| Direct / absolute | Add Rd, (100) | Data at a fixed memory address |
| Memory indirect | Add Rd, @(Rs) | Address found through memory |
| Autoincrement | Add Rd, (Rs)+ | Step forward through memory |
| Autodecrement | Add Rd, -(Rs) | Step backward or stack operations |
| Scaled | Add Rd, 100(Rs)[Ri] | Array access with element size |
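The memory-referencing modes in the table differ only in how the effective address is computed. A minimal Python sketch of the arithmetic (the `regs` dictionary and function names are ours, for illustration):

```python
# Toy register file: base register Rs holds an address, Ri holds an index.
regs = {'Rs': 0x1000, 'Ri': 3}

def ea_displacement(regs, base, disp):         # 100(Rs)
    return regs[base] + disp

def ea_indirect(regs, base):                   # (Rs)
    return regs[base]

def ea_indexed(regs, base, index):             # (Rs + Ri)
    return regs[base] + regs[index]

def ea_scaled(regs, base, index, disp, size):  # 100(Rs)[Ri], element size in bytes
    return disp + regs[base] + regs[index] * size

print(hex(ea_displacement(regs, 'Rs', 100)))     # 0x1064
print(hex(ea_indexed(regs, 'Rs', 'Ri')))         # 0x1003
print(hex(ea_scaled(regs, 'Rs', 'Ri', 100, 8)))  # 0x107c
```

Scaled addressing folds the multiply-by-element-size into the address calculation, which is why it suits array indexing.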
Usage Statistics:
- Displacement and immediate addressing dominate, accounting for 75% to 99% of all memory accesses.
- Displacement fields of 12 to 16 bits capture 75% to 99% of displacement requirements.
- Immediate fields of 8 to 16 bits capture 50% to 80% of immediate requirements.
Once memory addresses are reliably calculated, the architecture must also provide mechanisms to alter the flow of execution itself.
Instructions for Control Flow
Control flow instructions disrupt linear execution, requiring specialized mechanisms for evaluating conditions and calculating target addresses.
- Classes of Control Flow:
  - Conditional branches, jumps, procedure calls, and procedure returns. Conditional branches are by far the most frequent.
- Destination Addressing:
  - PC-Relative Addressing: Adds a displacement to the Program Counter (PC) to specify the target, ensuring position independence and reducing encoding size.
  - Register Indirect Jumps: Use a register to hold the target address. Necessary for targets unknown at compile time, such as procedure returns, switch statements, virtual functions, and dynamically shared libraries.
- Branch Condition Evaluation:
  - Condition Codes (CC): Special bits set implicitly by ALU operations; this implicit side effect limits instruction reordering.
  - Condition Register: Tests an arbitrary register for equality or zero.
  - Compare and Branch: Bundles the comparison and branch into a single instruction, though it may lengthen the critical path.
- Procedure Invocation:
  - Caller Saving: The calling procedure preserves the registers it needs across the call.
  - Callee Saving: The called procedure saves and restores any registers it overwrites.
  - Application Binary Interfaces (ABIs) dictate standard conventions combining caller-saved and callee-saved registers to minimize memory traffic.
Translating these operations, addressing modes, and control flows into hardware-executable commands requires a strategic instruction encoding scheme.
Stack and Memory Layout
Procedure calls require temporary storage beyond the register file.
- Stack: A Last-In-First-Out memory structure used to spill registers and hold local variables across calls. The stack pointer tracks the current top and decrements on push (allocation) and increments on pop (deallocation).
- Procedure Frame: The contiguous stack region owned by one procedure invocation, containing its saved registers and local data. A frame pointer provides a stable reference into the frame independent of subsequent stack pointer movement during execution.
- Memory Segment Map: A process’s address space is divided into four conventional regions:
  - Text: Machine instructions and statically linked code, residing at low addresses.
  - Static Data: Global variables and string constants, accessed via a dedicated global pointer register.
  - Heap: Dynamically allocated memory that grows upward toward higher addresses.
  - Stack: Grows downward from high addresses toward the heap.
Encoding an Instruction Set
Instruction encoding maps the ISA components into a binary format, balancing code density against processor decoding complexity.
- Encoding Strategies:
  - Variable Length: Instructions vary from 1 to 17 bytes (e.g., 80x86). Yields the highest code density but significantly complicates pipeline decoding.
  - Fixed Length: All instructions are a single size (e.g., 32-bit in RISC-V, ARM). Simplifies decoding and pipelining but increases total code size.
  - Hybrid: Offers multiple fixed formats (e.g., 16-bit and 32-bit). Achieves high density without extreme decoding complexity.
- Compression Techniques:
  - Extensions like RV32IC (RISC-V Compressed) encode common operations and small immediates into 16-bit instructions, acting as if the instruction cache were 25% larger.
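As a concrete fixed-length example, a RISC-V R-type instruction packs six fields into one 32-bit word. The sketch below encodes `add x3, x1, x2` using the field values from the base RV32I format:

```python
def encode_rtype(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-type fields into a 32-bit RISC-V instruction word."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

# add x3, x1, x2: funct7=0000000, funct3=000, opcode=0110011 (OP)
insn = encode_rtype(0b0000000, rs2=2, rs1=1, funct3=0b000, rd=3, opcode=0b0110011)
print(hex(insn))  # 0x2081b3
```

Because every field sits at a fixed bit position, a decoder can extract all six fields in parallel with simple wiring, which is the whole point of fixed-length encoding.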
Because instruction encoding formats act as the direct target for software compilation, hardware design is inextricably linked to compiler behavior.
Stored-Program Concept
Instructions and data reside together in the same memory as binary numbers. This unification means identical hardware is used to fetch instructions and to load/store data operands. It also allows programs to be generated, modified, and analyzed by other programs — the foundation that makes compilers, linkers, and debuggers possible.
Foundational Design Principles
Three recurring trade-off observations guide modern ISA design:
- Simplicity favors regularity: Fixed-format instructions with a uniform operand count simplify decoding and reduce control logic. Requiring exactly three operands for all arithmetic instructions avoids variable-operand hardware complexity.
- Smaller is faster: A small, fixed register set is preferable to a large one. Additional registers increase multiplexer and register-file complexity, lengthening the clock cycle. The common choice of 32 registers balances allocation flexibility against hardware cost.
- Good design demands good compromises: Uniform instruction length simplifies decoding but conflicts with encoding large constants or full memory addresses. The resolution is multiple instruction formats, each optimized for its operation class while maintaining a fixed total width.
- Make the common case fast: The most frequently executed operations should be the cheapest. Embedding small constants directly in instructions (immediate operands) bypasses slow memory loads for the common case where the constant is known at compile time.
Synchronization
Parallel hardware threads accessing shared memory introduce data races: nondeterministic outcomes when reads and writes to the same address overlap without coordination. Correct synchronization requires atomic operations — a memory read and subsequent write bound into a single, uninterruptible sequence.
- Load-Reserved / Store-Conditional (LR/SC): Two paired instructions. The load-reserved reads a location and registers a reservation on it. The store-conditional writes to that address only if the reservation is still valid; on failure it leaves memory unchanged and returns a nonzero error code in its destination register, and software retries. The pair can synthesize any synchronization primitive (compare-and-swap, test-and-set) without requiring a three-operand atomic instruction.
- Atomic Memory Operations (AMOs): Execute a full read-modify-write (add, swap, AND, OR, min, max, etc.) as a single indivisible bus transaction. AMOs eliminate the software retry loop and scale more efficiently in large multiprocessor systems where frequent reservation invalidations would make LR/SC retry rates unacceptable.
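The LR/SC retry loop can be sketched as a toy memory model in Python. This is a deliberate simplification: real hardware tracks one reservation per core and invalidates it on any conflicting store, whereas this model keeps a single global reservation:

```python
import threading

class ToyMemory:
    """Toy shared memory with a single LR/SC reservation slot."""
    def __init__(self):
        self.data = {}
        self.reservation = None
        self._lock = threading.Lock()  # stands in for the cache-coherence hardware

    def load_reserved(self, addr):
        with self._lock:
            self.reservation = addr
            return self.data.get(addr, 0)

    def store_conditional(self, addr, value):
        with self._lock:
            if self.reservation != addr:
                return 1              # nonzero error code: reservation lost, retry
            self.data[addr] = value
            self.reservation = None
            return 0                  # success

def atomic_add(mem, addr, n):
    """Synthesize an atomic fetch-and-add from the LR/SC pair."""
    while True:
        old = mem.load_reserved(addr)
        if mem.store_conditional(addr, old + n) == 0:
            return old

mem = ToyMemory()
mem.data[0x100] = 5
print(atomic_add(mem, 0x100, 3))  # 5 (the old value)
print(mem.data[0x100])            # 8
```

The software retry loop in `atomic_add` is exactly what an AMO eliminates: the read-modify-write happens as one indivisible transaction instead.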