RV Datapath

Logic Design

Processor hardware consists of two kinds of elements.

Combinational elements: Logic blocks (ALUs, adders, multiplexors) whose output depends only on current inputs. No internal storage — same inputs always produce the same output.
State elements: Memory components (PC, register file, instruction memory, data memory) that store values across clock cycles. Restoring all state elements restores the full machine state.

Clocking methodology: All state element writes occur strictly on the rising clock edge. Combinational logic operates freely between edges. This allows a state element to be read, its value passed through combinational logic, and the result written back to the same element — all within one cycle, without race conditions.

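$Clock_{period} \geq t_{prop} + t_{combinational} + t_{setup}$

Here $t_{prop}$ is the clock-to-output delay of the state element that launches the value, $t_{combinational}$ is the longest combinational path to the next state element, and $t_{setup}$ is how long that element's input must be stable before the capturing edge.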

Clock skew: The difference in arrival time of the clock edge at two state elements. The clock period must be padded by the worst-case skew, i.e. a $t_{skew}$ term is added to the right-hand side of the inequality above.
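A worked sketch of the timing constraint, assuming the worst-case skew is simply added to the other three terms. All delay values are hypothetical picoseconds chosen only to illustrate the arithmetic.

```python
def min_clock_period(t_prop, t_comb, t_setup, t_skew=0):
    """Smallest clock period (same units as the inputs) that lets a value leave
    one state element, pass through combinational logic, and be captured by
    the next state element on the following edge.

    t_prop  : clock-to-output delay of the launching state element
    t_comb  : longest combinational path between the two elements
    t_setup : setup time of the capturing state element
    t_skew  : worst-case clock skew, added as padding (assumption)
    """
    return t_prop + t_comb + t_setup + t_skew


# Hypothetical delays in picoseconds.
print(min_clock_period(t_prop=30, t_comb=600, t_setup=20))             # 650
print(min_clock_period(t_prop=30, t_comb=600, t_setup=20, t_skew=15))  # 665
```

The combinational term dominates in practice, which is why the rest of this note focuses on the longest path through the datapath.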

Control vs data signals: Data signals carry values being processed (register contents, ALU results, memory data, PC). Control signals tell hardware what to do (RegWrite, MemRead, ALUSrc, etc.) — they are asserted (1) or deasserted (0) based on the current instruction.

Datapath

The datapath is built by asking what hardware each instruction class needs, then combining the pieces with multiplexors to share hardware across instruction types.

Instruction fetch (all instructions):

PC addresses instruction memory; a hardwired adder computes PC + 4 for the next sequential instruction.
Base RISC-V instructions are 32 bits (4 bytes), so sequential instructions are 4 bytes apart.
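The fetch step can be sketched in a few lines. This is a minimal model, assuming instruction memory is a Python list of 32-bit words indexed by PC // 4 and that no compressed (16-bit) instructions are present; the function name `fetch` and the program words are hypothetical placeholders, not real encodings.

```python
INSTR_BYTES = 4  # base RISC-V instructions are 32 bits


def fetch(instr_mem, pc):
    """Return (instruction word at pc, pc + 4), mirroring the PC register
    feeding instruction memory and the hardwired PC + 4 adder."""
    if pc % INSTR_BYTES != 0:
        raise ValueError("PC must be 4-byte aligned in this sketch")
    instruction = instr_mem[pc // INSTR_BYTES]  # byte address -> word index
    next_pc = pc + INSTR_BYTES                  # the dedicated PC + 4 adder
    return instruction, next_pc


program = [0x11111111, 0x22222222, 0x33333333]  # placeholder words
pc = 0
while pc // INSTR_BYTES < len(program):
    word, pc = fetch(program, pc)
    print(hex(word), "-> next PC", pc)
```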

R-type (add, sub, and, or):

Register file reads two source registers; ALU performs the operation; result writes back to the destination register.
A single-cycle design needs two separate memories, one for instructions and one for data: a load or store must fetch its instruction and access data memory in the same cycle, which a single-ported unified memory cannot do without a structural conflict.
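The R-type path (read two registers, operate, write one back) can be sketched as below. This assumes RV64 semantics: 64-bit values that wrap on overflow and register x0 hardwired to zero. `RegisterFile`, `ALU_OPS` and `execute_rtype` are hypothetical names for illustration only.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits (RV64)


class RegisterFile:
    """32 x 64-bit registers; x0 always reads as zero and ignores writes."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, r):
        return self.regs[r]

    def write(self, r, value, reg_write=True):
        # A write happens only when RegWrite is asserted and rd is not x0.
        if reg_write and r != 0:
            self.regs[r] = value & MASK64


ALU_OPS = {
    "add": lambda a, b: (a + b) & MASK64,
    "sub": lambda a, b: (a - b) & MASK64,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
}


def execute_rtype(rf, op, rd, rs1, rs2):
    """Read rs1 and rs2, apply the ALU operation, write the result to rd."""
    result = ALU_OPS[op](rf.read(rs1), rf.read(rs2))
    rf.write(rd, result)
    return result


rf = RegisterFile()
rf.write(1, 7)
rf.write(2, 5)
execute_rtype(rf, "sub", 3, 1, 2)  # x3 = x1 - x2
execute_rtype(rf, "add", 0, 1, 2)  # the write to x0 is discarded
print(rf.read(3), rf.read(0))      # 2 0
```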

Load / Store (ld, sd):

ImmGen sign-extends the 12-bit offset to 64 bits.
ALU computes base register + offset as the memory address.
Load reads data memory and writes the result to a register; store reads a second register and writes it to data memory.
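The address calculation shared by `ld` and `sd` can be sketched as follows: sign-extend the 12-bit immediate, add it to the base register, and access a doubleword. This is a simplified model, assuming data memory is a dict keyed by byte address; `sign_extend`, `effective_address`, `ld` and `sd` are hypothetical helpers, not a real simulator API.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def effective_address(base, imm12):
    # ImmGen sign-extends the offset; the ALU adds it to the base register.
    return (base + sign_extend(imm12, 12)) & MASK64


data_mem = {}  # byte address -> 64-bit doubleword (simplification)


def sd(base, imm12, value):
    data_mem[effective_address(base, imm12)] = value & MASK64


def ld(base, imm12):
    return data_mem.get(effective_address(base, imm12), 0)


base = 0x1000
sd(base, 0xFF8, 42)                          # 0xFF8 sign-extends to -8
print(hex(effective_address(base, 0xFF8)))   # 0xff8
print(ld(base, 0xFF8))                       # 42
```

Sign extension matters here: treating 0xFF8 as +4088 instead of -8 would send the access to the wrong address.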

Branch (beq):

ALU subtracts the two source registers and asserts a Zero signal if they are equal.
A dedicated adder computes the branch target: PC + (sign-extended offset << 1). The offset is shifted left by 1 because branch offsets are encoded in half-words (2-byte units), so the lowest bit of the byte offset, which is always 0, is not stored. This doubles the reach of the 12-bit field to ±4 KiB.
A multiplexor selects the next PC: PCSrc = Branch AND Zero. If both are asserted, the branch is taken.
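The three branch steps (ALU subtraction driving Zero, the separate target adder, and the PCSrc mux) can be sketched together. This assumes `imm12` already holds the 12 immediate bits imm[12:1]; the scrambled B-type bit layout in the real instruction word is ignored. `next_pc_beq` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def next_pc_beq(pc, rs1_val, rs2_val, imm12, branch=True):
    zero = ((rs1_val - rs2_val) & MASK64) == 0               # ALU Zero output
    target = (pc + (sign_extend(imm12, 12) << 1)) & MASK64   # branch adder
    pc_src = branch and zero                                 # Branch AND Zero
    return target if pc_src else pc + 4                      # next-PC mux


print(next_pc_beq(0x100, 5, 5, 0x008))  # taken: 256 + 16 = 272
print(next_pc_beq(0x100, 5, 6, 0x008))  # not taken: 256 + 4 = 260
print(next_pc_beq(0x100, 1, 1, 0xFFE))  # taken backwards: 256 - 4 = 252
```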

Multiplexors that unify the datapath:

Mux	Input 0	Input 1	Control
ALU second input	Register value	Sign-extended immediate	`ALUSrc`
Register write data	ALU result	Data memory output	`MemtoReg`
Next PC	PC + 4	Branch target	`PCSrc`
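Each row of the table is a 2-to-1 multiplexor, so one helper covers all three. The wrapper names below are hypothetical and the argument values are placeholders; the point is only the mapping from control bit to selected input.

```python
def mux2(select, input0, input1):
    """Return input0 when select is deasserted (0), input1 when asserted (1)."""
    return input1 if select else input0


def alu_second_input(alu_src, rs2_value, imm):
    return mux2(alu_src, rs2_value, imm)


def write_back_data(mem_to_reg, alu_result, mem_data):
    return mux2(mem_to_reg, alu_result, mem_data)


def next_pc(pc_src, pc_plus_4, branch_target):
    return mux2(pc_src, pc_plus_4, branch_target)


# A load uses the immediate (ALUSrc = 1) and writes memory data back (MemtoReg = 1).
print(alu_second_input(1, rs2_value=99, imm=8))           # 8
print(write_back_data(1, alu_result=0x1008, mem_data=7))  # 7
```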

[Figure: the complete single-cycle datapath with control signals]

Control Unit

The control unit maps the instruction opcode to the signals that steer the datapath. ALU control is split into two levels of decoding (main control, then a small ALU control unit), which keeps the main control unit simple.

Main control decodes the 7-bit opcode into a coarse ALUOp plus all non-ALU signals:

Instruction	ALUSrc	MemtoReg	RegWrite	MemRead	MemWrite	Branch	ALUOp
R-type	0	0	1	0	0	0	10
`ld`	1	1	1	1	0	0	00
`sd`	1	X	0	0	1	0	00
`beq`	0	X	0	0	0	1	01

X = don’t care (signal value is irrelevant because the affected unit is not used).
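The main-control table can be sketched as a lookup keyed on the opcode. The opcodes are the standard RISC-V base encodings for OP, LOAD, STORE and BRANCH; `main_control` is a hypothetical name, and `None` stands in for a don't-care (X).

```python
OPCODES = {
    0b0110011: "R-type",
    0b0000011: "ld",
    0b0100011: "sd",
    0b1100011: "beq",
}

FIELDS = ("ALUSrc", "MemtoReg", "RegWrite", "MemRead", "MemWrite", "Branch", "ALUOp")

# One row per instruction class, in FIELDS order; None = don't care.
CONTROL_TABLE = {
    "R-type": (0, 0,    1, 0, 0, 0, 0b10),
    "ld":     (1, 1,    1, 1, 0, 0, 0b00),
    "sd":     (1, None, 0, 0, 1, 0, 0b00),
    "beq":    (0, None, 0, 0, 0, 1, 0b01),
}


def main_control(instruction):
    opcode = instruction & 0x7F  # opcode occupies bits [6:0]
    name = OPCODES.get(opcode)
    if name is None:
        raise ValueError(f"opcode {opcode:07b} not handled by this sketch")
    return dict(zip(FIELDS, CONTROL_TABLE[name]))


# add x3, x1, x2 encodes as 0x002081B3
print(main_control(0x002081B3))
```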

ALU control refines ALUOp using funct3 and funct7:

ALUOp	Meaning	ALU action
`00`	Load / Store	Always add (address calculation)
`01`	Branch	Always subtract (equality test)
`10`	R-type	Determined by funct7 and funct3

The final 4-bit ALU operation signal:

ALU control	Operation
`0000`	AND
`0001`	OR
`0010`	Add
`0110`	Subtract
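The second decoding level can be sketched as a function of ALUOp and the funct fields extracted from the instruction word. It handles only the four R-type operations listed above; `alu_control` and the constant names are hypothetical.

```python
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB = 0b0000, 0b0001, 0b0010, 0b0110

# (funct7, funct3) -> 4-bit ALU operation for the R-type subset
R_TYPE = {
    (0b0000000, 0b000): ALU_ADD,
    (0b0100000, 0b000): ALU_SUB,
    (0b0000000, 0b111): ALU_AND,
    (0b0000000, 0b110): ALU_OR,
}


def alu_control(alu_op, instruction):
    funct3 = (instruction >> 12) & 0b111   # bits [14:12]
    funct7 = (instruction >> 25) & 0x7F    # bits [31:25]
    if alu_op == 0b00:                     # ld / sd: address calculation
        return ALU_ADD
    if alu_op == 0b01:                     # beq: equality test
        return ALU_SUB
    if alu_op == 0b10:                     # R-type: decode the funct fields
        try:
            return R_TYPE[(funct7, funct3)]
        except KeyError:
            raise ValueError("R-type instruction not covered by this sketch") from None
    raise ValueError(f"unexpected ALUOp {alu_op:02b}")


print(f"{alu_control(0b10, 0x402081B3):04b}")  # sub x3, x1, x2 -> 0110
```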

Single-Cycle Execution

In a single-cycle implementation every instruction starts and finishes within one clock cycle. CPI is exactly 1, but the clock period is fixed by the slowest instruction — a load, which traverses:

Instruction memory → Register file → ALU → Data memory → Register file

Every other instruction, including a simple add, must wait for this same long cycle. This violates the design principle "make the common case fast": the common, faster instructions are penalised by the slow, uncommon one. Single-cycle implementations are therefore not used in practice; pipelining overlaps the execution of several instructions to amortise this cost.
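The cost can be made concrete with a small calculation. The per-unit latencies below are hypothetical picoseconds, and multiplexor, wire and setup delays are ignored; only the ordering of units along each path comes from the text.

```python
# Hypothetical per-unit latencies in picoseconds.
LATENCY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Functional units each instruction class passes through, in order.
PATHS = {
    "ld":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sd":     ["imem", "reg_read", "alu", "dmem"],
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "beq":    ["imem", "reg_read", "alu"],
}


def path_delay(units):
    return sum(LATENCY_PS[u] for u in units)


delays = {name: path_delay(units) for name, units in PATHS.items()}
clock_ps = max(delays.values())  # the single-cycle clock is set by the slowest path

for name, d in delays.items():
    print(f"{name:6s} needs {d} ps, but every instruction takes {clock_ps} ps")
```

Under these assumed numbers an R-type instruction needs only 600 ps of its 800 ps cycle, so a quarter of every add's cycle is idle.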
