Overcoming Name Dependences with Register Renaming

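Register renaming lets a superscalar processor dynamically map the architectural registers named by instructions onto a larger pool of physical registers. Because every new result receives its own physical register, renaming eliminates the Write After Read (WAR) and Write After Write (WAW) hazards caused by name dependences, leaving only true (Read After Write) dependences.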
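To make the effect concrete, here is a minimal Python sketch that renames a three-instruction sequence. It assumes 32 architectural registers `x0`..`x31` initially held in `p0`..`p31`, instructions written as hypothetical `(dest, src1, src2)` tuples, and a simplified allocator that hands out `p32`, `p33`, ... in order; `rename_sequence` is an illustrative name, not a real API.

```python
def rename_sequence(instrs, num_arch=32):
    """Rename (dest, src1, src2) tuples of architectural names to physical names."""
    # Assumption: before renaming, x0..x31 live in p0..p31.
    rename_map = {f"x{i}": f"p{i}" for i in range(num_arch)}
    next_free = num_arch  # hypothetical allocator: p32, p33, ... in order
    renamed = []
    for dest, src1, src2 in instrs:
        # Sources use the latest mapping, so true (RAW) dependences are kept.
        psrc1, psrc2 = rename_map[src1], rename_map[src2]
        # Every destination gets a fresh physical register, so WAR/WAW disappear.
        pdest = f"p{next_free}"
        next_free += 1
        rename_map[dest] = pdest
        renamed.append((pdest, psrc1, psrc2))
    return renamed

program = [
    ("x1", "x2", "x3"),  # I0: x1 = x2 op x3
    ("x4", "x1", "x5"),  # I1: reads x1 (RAW on I0)
    ("x1", "x6", "x7"),  # I2: writes x1 again (WAW with I0, WAR with I1)
]
print(rename_sequence(program))
```

The sketch returns `[('p32', 'p2', 'p3'), ('p33', 'p32', 'p5'), ('p34', 'p6', 'p7')]`. I2 now writes `p34` instead of reusing I0's register, so it can neither clobber `p32` before I1 reads it (WAR) nor be overwritten by I0's late write (WAW), while I1 still reads I0's result through `p32`.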

Hardware Structures for Register Renaming

Unified Register File (URF): A large set of physical registers (e.g., $p_{0}$ to $p_{127}$) that holds both the committed values of the architectural registers and the temporary results of uncommitted instructions.
Renaming Map: A table that maps each architectural register to the physical register holding its most recent (possibly speculative) value. Instructions read it to translate their input operands and write it to record the physical register allocated for their output. The map is read and updated speculatively, in program order, as instructions are issued in the processor's front-end.
Architectural Map: A table identifying the physical register that corresponds to each architectural register based strictly on committed instructions. This structure resides in the pipeline back-end and provides the baseline state for exception and misprediction recovery.
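A minimal sketch of the three structures follows, assuming 32 architectural registers (the notes only give the 128-entry URF example) and using list indices as register numbers. The class and field names are hypothetical, and the explicit free list is an assumption standing in for the per-register Free bit.

```python
from dataclasses import dataclass, field

NUM_ARCH = 32   # assumption: architectural registers x0..x31
NUM_PHYS = 128  # the p0..p127 example from the notes

@dataclass
class RenameStructures:
    """Hypothetical container for the URF and its two maps (indices = register numbers)."""
    # Unified register file: one value slot per physical register.
    urf: list = field(default_factory=lambda: [0] * NUM_PHYS)
    # Renaming map: arch register -> physical register, updated speculatively at issue.
    rename_map: list = field(default_factory=lambda: list(range(NUM_ARCH)))
    # Architectural map: arch register -> physical register, updated only at commit.
    arch_map: list = field(default_factory=lambda: list(range(NUM_ARCH)))
    # Free pool (assumption; the notes track this with a per-register Free bit).
    free_list: list = field(default_factory=lambda: list(range(NUM_ARCH, NUM_PHYS)))

s = RenameStructures()
print(s.rename_map == s.arch_map, len(s.free_list))  # True 96
```

With no instructions in flight the two maps agree, and the 96 registers beyond the architectural state form the pool available to uncommitted instructions.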

To coordinate operations within these structures, each physical register must be tracked through a strict lifecycle of discrete states.

Physical Register States

Physical registers move among four distinct states, encoded in hardware with three status bits per register, written below as the tuple (Free, Architectural, Ready):

Free (1, 0, 0): The physical register is not in use and is available for allocation by a new instruction.
Used and Not Ready (0, 0, 0): The register has been allocated to a pending instruction that has not yet completed execution.
Used and Ready (0, 0, 1): The producing instruction has completed execution, making the computed value available for dependent instructions, but the instruction has not yet committed.
Architectural (0, 1, 1): The physical register holds the result of a committed instruction, is pointed to by the architectural map, and stays ready until it is freed.
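The encoding can be written down directly. This Python sketch (class and function names are hypothetical) stores each state's (Free, Architectural, Ready) bits as its enum value. It also shows that the dispatch condition described under Renaming Pipeline Stages needs only the Ready bit, since that bit is set in both the used-and-ready and architectural states.

```python
from enum import Enum

class PhysRegState(Enum):
    # Each value is the (Free, Architectural, Ready) bit tuple from the notes.
    FREE = (1, 0, 0)
    USED_NOT_READY = (0, 0, 0)
    USED_READY = (0, 0, 1)
    ARCHITECTURAL = (0, 1, 1)

    @property
    def ready(self) -> bool:
        """True when dependent instructions may read this register."""
        return self.value[2] == 1

def decode(bits: tuple) -> PhysRegState:
    """Map raw (Free, Architectural, Ready) bits back to a named state."""
    return PhysRegState(bits)

def operands_ready(states) -> bool:
    """Dispatch check: every source is used-and-ready or architectural."""
    return all(s.ready for s in states)

print(decode((0, 1, 1)))
print(operands_ready([PhysRegState.ARCHITECTURAL, PhysRegState.USED_NOT_READY]))
```

Bit patterns that do not correspond to one of the four states, such as `(1, 1, 1)`, make `decode` raise `ValueError`, mirroring combinations the hardware never produces.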

These state transitions are triggered by specific events as instructions advance through the execution pipeline.

Renaming Pipeline Stages

The lifecycle of a physical register follows the pipeline stages of an out-of-order superscalar processor, some of which proceed in order and some out of order:

Issue (In-order):
- The instruction looks up its input operands in the renaming map to find the physical registers storing the latest architectural values.
- The instruction allocates a new physical register ( $p_{j}$ ) for its output architectural register ( $x_{i}$ ).
- The renaming map entry for $x_{i}$ updates to point to $p_{j}$ .
- The state of $p_{j}$ is set to used but not ready.
Dispatch (Out-of-order): Once all physical registers holding input operands reach either the architectural or used and ready state, the instruction dispatches to an execution unit. The output register $p_{j}$ remains used but not ready.
Completion (Out-of-order): Upon finishing execution, the functional unit stores the result into $p_{j}$ and transitions its state to used and ready, allowing dependent instructions to proceed.
Commit (In-order):
- When the instruction reaches the head of the Reorder Buffer (ROB) without exceptions, $p_{j}$ transitions to the architectural state.
- The architectural map updates to associate $x_{i}$ with $p_{j}$ .
- The physical register previously mapped to $x_{i}$ in the architectural map ($p_{k}$) transitions to the free state; every instruction that could still read $p_{k}$ is older and has already committed.
Recovery (In-order): If an exception or misprediction is detected at the head of the ROB, the ROB is flushed and all physical registers in either used state (ready or not ready) are marked free. The speculative renaming map is discarded and overwritten by a bulk copy of the architectural map.
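The stage actions above can be modeled in a few lines of Python. This is a minimal sketch under simplifying assumptions: register numbers are plain integers, only register states are tracked (no data values), one instruction is handled per call, an exception is flagged when an instruction completes, and running out of free registers raises an error where real hardware would stall. `Renamer` and its methods are hypothetical names.

```python
from collections import deque

# Physical register states from the notes (bit encodings omitted for brevity).
FREE, USED_NOT_READY, USED_READY, ARCH = "free", "used_not_ready", "used_ready", "arch"

class Renamer:
    """Hypothetical model of URF renaming: issue, dispatch check, complete, commit, recover."""

    def __init__(self, num_arch=32, num_phys=128):
        # p0..p{num_arch-1} start out holding the architectural state.
        self.state = [ARCH] * num_arch + [FREE] * (num_phys - num_arch)
        self.rename_map = list(range(num_arch))  # speculative, front-end
        self.arch_map = list(range(num_arch))    # committed, back-end
        self.rob = deque()                       # oldest instruction at the left

    def issue(self, dest, srcs):
        """In order: look up sources, allocate p_j for x_i, update the renaming map."""
        phys_srcs = [self.rename_map[s] for s in srcs]
        pj = self.state.index(FREE)  # ValueError if none is free (hardware would stall)
        self.state[pj] = USED_NOT_READY
        self.rename_map[dest] = pj
        entry = {"dest": dest, "pj": pj, "done": False, "exception": False}
        self.rob.append(entry)
        return entry, phys_srcs

    def can_dispatch(self, phys_srcs):
        """Out of order: every source must be used-and-ready or architectural."""
        return all(self.state[p] in (USED_READY, ARCH) for p in phys_srcs)

    def complete(self, entry, exception=False):
        """Out of order: the result lands in p_j, which becomes used-and-ready."""
        self.state[entry["pj"]] = USED_READY
        entry["done"] = True
        entry["exception"] = exception

    def commit(self):
        """In order: retire the ROB head; returns True only if an instruction committed."""
        if not self.rob or not self.rob[0]["done"]:
            return False
        head = self.rob[0]
        if head["exception"]:
            self.recover()
            return False
        self.rob.popleft()
        pk = self.arch_map[head["dest"]]  # old committed mapping of x_i
        self.arch_map[head["dest"]] = head["pj"]
        self.state[head["pj"]] = ARCH
        self.state[pk] = FREE
        return True

    def recover(self):
        """Free all used registers, flush the ROB, bulk-copy the architectural map."""
        for p, st in enumerate(self.state):
            if st in (USED_NOT_READY, USED_READY):
                self.state[p] = FREE
        self.rob.clear()
        self.rename_map = list(self.arch_map)

r = Renamer()
e1, _ = r.issue(dest=1, srcs=[2, 3])      # x1 = x2 op x3, allocated p32
e2, srcs2 = r.issue(dest=4, srcs=[1, 5])  # reads x1, so srcs2 == [32, 5]
print(r.can_dispatch(srcs2))              # False: p32 is used but not ready
r.complete(e1)
print(r.can_dispatch(srcs2))              # True
print(r.commit(), r.arch_map[1], r.state[1])  # True 32 free
```

In the usage lines, the second instruction cannot dispatch until `p32` is written, and committing the first instruction frees `p1`, the register that held the previous committed value of `x1`.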

While this sequential state progression manages individual instructions, executing these steps concurrently for multiple instructions introduces severe hardware constraints.

Complexity in Superscalar Renaming

Superscalar processors must rename an entire bundle of $N$ instructions in a single clock cycle, creating significant logic complexity:

Multiported Structures: To issue $N$ instructions per cycle, the renaming map requires $2N$ read ports to look up input operands (assuming up to two source operands per instruction) and $N$ write ports to record output mappings. The architectural map requires $M$ write ports to support $M$ instructions committing per cycle.
Intra-bundle Data Dependences: The renaming logic must detect dependences among the instructions renamed in the same cycle. If an instruction reads a register written by an earlier instruction in the same bundle, it must bypass the renaming map lookup (the map still holds the pre-bundle mapping) and use the physical register allocated by that earlier instruction. Likewise, if several instructions in the bundle write the same architectural register, only the youngest one's mapping may be written into the renaming map.
Physical Register Deallocation: Freeing a register as early as possible would require tracking when all of its readers have executed and when the instruction that overwrites its architectural register can no longer be squashed. Processors therefore typically delay freeing a physical register until the next instruction that writes the same architectural register commits: by then every reader, being older, has also committed, so no reader tracking is needed, at the cost of longer register occupancy.
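The intra-bundle checks can be sketched in Python. This is a minimal sketch, assuming instructions given as hypothetical `(dest, [srcs])` pairs of architectural register numbers and a simple list-based free pool; hardware evaluates all comparisons in parallel with comparators, which the inner search loop only imitates. The function also counts renaming-map reads and writes to connect back to the port requirements.

```python
def rename_bundle(bundle, rename_map, free_list):
    """Rename a bundle of (dest, [srcs]) instructions as if in a single cycle.

    Returns (renamed, map_reads, map_writes); updates rename_map and free_list in place.
    """
    n = len(bundle)
    new_dests = free_list[:n]  # all N destinations are allocated together
    del free_list[:n]
    renamed, map_reads = [], 0
    for i, (dest, srcs) in enumerate(bundle):
        phys_srcs = []
        for s in srcs:
            map_reads += 1        # one map read port per source operand
            phys = rename_map[s]  # map still holds the pre-bundle mapping
            # Youngest earlier writer in the same bundle overrides the lookup.
            for j in range(i - 1, -1, -1):
                if bundle[j][0] == s:
                    phys = new_dests[j]  # bypass the map
                    break
            phys_srcs.append(phys)
        renamed.append((new_dests[i], phys_srcs))
    # Map update: for repeated destinations, keep only the youngest mapping.
    final = {}
    for i, (dest, _) in enumerate(bundle):
        final[dest] = new_dests[i]
    for dest, phys in final.items():
        rename_map[dest] = phys
    return renamed, map_reads, len(final)

rename_map = list(range(32))
free_list = list(range(32, 128))
bundle = [
    (1, [2, 3]),  # x1 = x2 op x3
    (4, [1, 5]),  # reads x1 from the first instruction (bypass)
    (1, [4, 6]),  # reads x4 from the second; rewrites x1 (intra-bundle WAW)
    (7, [1, 2]),  # must see the third instruction's x1, not the first's
]
renamed, reads, writes = rename_bundle(bundle, rename_map, free_list)
print(renamed)        # [(32, [2, 3]), (33, [32, 5]), (34, [33, 6]), (35, [34, 2])]
print(reads, writes)  # 8 3
print(rename_map[1])  # 34
```

With $N = 4$ two-source instructions the sketch performs $2N = 8$ map reads but only three map writes, because the two writes to `x1` collapse into one; the hardware still has to provision $N$ write ports for bundles without such collisions.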

To mitigate the scalability limits of multiported maps and complex deallocation tracking, architectures can employ alternative register mapping topologies.

Alternative Implementations for Register Renaming

Processors implement varying structural approaches to bypass the limitations of a standard URF:

No Architectural Map: Eliminates the architectural map. At issue, when an instruction updates the renaming map, it records its destination's old physical register mapping in its ROB entry. At commit, that recorded old register is freed. On an exception, the processor walks the ROB backward, from the youngest entry to the oldest, restoring each recorded old mapping into the renaming map and freeing the newly allocated registers.
Separate Physical and Architectural Registers: Divides registers into a 32-entry Architectural Register File (ARF) and a separate Rename Register File (RRF) for uncommitted instructions. At commit, output values are copied from the RRF to the ARF. Exception recovery simply frees all RRF registers and resets the renaming map to point to the ARF.
Renaming Registers in the ROB: Embeds the temporary renaming registers directly into the ROB entries. Values are buffered in the ROB until commit, then copied to the ARF. Exceptions automatically clear the temporary buffers when the ROB flushes, though this increases the multi-porting burden on the ROB.
Renaming Table in the ROB: Eliminates the standalone renaming map entirely. New instructions must associatively search the ROB for the youngest instruction writing to their required architectural register, sourcing values directly from the matched ROB entry or the ARF. This requires highly complex, prioritized associative search logic across multiple instructions every cycle.
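The first alternative, which drops the architectural map, can be sketched in Python. This is a minimal sketch, assuming a list-based ROB and free pool and tracking only register mappings; the function names and entry fields are hypothetical. Each ROB entry saves the destination's previous physical register at issue, commit frees that saved register, and recovery undoes mappings youngest-first.

```python
def issue(rename_map, free_list, rob, dest):
    """Allocate a register for dest and remember the mapping it replaces."""
    new_phys = free_list.pop(0)
    rob.append({"dest": dest, "old_phys": rename_map[dest], "new_phys": new_phys})
    rename_map[dest] = new_phys

def commit(free_list, rob):
    """Retire the oldest entry; the register it replaced is no longer needed."""
    entry = rob.pop(0)
    free_list.append(entry["old_phys"])

def recover(rename_map, free_list, rob):
    """Walk the ROB from youngest to oldest, restoring old mappings."""
    while rob:
        entry = rob.pop()  # youngest remaining entry
        rename_map[entry["dest"]] = entry["old_phys"]
        free_list.append(entry["new_phys"])

rename_map = list(range(32))  # assumption: x0..x31 start in p0..p31
free_list = list(range(32, 128))
rob = []
issue(rename_map, free_list, rob, dest=1)  # x1 -> p32 (saves p1)
issue(rename_map, free_list, rob, dest=1)  # x1 -> p33 (saves p32)
issue(rename_map, free_list, rob, dest=2)  # x2 -> p34 (saves p2)
commit(free_list, rob)                     # first instruction retires, p1 freed
recover(rename_map, free_list, rob)        # exception at the new ROB head
print(rename_map[1], rename_map[2])        # 32 2
```

After recovery, `x1` maps back to `p32`, the result of the only committed instruction, which is exactly what an architectural map would have held. The price is that recovery takes one step per ROB entry instead of a single bulk copy.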
