The RV32F and RV32D extensions provide 32-bit (single-precision) and 64-bit (double-precision) floating-point capabilities.
These extensions strictly adhere to the IEEE 754-2008 floating-point standard.
Implementing floating-point capabilities requires dedicated hardware structures to manage computation state without polluting the primary integer datapath.
Floating-Point Registers and State
The architecture introduces 32 independent floating-point registers, designated f0 through f31.
Doubling register capacity and bandwidth via a separate register file improves performance without increasing the size of the register specifier in the instruction format.
Unlike the integer register x0, which is hardwired to zero, f0 is an ordinary writable register that holds data like the other 31.
When both extensions are implemented, single-precision operations use only the lower 32 bits of the 64-bit f registers; the ratified specification fills the upper 32 bits with all 1s, a convention called NaN boxing.
The floating-point control and status register (fcsr) maintains the global arithmetic configuration and records boundary conditions.
Rounding Modes (frm): Defines the mathematical rounding behavior.
Round to nearest, ties to even (rne) is the most accurate mode and the usual default.
Alternative modes include round towards zero (rtz), round down towards −∞ (rdn), round up towards +∞ (rup), and round to nearest, ties to max magnitude (rmm).
Static rounding lets an individual instruction override the dynamic rounding mode held in fcsr through an optional rounding-mode operand, which avoids rewriting fcsr around sequences that need a specific rounding behavior.
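Accrued Exception Flags (fflags): Five flags record exceptional conditions that have occurred since the field was last cleared: Invalid Operation (NV), Divide by Zero (DZ), Overflow (OF), Underflow (UF), and Inexact (NX). Setting a flag does not by itself cause a trap.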
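The two fcsr fields can be pulled apart with plain shifts and masks. The sketch below assumes the field layout and encodings given in the RISC-V unprivileged specification (fflags in bits 4:0 with NX as bit 0 and NV as bit 4, frm in bits 7:5), none of which are stated in the text above; the helper name decode_fcsr is hypothetical.

```python
# Assumed layout (from the RISC-V specification, not from the text above):
#   bits 4:0 = fflags (NV DZ OF UF NX, with NX in bit 0)
#   bits 7:5 = frm
FRM_NAMES = {0b000: "rne", 0b001: "rtz", 0b010: "rdn", 0b011: "rup", 0b100: "rmm"}
FLAG_BITS = {"NX": 0, "UF": 1, "OF": 2, "DZ": 3, "NV": 4}

def decode_fcsr(fcsr):
    """Return (rounding mode name, list of raised flag names)."""
    frm = (fcsr >> 5) & 0b111
    fflags = fcsr & 0b11111
    raised = [name for name, bit in FLAG_BITS.items() if (fflags >> bit) & 1]
    return FRM_NAMES.get(frm, "reserved"), raised

# frm = 001 (rtz), fflags = 01001 (DZ and NX set)
mode, flags = decode_fcsr(0b001_01001)
```

Encodings outside the five listed modes are reported here simply as "reserved".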
To utilize these dedicated registers, the architecture defines specific mechanisms to transfer data directly to and from memory and the integer datapath.
Memory Access and Register Transfers
Floating-point load and store instructions utilize the identical base addressing mode as integer operations, calculating the effective address by adding a 12-bit sign-extended immediate to a base register.
Loads: flw retrieves a 32-bit word; fld retrieves a 64-bit doubleword.
Stores: fsw writes a 32-bit word; fsd writes a 64-bit doubleword.
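The address calculation shared by these loads and stores can be modeled in a few lines. This is a minimal sketch, assuming a 32-bit address space and little-endian memory; the bytearray standing in for memory and the Python function names flw and fsw are illustrative only.

```python
import struct

def sign_extend_12(imm12):
    """Interpret the low 12 bits of imm12 as a two's-complement value."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12

def effective_address(base, imm12):
    """Base register plus sign-extended immediate, wrapped to 32 bits."""
    return (base + sign_extend_12(imm12)) & 0xFFFFFFFF

def flw(memory, base, imm12):
    """Model flw: read a little-endian 32-bit float from memory."""
    return struct.unpack_from("<f", memory, effective_address(base, imm12))[0]

def fsw(memory, base, imm12, value):
    """Model fsw: write a little-endian 32-bit float to memory."""
    struct.pack_into("<f", memory, effective_address(base, imm12), value)

memory = bytearray(32)          # hypothetical 32-byte memory
fsw(memory, 16, 0xFFC, 1.5)     # immediate 0xFFC encodes -4, so address 12
loaded = flw(memory, 8, 4)      # 8 + 4 also reaches address 12
```

The two accesses reach the same word from different base registers, which shows that a negative immediate is just as valid as a positive one.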
Instructions exist to transfer data directly between integer (x) and floating-point (f) registers without utilizing memory as an intermediary.
fmv.x.w copies the 32-bit pattern of a single-precision value from an f register into an x register, without converting it.
fmv.w.x copies a 32-bit pattern from an x register into an f register, again without converting it.
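These moves transfer raw bits rather than numeric values, which is easy to confuse with a conversion. The following sketch models the two moves with Python's struct module; the function names mirror the instruction mnemonics but are otherwise illustrative.

```python
import struct

def fmv_x_w(value):
    """Return the 32-bit pattern of a single-precision value (f -> x)."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def fmv_w_x(bits):
    """Reinterpret a 32-bit pattern as a single-precision value (x -> f)."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

one_bits = fmv_x_w(1.0)     # bit pattern of 1.0, not the integer 1
tiny = fmv_w_x(1)           # integer 1 reinterpreted: smallest subnormal
```

Moving the integer 1 into an f register therefore yields a tiny subnormal number, not 1.0; producing 1.0 from the integer 1 takes a conversion instruction instead.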
Once data is successfully loaded or transferred into the f registers, it is manipulated via specialized mathematical instructions.
Arithmetic Operations and Fused Computations
Standard arithmetic instructions support both precision levels: addition (fadd.s/d), subtraction (fsub.s/d), multiplication (fmul.s/d), division (fdiv.s/d), and square root (fsqrt.s/d).
Unlike integer multiplication, whose full product is twice as wide as its operands, a floating-point product has the same width as its source operands.
Minimum (fmin.s/d) and maximum (fmax.s/d) operations isolate the smaller or larger of two source operands and write the result directly to the destination register, eliminating the need for branching.
Fused Multiply-Add: Operations that require sequential multiplication and addition (or subtraction) utilize fused instructions for performance and precision gains.
Variants include fmadd (multiply then add), fmsub (multiply then subtract), fnmsub (negate the product, then add), and fnmadd (negate the product, then subtract).
These instructions utilize the specialized R4 instruction format to accommodate four register specifiers (three sources, one destination).
Fused instructions execute faster and are more accurate because they round once, at the end of the full calculation, rather than twice, once after the multiplication and again after the addition.
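The accuracy benefit of a single rounding can be seen on a small example. The sketch below emulates a double-precision fused multiply-add by computing the exact result with Python's Fraction type and rounding once when converting back to float; it is only a model for finite inputs (NaN, infinities, and the sign of a zero result are not handled), and the names fmadd_d and unfused_d are hypothetical.

```python
from fractions import Fraction

def fmadd_d(a, b, c):
    """Emulated fused a*b + c: exact intermediate, one rounding at the end."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def unfused_d(a, b, c):
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

fused = fmadd_d(a, b, c)      # exact product is 1 - 2**-60, so the sum survives
unfused = unfused_d(a, b, c)  # product rounds to 1.0, so the sum collapses
```

With these inputs the unfused sequence loses the entire result to the intermediate rounding, while the fused form keeps it.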
Beyond raw mathematical transformation, calculation results must frequently be evaluated to determine subsequent program execution paths.
Comparisons and Control Flow
The RV32F and RV32D extensions omit dedicated floating-point branch instructions to maintain datapath simplicity.
Instead, comparison instructions evaluate two floating-point registers and output a boolean 1 or 0 into a standard integer destination register.
Available comparisons evaluate equality (feq.s/d), strict less-than (flt.s/d), and less-than-or-equal-to (fle.s/d) conditions.
Standard integer branch instructions (introduced in the base ISA) then evaluate this integer register to execute the conditional jump.
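The compare-then-branch pattern can be sketched as follows. Python float comparisons follow IEEE 754, so a comparison involving NaN yields 0 here as well; exception flags are not modeled, and the function clamp_negative_to_zero is a hypothetical example of code that would branch on the comparison result.

```python
def feq(a, b):
    """Model feq: 1 if a equals b, else 0."""
    return int(a == b)

def flt(a, b):
    """Model flt: 1 if a is strictly less than b, else 0."""
    return int(a < b)

def fle(a, b):
    """Model fle: 1 if a is less than or equal to b, else 0."""
    return int(a <= b)

def clamp_negative_to_zero(x):
    t0 = flt(x, 0.0)   # comparison result lands in an integer register
    if t0 != 0:        # an integer branch (such as bne against x0) tests it
        return 0.0
    return x

nan = float("nan")
```

Because every comparison with NaN produces 0, a NaN input falls through the branch unchanged in this example.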
Floating-point data frequently requires structural modification or type casting before it can be effectively evaluated or stored in other data structures.
Type Conversion, Sign Injection, and Classification
Data Conversion: The architecture provides exhaustive conversion capabilities between 32-bit signed/unsigned integers, 32-bit floating-point data, and 64-bit floating-point data.
Conversions utilize the fcvt family of instructions, explicitly defining both the source and target data types (e.g., fcvt.s.w converts a signed integer word to single-precision float).
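A float-to-integer conversion is where the five rounding modes become most visible. The sketch below models fcvt.w.s with an explicit rounding-mode argument; the saturation behavior (out-of-range values and NaN clamp to the 32-bit limits) is taken from the RISC-V specification rather than the text above, exception flags are not modeled, and the helper names are hypothetical.

```python
import math
from fractions import Fraction

INT32_MIN, INT32_MAX = -2 ** 31, 2 ** 31 - 1
MODES = ("rne", "rtz", "rdn", "rup", "rmm")

def round_to_int(value, mode):
    """Round a finite value to an integer under one of the five modes."""
    if mode not in MODES:
        raise ValueError("unknown rounding mode: " + mode)
    q = Fraction(value)
    floor = math.floor(q)
    frac = q - floor                      # fractional part in [0, 1)
    if frac == 0 or mode == "rdn":
        return floor
    if mode == "rup":
        return floor + 1
    if mode == "rtz":
        return floor if q > 0 else floor + 1
    half = Fraction(1, 2)
    if frac != half:                      # not a tie: nearest integer wins
        return floor if frac < half else floor + 1
    if mode == "rne":                     # tie: pick the even neighbor
        return floor if floor % 2 == 0 else floor + 1
    return floor + 1 if q > 0 else floor  # rmm tie: away from zero

def fcvt_w_s(value, mode="rne"):
    """Model fcvt.w.s: convert to a signed 32-bit integer, saturating."""
    if math.isnan(value):
        return INT32_MAX                  # assumed NaN result per the spec
    if math.isinf(value):
        return INT32_MAX if value > 0 else INT32_MIN
    return max(INT32_MIN, min(INT32_MAX, round_to_int(value, mode)))

results = {mode: fcvt_w_s(-2.5, mode) for mode in MODES}
```

For the tie value -2.5, rne and rtz and rup give -2 while rdn and rmm give -3, which is the kind of difference a static rounding mode lets one instruction select.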
Sign Injection: The fsgnj family of instructions copies a complete floating-point value from a source register while allowing independent manipulation of its sign bit.
fsgnj.s/d applies the exact sign bit from a second source register.
fsgnjn.s/d applies the inverted sign bit of a second source register.
fsgnjx.s/d applies the XOR result of the sign bits from both source registers.
These hardware primitives provide useful pseudoinstructions without adding opcodes, each naming the same register as both sources: absolute value (fabs uses fsgnjx, because 0⊕0=0 and 1⊕1=0 always clear the sign bit), negation (fneg uses fsgnjn), and raw register moves (fmv uses fsgnj).
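The three sign-injection operations reduce to simple bit manipulation on the sign bit. The sketch below works on 32-bit single-precision patterns; the helper names bits and to_float are hypothetical conveniences, and the pseudoinstructions are built by passing the same value as both sources.

```python
import struct

SIGN = 0x80000000   # sign bit of a single-precision pattern
BODY = 0x7FFFFFFF   # exponent and fraction bits

def bits(value):
    return struct.unpack("<I", struct.pack("<f", value))[0]

def to_float(pattern):
    return struct.unpack("<f", struct.pack("<I", pattern))[0]

def fsgnj_s(a, b):
    """Body of a with the sign bit of b."""
    return (a & BODY) | (b & SIGN)

def fsgnjn_s(a, b):
    """Body of a with the inverted sign bit of b."""
    return (a & BODY) | (~b & SIGN)

def fsgnjx_s(a, b):
    """Body of a with sign(a) XOR sign(b)."""
    return a ^ (b & SIGN)

# Pseudoinstructions: the same register supplies both sources.
def fmv_s(a):  return fsgnj_s(a, a)
def fneg_s(a): return fsgnjn_s(a, a)
def fabs_s(a): return fsgnjx_s(a, a)
```

For example, to_float(fabs_s(bits(-3.5))) evaluates to 3.5, since XORing the sign bit with itself always clears it.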
Data Classification: The fclass.s/d instruction analyzes a floating-point operand and maps it to one of 10 standardized IEEE 754-2008 states.
The instruction outputs a 10-bit one-hot mask into an integer destination register.
Classifiable states include: −∞, negative normal, negative subnormal, −0, +0, positive subnormal, positive normal, +∞, signaling NaN, and quiet NaN.
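A classification of this kind can be sketched by inspecting the sign, exponent, and fraction fields of a single-precision pattern. The bit positions below assume that the list above is in mask order starting at bit 0 (so −∞ is bit 0 and quiet NaN is bit 9), which matches the RISC-V specification; the function name fclass_s mirrors the mnemonic and takes a raw 32-bit pattern.

```python
def fclass_s(pattern):
    """Return a 10-bit one-hot mask classifying a single-precision pattern."""
    sign = (pattern >> 31) & 1
    exponent = (pattern >> 23) & 0xFF
    fraction = pattern & 0x7FFFFF
    if exponent == 0xFF:
        if fraction == 0:
            index = 0 if sign else 7              # -inf / +inf
        else:
            quiet = fraction & 0x400000           # top fraction bit marks quiet NaN
            index = 9 if quiet else 8
    elif exponent == 0:
        if fraction == 0:
            index = 3 if sign else 4              # -0 / +0
        else:
            index = 2 if sign else 5              # negative / positive subnormal
    else:
        index = 1 if sign else 6                  # negative / positive normal
    return 1 << index

mask_for_one = fclass_s(0x3F800000)   # pattern of 1.0, a positive normal
```

Because the result is one-hot, software can test for a group of classes with a single AND against a combined mask, for instance bits 8 and 9 together to detect any NaN.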