RV32M: Multiply and Divide Instruction Set Architecture

Integer Multiplication

The fundamental multiplication operation computes a product from two operands: $\text{Product} = \text{Multiplicand} \times \text{Multiplier}$. Multiplying two 32-bit numbers naturally yields a 64-bit product. To maintain hardware simplicity and avoid writing to multiple destination registers simultaneously, the architecture splits the 64-bit product retrieval across two separate instructions.

Lower Half Extraction
- mul: Computes and stores the lower 32 bits of the full 64-bit integer product.
Upper Half Extraction
- mulh: Computes the upper 32 bits when both operands are signed.
- mulhu: Computes the upper 32 bits when both operands are unsigned.
- mulhsu: Computes the upper 32 bits when the first operand (rs1) is signed and the second (rs2) is unsigned.
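A minimal Python sketch of how the four instructions partition the 64-bit product. The function names mirror the mnemonics but are hypothetical models, not an official API. Python integers are unbounded, so each helper masks to 32 bits to mimic a register, and `to_signed` reinterprets a register value as two's complement.

```python
MASK32 = 0xFFFF_FFFF  # a 32-bit register holds values 0 .. 2**32 - 1


def to_signed(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def mul(rs1: int, rs2: int) -> int:
    # The low 32 bits are the same whether the operands are signed or unsigned.
    return (rs1 * rs2) & MASK32


def mulh(rs1: int, rs2: int) -> int:
    # Both operands signed: bits 63..32 of the exact product.
    # Python's >> on negative ints is arithmetic, so the sign is preserved.
    return ((to_signed(rs1) * to_signed(rs2)) >> 32) & MASK32


def mulhu(rs1: int, rs2: int) -> int:
    # Both operands unsigned.
    return (((rs1 & MASK32) * (rs2 & MASK32)) >> 32) & MASK32


def mulhsu(rs1: int, rs2: int) -> int:
    # rs1 signed, rs2 unsigned.
    return ((to_signed(rs1) * (rs2 & MASK32)) >> 32) & MASK32
```

Feeding the bit pattern 0xFFFFFFFF to both operands shows why three upper-half variants are needed: mul returns 1 in every case, but the upper half is 0 for mulh ((-1) * (-1)), 0xFFFFFFFE for mulhu, and 0xFFFFFFFF for mulhsu.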

Multiplication is complemented by instructions for the inverse operation, division.

Integer Division and Remainder

Division operations separate a dividend into a quotient and a remainder based on the divisor: $\text{Quotient} = (\text{Dividend} - \text{Remainder}) \div \text{Divisor}$ and $\text{Remainder} = \text{Dividend} - (\text{Quotient} \times \text{Divisor})$. The architecture provides dedicated instructions for both quotient and remainder extraction.
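A worked instance of these identities, using illustrative values with a negative dividend (Dividend = -7, Divisor = 2). RV32M's signed div rounds the quotient toward zero, so a nonzero remainder takes the sign of the dividend:

$$\begin{aligned}
\text{Quotient} &= -7 \div 2 = -3.5 \rightarrow -3 \quad \text{(rounded toward zero)} \\
\text{Remainder} &= -7 - (-3 \times 2) = -1 \\
(\text{Dividend} - \text{Remainder}) \div \text{Divisor} &= (-7 - (-1)) \div 2 = -3
\end{aligned}$$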

Quotient Instructions
- div: Computes the quotient for signed integers and places it in the destination register.
- divu: Computes the quotient for unsigned integers.
Remainder Instructions
- rem: Computes the remainder for signed integers.
- remu: Computes the remainder for unsigned integers.
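A Python sketch of the four instructions for nonzero divisors. The names are hypothetical models; the one subtlety is that RV32M rounds signed quotients toward zero while Python's `//` rounds toward negative infinity, so `trunc_div` divides magnitudes and reapplies the sign. Unlike the hardware, these helpers raise ZeroDivisionError for a zero divisor.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def trunc_div(a: int, b: int) -> int:
    # Round toward zero: divide the magnitudes, then reapply the sign.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def div(rs1: int, rs2: int) -> int:
    return trunc_div(to_signed(rs1), to_signed(rs2)) & MASK32


def divu(rs1: int, rs2: int) -> int:
    return (rs1 & MASK32) // (rs2 & MASK32)


def rem(rs1: int, rs2: int) -> int:
    # Remainder = Dividend - Quotient * Divisor, so a nonzero result
    # has the dividend's sign.
    a, b = to_signed(rs1), to_signed(rs2)
    return (a - trunc_div(a, b) * b) & MASK32


def remu(rs1: int, rs2: int) -> int:
    return (rs1 & MASK32) % (rs2 & MASK32)
```

The one signed overflow case also falls out of the masking: dividing 0x80000000 (-2^31) by -1 wraps the quotient back to 0x80000000 with remainder 0, which is the result the RISC-V specification defines for that case.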
Divide-by-Zero Handling
- The architecture does not trap on division by zero; instead, the result is defined: the quotient has all bits set (-1 for div, $2^{32} - 1$ for divu) and the remainder equals the dividend.
- Software handles zero checks directly when required, typically by inserting a beqz (branch if equal to zero) test on the divisor before executing the division instruction.
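A sketch of both halves of that arrangement for the unsigned pair: what the hardware returns when the divisor is zero, and a software guard playing the role of the beqz test. `divu_hw`, `remu_hw`, and `checked_divu` are hypothetical names, and raising ZeroDivisionError stands in for whatever handler a real program would branch to.

```python
MASK32 = 0xFFFF_FFFF


def divu_hw(rs1: int, rs2: int) -> int:
    """divu including the architecturally defined divide-by-zero result."""
    rs1, rs2 = rs1 & MASK32, rs2 & MASK32
    if rs2 == 0:
        return MASK32  # quotient has all bits set; no trap is raised
    return rs1 // rs2


def remu_hw(rs1: int, rs2: int) -> int:
    """remu including the architecturally defined divide-by-zero result."""
    rs1, rs2 = rs1 & MASK32, rs2 & MASK32
    if rs2 == 0:
        return rs1  # remainder is the dividend
    return rs1 % rs2


def checked_divu(dividend: int, divisor: int) -> int:
    """Software guard mirroring a beqz test on the divisor before divu."""
    if (divisor & MASK32) == 0:
        raise ZeroDivisionError("divisor is zero")
    return divu_hw(dividend, divisor)
```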

Native division support is the baseline; in practice, software often avoids the slow hardware divider altogether.

Arithmetic Optimizations and Edge Cases

Hardware division is significantly slower than multiplication, prompting several standardized algorithmic optimizations.

Constant Division
- Division by powers of 2 relies on native shift instructions; for example, a logical right shift (srl, or srli for an immediate shift amount) performs unsigned division by $2^{i}$.
- Division by other constants uses multiplication by an approximate reciprocal, followed by a correction applied to the upper half of the product.
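A sketch of both techniques for unsigned operands. The constants 0xAAAAAAAB (about 2^33/3) and 0xCCCCCCCD (about 2^35/10) are the standard unsigned reciprocals for 3 and 10; the final right shift is the correction that removes the extra power of two. Other divisors (7, for example) need an additional add-and-shift fix-up that this sketch does not cover. The helper names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def mulhu(a: int, b: int) -> int:
    """Upper 32 bits of the unsigned 64-bit product, as mulhu computes."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def udiv_pow2(x: int, i: int) -> int:
    # Unsigned division by 2**i is a logical right shift (srl/srli).
    return (x & MASK32) >> i


def udiv3(x: int) -> int:
    # 0xAAAAAAAB ~= 2**33 / 3; mulhu drops 32 bits, the >> 1 drops the last one.
    return mulhu(x, 0xAAAA_AAAB) >> 1


def udiv10(x: int) -> int:
    # 0xCCCCCCCD ~= 2**35 / 10; mulhu drops 32 bits, the >> 3 drops the rest.
    return mulhu(x, 0xCCCC_CCCD) >> 3
```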
Multi-Word Operations
- mulhsu is a key substep of multi-word signed multiplication.
- It multiplies the most-significant word of the multiplicand (which contains the sign bit) by the less-significant, unsigned words of the multiplier, improving multi-word multiplication performance by approximately 15%.
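A sketch of a signed 64 x 64-bit multiply on a 32-bit machine, assuming each operand is split into a signed high word and an unsigned low word. The two cross terms pair a signed word with an unsigned one, which is exactly where mulhsu supplies the upper half. The helper names are hypothetical, and Python's unbounded integers absorb the carries that hand-written RV32 code would have to propagate with add and sltu.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(x: int, bits: int = 32) -> int:
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


# Models of the RV32M instructions on 32-bit register values.
def mul(a: int, b: int) -> int:
    return (a * b) & MASK32


def mulh(a: int, b: int) -> int:
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


def mulhu(a: int, b: int) -> int:
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def mulhsu(a: int, b: int) -> int:
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32


def mul64_signed(x: int, y: int) -> int:
    """Exact product of two signed 64-bit values built from 32-bit word products."""
    x_hi, x_lo = (x >> 32) & MASK32, x & MASK32  # x_hi holds the sign bit
    y_hi, y_lo = (y >> 32) & MASK32, y & MASK32
    # Each partial product is a full 64-bit value: (upper half << 32) | lower half.
    hi_hi = to_signed((mulh(x_hi, y_hi) << 32) | mul(x_hi, y_hi), 64)
    # Signed high word times unsigned low word: the mulhsu case.
    cross1 = to_signed((mulhsu(x_hi, y_lo) << 32) | mul(x_hi, y_lo), 64)
    cross2 = to_signed((mulhsu(y_hi, x_lo) << 32) | mul(y_hi, x_lo), 64)
    lo_lo = (mulhu(x_lo, y_lo) << 32) | mul(x_lo, y_lo)
    return (hi_hi << 64) + ((cross1 + cross2) << 32) + lo_lo
```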
Software Overflow Checking
- Unsigned Multiplication: Overflow is absent if the result of the mulhu instruction is exactly zero.
- Signed Multiplication: Overflow is absent if all bits in the mulh result match the sign bit of the mul result (0 for a nonnegative result, ffffffff in hexadecimal for a negative one).
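Both checks can be written as small predicates. This is a sketch with hypothetical names, built on simple Python models of mul, mulh, and mulhu masked to 32 bits.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def mul(a: int, b: int) -> int:
    return (a * b) & MASK32


def mulh(a: int, b: int) -> int:
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


def mulhu(a: int, b: int) -> int:
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def unsigned_mul_overflows(a: int, b: int) -> bool:
    # The 32-bit result is exact only if the upper half is zero.
    return mulhu(a, b) != 0


def signed_mul_overflows(a: int, b: int) -> bool:
    low = mul(a, b)
    # Without overflow, the upper half is just copies of the low word's sign bit.
    sign_fill = MASK32 if low & 0x8000_0000 else 0
    return mulh(a, b) != sign_fill
```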

The deliberate omission of hardware traps for arithmetic edge cases such as division by zero and overflow reflects a minimalist design that sets this instruction set apart from earlier architectures.

Architectural Design and Comparisons

The extension avoids specialized hardware structures, which keeps the processing pipeline simple and the cores small.

Register State Efficiency
- Operations write directly to standard general-purpose registers rather than requiring dedicated destination registers.
- By contrast, MIPS-32 requires special HI and LO registers to hold multiply/divide results.
- Dedicated registers increase architectural state, slow down context switching between tasks, and demand extra move instructions to access computed results.
Historical Baselines
- Early ARM-32 architectures lacked hardware divide instructions entirely; these instructions became mandatory only after 2005.
Modularity
- The multiply and divide instruction set operates as an optional standard extension.
- A complete software stack can execute entirely without these instructions, permitting embedded chip implementations to minimize size and cost by omitting the hardware.
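Without M, multiply and divide reduce to loops of shifts, adds, and subtracts. The sketch below shows a plain shift-and-add multiply and a restoring shift-and-subtract divide, written in Python for illustration. Compiler runtimes such as GCC's libgcc provide helpers like __mulsi3 and __udivsi3 for this role, but the code here is not taken from them, and `soft_mul` and `soft_divu` are hypothetical names.

```python
MASK32 = 0xFFFF_FFFF


def soft_mul(a: int, b: int) -> int:
    """Low 32 bits of a*b (what mul returns) using only shifts, ANDs and adds."""
    a, b = a & MASK32, b & MASK32
    result = 0
    while b:
        if b & 1:  # add the shifted multiplicand for each set multiplier bit
            result = (result + a) & MASK32
        a = (a << 1) & MASK32
        b >>= 1
    return result


def soft_divu(dividend: int, divisor: int) -> tuple[int, int]:
    """Restoring shift-and-subtract division; returns (quotient, remainder)."""
    dividend, divisor = dividend & MASK32, divisor & MASK32
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    quotient, remainder = 0, 0
    for bit in range(31, -1, -1):
        # Bring down the next dividend bit, then subtract if the divisor fits.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder
```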
