RV32M: Multiply and Divide Instruction Set Architecture
Integer Multiplication
The fundamental multiplication operation computes a product from two operands: product = multiplicand × multiplier. Multiplying two 32-bit numbers naturally yields a 64-bit product. To keep the hardware simple and avoid writing to two destination registers at once, the architecture splits retrieval of the 64-bit product across two separate instructions.
- Lower Half Extraction
mul: Computes and stores the lower 32 bits of the full 64-bit integer product.
- Upper Half Extraction
mulh: Computes the upper 32 bits when both operands are signed.
mulhu: Computes the upper 32 bits when both operands are unsigned.
mulhsu: Computes the upper 32 bits when one operand is signed and the other is unsigned.
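The four multiply instructions above can be sketched as a small Python model. This is an illustrative sketch, not a reference implementation: the helper names (MASK32, to_signed32) are assumptions introduced here, register values are modeled as plain integers, and the operand order for mulhsu (first operand signed, second unsigned) follows the RISC-V specification rather than the text above.

```python
MASK32 = 0xFFFFFFFF  # hypothetical helper: keeps a value within 32 bits


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(rs1, rs2):
    """Lower 32 bits of the product (same bits for signed and unsigned inputs)."""
    return (rs1 * rs2) & MASK32


def mulh(rs1, rs2):
    """Upper 32 bits of the 64-bit product, both operands signed."""
    return ((to_signed32(rs1) * to_signed32(rs2)) >> 32) & MASK32


def mulhu(rs1, rs2):
    """Upper 32 bits of the 64-bit product, both operands unsigned."""
    return (((rs1 & MASK32) * (rs2 & MASK32)) >> 32) & MASK32


def mulhsu(rs1, rs2):
    """Upper 32 bits of the 64-bit product, rs1 signed and rs2 unsigned."""
    return ((to_signed32(rs1) * (rs2 & MASK32)) >> 32) & MASK32


def full_product_unsigned(rs1, rs2):
    """Reassemble the 64-bit unsigned product from its two halves."""
    return (mulhu(rs1, rs2) << 32) | mul(rs1, rs2)
```

Calling full_product_unsigned mirrors the two-instruction sequence described above: one instruction for the upper half, one for the lower half, each writing a single destination register.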
The generation of precise integer products requires corresponding division mechanisms to reverse or complement the arithmetic operations.
Integer Division and Remainder
Division separates a dividend into a quotient and a remainder based on the divisor: dividend = quotient × divisor + remainder. The architecture provides dedicated instructions for both quotient and remainder extraction.
- Quotient Instructions
div: Computes the quotient for signed integers and places it in the destination register.
divu: Computes the quotient for unsigned integers.
- Remainder Instructions
rem: Computes the remainder for signed integers.
remu: Computes the remainder for unsigned integers.
- Divide-by-Zero Handling
- The architecture does not implement hardware traps for division by zero.
- Software handles zero-checks directly when required, typically by inserting a beqz (branch if equal to zero) test on the divisor before executing the division instruction.
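The quotient and remainder instructions, together with a software zero test on the divisor, can be sketched in Python as follows. The helper names and the checked_div wrapper are assumptions made for illustration. The specific results returned for a zero divisor (all-ones quotient, remainder equal to the dividend) and for signed overflow are taken from the RISC-V specification, not from the text above.

```python
MASK32 = 0xFFFFFFFF  # hypothetical helper: keeps a value within 32 bits


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def _trunc_div(a, b):
    """Divide rounding toward zero (Python's // rounds toward minus infinity)."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def div(rs1, rs2):
    """Signed quotient; a zero divisor yields all ones instead of trapping."""
    a, b = to_signed32(rs1), to_signed32(rs2)
    if b == 0:
        return MASK32
    return _trunc_div(a, b) & MASK32


def divu(rs1, rs2):
    """Unsigned quotient; a zero divisor yields all ones instead of trapping."""
    a, b = rs1 & MASK32, rs2 & MASK32
    return MASK32 if b == 0 else a // b


def rem(rs1, rs2):
    """Signed remainder (sign follows the dividend); zero divisor returns the dividend."""
    a, b = to_signed32(rs1), to_signed32(rs2)
    if b == 0:
        return a & MASK32
    return (a - b * _trunc_div(a, b)) & MASK32


def remu(rs1, rs2):
    """Unsigned remainder; zero divisor returns the dividend."""
    a, b = rs1 & MASK32, rs2 & MASK32
    return a if b == 0 else a % b


def checked_div(rs1, rs2):
    """Software zero check, playing the role of a beqz test before div."""
    if (rs2 & MASK32) == 0:
        raise ZeroDivisionError("divisor is zero")
    return div(rs1, rs2)
```

Because the instructions themselves never trap, the check in checked_div is optional: code that can prove the divisor is nonzero can call div directly and skip the branch.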
Handling division mechanics natively sets the foundation for software-level optimizations that bypass inherently slow hardware division steps.
Arithmetic Optimizations and Edge Cases
Hardware division is significantly slower than multiplication, prompting several standardized algorithmic optimizations.
- Constant Division
- Division by powers of 2 relies on native shift instructions; for example, srl handles unsigned division by 2^i (a logical right shift by i bits).
- Division by other specific constants utilizes multiplication by an approximate reciprocal, followed by correction steps applied to the upper half of the product.
- Multi-Word Operations
- mulhsu operates as a critical substep for multi-word signed multiplication.
- It multiplies the most-significant word of the multiplier (containing the sign bit) with the less-significant unsigned words of the multiplicand, improving multi-word multiplication performance by approximately 15%.
- Software Overflow Checking
- Unsigned Multiplication: Overflow is absent if the result of the mulhu instruction is exactly zero.
- Signed Multiplication: Overflow is absent if all bits in the mulh result match the sign bit of the mul result (0 for positive, ffffffff in hexadecimal for negative).
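The constant-division bullets above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the divisor 3 and the constant 0xAAAAAAAB (the value of 2^33 / 3 rounded up) are chosen here for illustration and do not come from the text, and the helper names are hypothetical. For this divisor, the only correction needed on the upper half of the product is a one-bit right shift; other divisors may need additional add or shift steps.

```python
MASK32 = 0xFFFFFFFF  # hypothetical helper: keeps a value within 32 bits


def srl(x, shamt):
    """Logical right shift of a 32-bit value (shift amount taken modulo 32)."""
    return (x & MASK32) >> (shamt & 31)


def mulhu(rs1, rs2):
    """Upper 32 bits of the 64-bit unsigned product."""
    return (((rs1 & MASK32) * (rs2 & MASK32)) >> 32) & MASK32


def divu_pow2(n, i):
    """Unsigned division by 2**i using a single shift."""
    return srl(n, i)


# Approximate reciprocal of 3, scaled by 2**33 and rounded up (assumed example).
MAGIC_DIV3 = 0xAAAAAAAB


def divu3(n):
    """Unsigned division by 3 without a divide instruction.

    Take the upper half of n * MAGIC_DIV3, then shift right by one more bit,
    which is equivalent to (n * MAGIC_DIV3) >> 33.
    """
    return srl(mulhu(n, MAGIC_DIV3), 1)
```

The sequence replaces one slow divide with one multiply-high and one shift, which is the trade the optimization is meant to capture.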
The deliberate omission of dedicated hardware traps for arithmetic constraints exemplifies a minimalist architectural approach, strictly distinguishing this instruction set from historical counterparts.
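Because no trap signals multiplication overflow, the software checks listed under Software Overflow Checking can be sketched in Python as below. The function names are assumptions introduced for this example; the logic follows the two rules stated earlier (upper half zero for unsigned, upper half equal to the sign extension of the lower half for signed).

```python
MASK32 = 0xFFFFFFFF  # hypothetical helper: keeps a value within 32 bits


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(rs1, rs2):
    """Lower 32 bits of the product."""
    return (rs1 * rs2) & MASK32


def mulh(rs1, rs2):
    """Upper 32 bits of the signed x signed product."""
    return ((to_signed32(rs1) * to_signed32(rs2)) >> 32) & MASK32


def mulhu(rs1, rs2):
    """Upper 32 bits of the unsigned x unsigned product."""
    return (((rs1 & MASK32) * (rs2 & MASK32)) >> 32) & MASK32


def umul_overflows(a, b):
    """Unsigned product fits in 32 bits only if the upper half is zero."""
    return mulhu(a, b) != 0


def smul_overflows(a, b):
    """Signed product fits in 32 bits only if the upper half is the
    sign extension of the lower half (0 or 0xFFFFFFFF)."""
    lo = mul(a, b)
    hi = mulh(a, b)
    expected_hi = MASK32 if lo & 0x80000000 else 0
    return hi != expected_hi
```

In both cases the check costs one extra multiply-high plus a comparison, and it is only paid by code that actually needs overflow detection.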
Architectural Design and Comparisons
The extension eliminates specialized hardware structures to streamline the processing pipeline and maintain low core footprints.
- Register State Efficiency
- Operations write directly to standard general-purpose registers rather than requiring dedicated destination registers.
- MIPS-32 architectures necessitate special HI and LO registers for multiply/divide results.
- Dedicated registers increase architectural state, slow down context switching between tasks, and demand extra move instructions to access computed results.
- Historical Baselines
- Early ARM-32 architectures lacked hardware divide instructions entirely; divide instructions became mandatory only after 2005.
- Modularity
- The multiply and divide instruction set operates as an optional standard extension.
- A complete software stack can execute entirely without these instructions, permitting embedded chip implementations to minimize size and cost by omitting the hardware.