Extensive speculation is one way to exploit instruction-level parallelism (ILP), but it requires the ability to disambiguate memory references, that is, to determine whether two memory accesses refer to the same address.
Software-based Speculation (Compile-time)
Memory disambiguation is very difficult to achieve at compile time for integer programs that contain pointers, because the addresses those pointers hold are generally not known until the program runs.
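The following is a minimal sketch of why this matters, not a model of any real compiler. It treats a Python list as memory and the hypothetical indices `p` and `q` as pointers. Moving the load above the store is only safe when `p` and `q` differ, which is exactly what cannot be proven in general at compile time.

```python
def store_then_load(mem, p, q):
    """Program order: write through p, then read through q."""
    mem = list(mem)   # work on a copy so each call starts from the same state
    mem[p] = 99       # store
    return mem[q]     # load


def load_hoisted(mem, p, q):
    """Reordered schedule: the load has been moved above the store."""
    mem = list(mem)
    value = mem[q]    # load executes early
    mem[p] = 99       # store
    return value


memory = [10, 20, 30, 40]

# Distinct addresses: both schedules read the same value, so reordering is safe.
same_when_distinct = store_then_load(memory, 0, 1) == load_hoisted(memory, 0, 1)

# Aliased addresses: the hoisted load reads the stale value 30 instead of 99.
same_when_aliased = store_then_load(memory, 2, 2) == load_hoisted(memory, 2, 2)
```

Here `same_when_distinct` is `True` and `same_when_aliased` is `False`. A scheduler that cannot rule out `p == q` has to keep the original order or include a way to recover when the addresses turn out to match.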
The Intel Itanium processor represented the most ambitious computer design based on software support for ILP and speculation.
This software-centric approach failed to deliver on designers’ performance expectations, particularly for general-purpose, nonscientific code.
Hardware-based Speculation (Runtime)
Hardware-based speculation disambiguates memory addresses dynamically at run time to resolve dependencies.
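As a rough illustration of runtime disambiguation, the sketch below checks a load against the older stores that have not yet written to memory. The `PendingStore` type, the `resolve_load` function, and the conservative "wait on an unknown address" policy are assumptions made for this example. They are not a description of any particular processor, and real designs may speculate past unknown addresses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingStore:
    address: Optional[int]  # None until the store's address has been computed
    value: int


def resolve_load(load_address, older_stores, memory):
    """Decide how a load may proceed given older, not-yet-committed stores.

    older_stores is in program order (oldest first).
    Returns ("wait", None), ("forward", value), or ("memory", value).
    """
    # Scan from the youngest older store back to the oldest.
    for store in reversed(older_stores):
        if store.address is None:
            # Address unknown: it might alias, so conservatively wait.
            return ("wait", None)
        if store.address == load_address:
            # Same address: take the value from the store directly.
            return ("forward", store.value)
    # No older store touches this address: safe to read memory.
    return ("memory", memory[load_address])


memory = {0x10: 1, 0x20: 2}
stores = [PendingStore(0x10, 7), PendingStore(0x30, 9)]

forwarded = resolve_load(0x10, stores, memory)                 # ("forward", 7)
from_memory = resolve_load(0x20, stores, memory)               # ("memory", 2)
blocked = resolve_load(0x20, [PendingStore(None, 0)], memory)  # ("wait", None)
```

Because the comparison uses actual addresses computed during execution, the hardware can let independent loads proceed in cases where a compiler would have had to assume a possible alias.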
Resolving dependencies earlier in the instruction pipeline creates subtle architectural interactions.
These interactions complicate the design and verification process and can negatively impact overall performance.
Despite these difficulties, most architectures ultimately settled on hardware-based mechanisms as ambitions for exploiting massive ILP were scaled back.
The specific mechanisms chosen to extract ILP and handle speculation directly dictate how memory requests are generated, requiring the memory hierarchy to closely align with these access patterns.
Memory System and Compiler Alignment
Memory misses are processed directly when the subsequent level in the hierarchy is an on-chip cache, such as L2 or L3.
Compiler-driven Penalty Reduction
For well-behaved scientific programs, the compiler can overlap operations to intentionally sustain multiple outstanding L2 misses.
Because these misses are serviced concurrently, their latencies overlap, which reduces the effective L2 miss penalty seen by the processor.
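A back-of-the-envelope model makes the effect concrete. It assumes, purely for illustration, that every miss has the same latency, that misses are issued in full batches, and that nothing else limits overlap. The function names and the numbers (8 misses, 100 cycles) are hypothetical.

```python
import math


def total_stall_cycles(num_misses, miss_latency, max_outstanding):
    """Idealized stall time when up to max_outstanding misses overlap."""
    batches = math.ceil(num_misses / max_outstanding)
    return batches * miss_latency


def effective_penalty(num_misses, miss_latency, max_outstanding):
    """Average stall cycles charged to each miss under the same model."""
    return total_stall_cycles(num_misses, miss_latency, max_outstanding) / num_misses


serial = effective_penalty(8, 100, 1)      # 100.0 cycles per miss
overlapped = effective_penalty(8, 100, 4)  # 25.0 cycles per miss
```

Under these assumptions, keeping four misses in flight divides the per-miss cost by four. The latency of each individual miss is unchanged; only the share of it that the processor waits through shrinks.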
Hardware Dependencies
Software-level memory optimizations rely entirely on the structural capabilities of the underlying hardware.
To reduce the miss penalty in practice, the memory system behind the cache must be designed to service at least as many simultaneous memory accesses as the compiler's schedule tries to keep outstanding; otherwise the extra misses queue up and the overlap is lost.
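The dependence on hardware can be sketched with the same kind of idealized model: the overlap actually achieved is the smaller of what the compiler schedules and what the memory system can service at once. The names `compiler_overlap` and `hardware_slots`, and all the numbers, are assumptions for this example.

```python
import math


def achieved_overlap(compiler_overlap, hardware_slots):
    """Misses that really proceed in parallel: limited by the weaker side."""
    return min(compiler_overlap, hardware_slots)


def penalty_with_hardware_limit(num_misses, miss_latency,
                                compiler_overlap, hardware_slots):
    """Average stall cycles per miss, assuming equal-latency misses in batches."""
    overlap = achieved_overlap(compiler_overlap, hardware_slots)
    return math.ceil(num_misses / overlap) * miss_latency / num_misses


# The compiler arranges for 8 misses to overlap in both cases.
limited = penalty_with_hardware_limit(8, 100, 8, 2)  # 50.0: hardware allows only 2
matched = penalty_with_hardware_limit(8, 100, 8, 8)  # 12.5: hardware allows all 8
```

In this model the compiler's effort pays off fully only when the hardware limit is at least as large as the overlap the compiler is aiming for.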