Fundamentals of Quantitative Design and Analysis

Quantitative analysis

Uniprocessor performance improved at roughly 50% per year from 1986 to 2003, driven by semiconductor technology scaling and by architectural techniques such as pipelining, multiple issue, and caches.

Modern design now works under stricter physical limits:

  • End of Dennard scaling: power density no longer stays constant as transistors shrink.
  • Slowing of Moore’s law: transistor counts no longer double at the historical pace.

Quantitative analysis is where design choices are measured rather than merely described.

  • Bandwidth has improved much faster than latency across processors, memory, networks, and storage.
  • Transistor density improves faster than communication does, so bottlenecks shift from computation toward data movement.
  • Wire delay scales poorly and consumes an increasing fraction of a clock cycle.

Cost, power, and dependability metrics

  • Energy is the correct metric for completing a fixed workload; power is the operating constraint.

  • Dynamic energy and power scale with capacitance, voltage, and switching activity.

  • Static power comes from leakage current even when devices are not actively switching.

  • Yield, die area, wafer cost, and production volume shape implementation cost.

  • Availability, MTTF (mean time to failure), MTTR (mean time to repair), and MTBF (= MTTF + MTTR) quantify dependability: availability = MTTF / MTBF (see the sketch after this list).

  • Silent data errors and speculative side channels show that correctness and security must also be analyzed quantitatively.

  • Systems alternate between service accomplishment and service interruption.

  • FIT (failures in time) measures failures per billion (10⁹) hours of operation.

  • Redundancy can be spatial (replicated hardware) or temporal (retrying the operation).
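
A minimal sketch of these dependability relations in Python; the disk-like MTTF and MTTR figures below are illustrative assumptions, not measured values:

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf_hours / (mttf_hours + mttr_hours)

def fit_rate(mttf_hours: float) -> float:
    """FIT = failures per 10**9 hours of operation = 10**9 / MTTF."""
    return 1e9 / mttf_hours

# Hypothetical module: MTTF of 1,000,000 hours, MTTR of 24 hours.
print(availability(1_000_000, 24))  # ~0.999976
print(fit_rate(1_000_000))          # 1000.0 FIT
```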

The classic power relations are:

$$\text{Energy}_{\text{dynamic}} \propto \text{Capacitive load} \times \text{Voltage}^2$$

$$\text{Power}_{\text{dynamic}} \propto \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$

$$\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}$$
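
A minimal sketch of these relations in Python; the capacitance, voltage, and frequency values are illustrative assumptions:

```python
def dynamic_power(capacitive_load_f, voltage_v, switching_freq_hz):
    """Dynamic power ~ 1/2 * C * V^2 * f, per the relation above."""
    return 0.5 * capacitive_load_f * voltage_v ** 2 * switching_freq_hz

def static_power(leakage_current_a, voltage_v):
    """Static power = leakage current * voltage."""
    return leakage_current_a * voltage_v

# A 15% supply-voltage reduction cuts dynamic power ~28% at the same
# frequency, because power scales with the square of voltage.
print(dynamic_power(1e-9, 0.85, 2e9) / dynamic_power(1e-9, 1.0, 2e9))  # 0.7225
```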

The power wall limited further clock-rate scaling, which pushed the industry toward multicore designs and more specialized hardware.

Important power-management techniques include:

  • dynamic voltage-frequency scaling
  • clock gating
  • turbo or opportunistic boosting
  • heterogeneous cores
  • race-to-halt execution (contrasted with voltage-frequency scaling in the sketch after this list)
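
A sketch contrasting race-to-halt with voltage-frequency scaling under the power relations above; every constant here is an illustrative assumption, and real cores differ:

```python
def dynamic_energy(c, v, f_hz, seconds):
    return 0.5 * c * v ** 2 * f_hz * seconds  # switching energy over the interval

def static_energy(i_leak, v, seconds):
    return i_leak * v * seconds                # leakage energy over the interval

C, I_LEAK = 1e-9, 0.5  # illustrative capacitance and leakage current

# Race to halt: run at 2 GHz / 1.0 V, finish in 1 s, then power-gate for 1 s.
race = dynamic_energy(C, 1.0, 2e9, 1.0) + static_energy(I_LEAK, 1.0, 1.0)

# DVFS: run at 1 GHz / 0.8 V, busy (and leaking) for the full 2 s deadline.
dvfs = dynamic_energy(C, 0.8, 1e9, 2.0) + static_energy(I_LEAK, 0.8, 2.0)

print(race, dvfs)  # 1.5 J vs ~1.44 J with these numbers
```

With leakage this high, scaling voltage and frequency down wins slightly; with low leakage, finishing fast and power-gating wins instead, which is why both techniques coexist.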

Dark silicon means not all transistors on a chip can be powered simultaneously within thermal limits.

Cost also depends on learning curves, production volume, die size, and yield. Smaller dies generally improve yield, and chiplets trade packaging complexity for better yield and modular cost scaling.
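One common way to quantify the die-size effect is the Bose-Einstein yield model used in recent editions of Hennessy and Patterson; the defect density and process-complexity exponent below are illustrative assumptions:

```python
def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, n=10.0):
    """Die yield = wafer yield / (1 + defects-per-area * die area)**N,
    where N models process complexity (illustrative value here)."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

# Halving the die area markedly improves yield at the same defect
# density, which is part of the appeal of chiplets.
print(die_yield(1.0, 0.04, 2.0))  # ~0.46 for a 2 cm^2 die
print(die_yield(1.0, 0.04, 1.0))  # ~0.68 for a 1 cm^2 die
```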

Architectural consequences

  • Fixed power budgets shifted performance scaling away from pure clock-rate increases.
  • Multicore processors and domain-specific architectures became more effective ways to improve the energy-performance-cost balance.

Performance model

  • Execution time is the most reliable performance metric.
  • CPU time isolates processor execution from I/O wait time.
  • Performance is inversely proportional to execution time: $\text{Performance} = \dfrac{1}{\text{Execution time}}$
  • Relative performance between two machines can be expressed as: $\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = n$
  • The processor performance equation is $\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$ (see the sketch after this list).
  • CPI is the average number of cycles per instruction for a given program on a given implementation.
  • The three terms depend on algorithm choice, compiler and language choices, the instruction set, and the hardware implementation.
  • Response time measures total elapsed completion time.
  • Throughput measures total work completed per unit time.
  • Clock rate and clock period provide the timing basis for hardware events.
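
A minimal sketch of the processor performance equation; the instruction counts, CPIs, and clock rates are illustrative assumptions:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count * CPI / clock rate
    (equivalently, IC * CPI * clock cycle time)."""
    return instruction_count * cpi / clock_rate_hz

# Same program on two hypothetical implementations: the higher-clocked
# machine loses because its CPI is worse.
t_a = cpu_time(1e9, 2.0, 4e9)  # 0.5 s
t_b = cpu_time(1e9, 1.2, 3e9)  # 0.4 s
print(t_a / t_b)               # 1.25: B is 1.25x faster than A
```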

Benchmarking

  • Benchmarks such as SPEC and TPC are used to compare systems with representative workloads.
  • Comparing normalized execution times across many programs requires the geometric mean: $\left(\prod_{i=1}^{n} \text{ratio}_i\right)^{1/n}$ (see the sketch after this list).
  • SPEC uses normalized execution ratios to compare machines across benchmark suites.
  • MIPS is a misleading metric: it depends on the instruction set and workload mix, so a higher MIPS rating can accompany worse actual performance.
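
A minimal sketch of the geometric mean; the ratios are made-up SPEC-style normalized execution times:

```python
import math

def geometric_mean(ratios):
    """Geometric mean of normalized execution-time ratios; unlike the
    arithmetic mean, the result does not depend on which machine is
    chosen as the reference."""
    return math.prod(ratios) ** (1 / len(ratios))

print(geometric_mean([2.0, 0.5]))       # 1.0: a 2x win and a 2x loss cancel
print(geometric_mean([1.2, 1.5, 0.9]))  # ~1.17
```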

Quantitative design principles

  • Temporal locality means recently used items are likely to be used again soon.
  • Spatial locality means nearby items are likely to be used soon.
  • These locality patterns are what make memory hierarchies effective; the sketch after this list contrasts them.
  • Parallelism improves performance only when the workload and overheads support it.
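
A sketch contrasting two traversal orders of the same matrix; CPython's object indirection mutes the cache effect, so treat this as an illustration of the access patterns rather than a benchmark:

```python
N = 1024
matrix = [[1] * N for _ in range(N)]

def row_major_sum(m):
    """Walk each row left to right: consecutive accesses touch
    neighboring elements (good spatial locality)."""
    total = 0
    for row in m:
        for x in row:
            total += x
    return total

def col_major_sum(m):
    """Walk down each column: consecutive accesses stride across
    rows (poor spatial locality)."""
    total = 0
    for j in range(len(m[0])):
        for i in range(len(m)):
            total += m[i][j]
    return total
```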

Amdahl’s law

Amdahl’s law limits the total speedup from any improvement:

$$\text{Speedup}_{\text{overall}} = \dfrac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
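
A minimal sketch of the law; the 40% fraction and 10x enhancement are illustrative assumptions:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time improves."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# A 10x improvement to 40% of execution time yields only ~1.56x overall;
# even an infinite improvement could never exceed 1 / (1 - 0.4) ~ 1.67x.
print(amdahl_speedup(0.4, 10))  # 1.5625
print(1 / (1 - 0.4))            # ~1.667, the upper bound
```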

The common-case principle follows directly: optimizing a rare part of execution has limited system-level impact, since even an infinite improvement to a fraction $f$ of execution time caps overall speedup at $1/(1-f)$.