Fundamentals of Quantitative Design and Analysis

Source: 01 Fundamentals of Quantitative Design and Analysis.pdf

1. Technological and Architectural Evolution

  • Historical Growth: Uniprocessor performance improved at roughly 50% per year from 1986 to 2003, driven by scaling and architectural optimizations (pipelining, multiple issue, caches).
  • End of Dennard Scaling: Power density is no longer constant as transistors shrink. Voltage and current cannot drop further without compromising integrated circuit dependability.
  • Slowing of Moore’s Law: Transistor counts no longer double every 1.5 to 2 years, decelerating the growth of devices per chip.
  • Architectural Shift: General-purpose uniprocessor performance growth has slowed significantly. The industry shifted to
    • multicore processors (Task-Level Parallelism), and
    • Domain-Specific Architectures (DSAs) to improve energy-performance-cost under fixed power budgets.

2. Classes of Computers

  • Internet of Things (IoT) / Embedded: Focus on minimizing price and energy. Performance is dictated by application-specific real-time constraints rather than peak speed.
  • Personal Mobile Devices (PMDs): Driven by energy efficiency, responsiveness, and media performance. Packaging constraints and battery life strictly limit power consumption.
  • Desktop Computing: Optimized for price-performance. Characterized by balanced performance for compute and graphics.
  • Servers: Prioritize availability, scalability, and throughput. Cost targets focus on Total Cost of Ownership (TCO), integrating lifetime power and maintenance expenses.
  • Clusters / Warehouse-Scale Computers (WSCs): Massive collections of commodity servers acting as a single entity. Designed for extreme price-performance and energy proportionality. Redundancy is managed via software to mask component failures.

3. Classes of Parallelism

  • Application Parallelism:
    • Data-Level Parallelism (DLP): Simultaneous operations applied to multiple data items.
    • Task-Level Parallelism (TLP): Independent tasks created to execute simultaneously.
  • Flynn’s Taxonomy (Hardware Parallelism):
    • SISD: Traditional uniprocessors.
    • SIMD: Exploits DLP. Includes vector architectures and GPUs.
    • MISD: No commercial implementations.
    • MIMD: Exploits TLP. Includes multicores and clusters.

4. Defining Computer Architecture

Architecture encompasses three distinct components to meet functional requirements within power, cost, and availability constraints.

  • Instruction Set Architecture (ISA): The programmer-visible interface.
  • Organization (Microarchitecture): The high-level aspects of the design, including the memory system, the memory interconnect, and the internal processor design.
  • Hardware: Detailed logic design and packaging technology.

5. Trends in Technology

  • Bandwidth vs. Latency: Across microprocessors, memory, networks, and disks, bandwidth (throughput) scales substantially faster than latency (response time).
  • Transistor Scaling: Transistor density scales quadratically with a linear reduction in feature size. Transistor performance scales linearly.
  • Wire Scaling: Wire signal delay scales poorly. Signal propagation delay consumes increasing fractions of the clock cycle.

6. Trends in Power and Energy

  • Metrics: Energy (Joules) is the correct metric for completing a fixed workload. Power (Watts) acts as a constraint (Thermal Design Power, TDP) dictating cooling and packaging limits.
  • Dynamic Energy and Power: Driven by switching transistors. Energy per 0→1 transition ∝ ½ × Capacitive load × Voltage²; Power_dynamic ∝ ½ × Capacitive load × Voltage² × Switching frequency.
  • Static Power: Leakage current flowing even when transistors are inactive. Scales with the total number of devices. Leakage limits necessitate power gating.
  • Dark Silicon: Transistor budgets exceed thermal dissipation limits; not all areas of a chip can be powered simultaneously.
  • Energy Efficiency Techniques:
    • Dynamic Voltage-Frequency Scaling (DVFS).
    • Clock gating for idle modules.
    • Temporary overclocking (Turbo mode) utilizing thermal margins.
    • Heterogeneous cores (combining high-performance and high-efficiency cores).
    • Race-to-halt: computing quickly to enter deep sleep modes.
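
The payoff of DVFS can be sketched numerically from the standard dynamic-power relation P ≈ ½ C V² f. This is a minimal sketch: the capacitance, voltage, and frequency values are illustrative assumptions, not measurements of any real processor.

```python
# Sketch: dynamic power under DVFS, assuming the standard relation
# P_dynamic = 1/2 * C * V^2 * f. All numbers are illustrative.

def dynamic_power(c_load, voltage, freq):
    """Dynamic switching power in watts (C in farads, V in volts, f in Hz)."""
    return 0.5 * c_load * voltage ** 2 * freq

# Nominal operating point (hypothetical values).
C = 1e-9                     # effective switched capacitance, 1 nF
V_nom, f_nom = 1.0, 2.0e9    # 1.0 V at 2 GHz
p_nom = dynamic_power(C, V_nom, f_nom)

# DVFS: scale voltage and frequency together down to 85%.
scale = 0.85
p_scaled = dynamic_power(C, V_nom * scale, f_nom * scale)

# Because V enters squared and f linearly, power falls roughly with
# the cube of the scaling factor (0.85^3 ~ 0.61).
print(f"nominal power: {p_nom:.3f} W")
print(f"scaled power:  {p_scaled:.3f} W")
print(f"power ratio:   {p_scaled / p_nom:.3f}")
```

Scaling both voltage and frequency by 15% cuts dynamic power by nearly 40%, which is why DVFS is such an effective lever under a fixed thermal budget.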

7. Trends in Cost

  • Learning Curve: Manufacturing costs decrease over time as yield improves.

  • Volume and Commoditization: High manufacturing volume amortizes development costs and increases efficiency.

  • Integrated Circuit Cost Factors:

    Cost of die = Cost of wafer / (Dies per wafer × Die yield)

    Die yield = Wafer yield × 1 / (1 + Defects per unit area × Die area)^N

    where N is the process-complexity factor.

  • Chiplets: Breaking a large monolithic die into smaller, interconnected dies to increase yield and reduce manufacturing costs.
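
The yield advantage of chiplets can be exercised numerically with the standard die-yield model (with process-complexity factor N). The defect density and N below are illustrative assumptions.

```python
# Sketch: why smaller dies (chiplets) improve yield, using the model
# Die yield = Wafer yield * 1 / (1 + defects_per_area * die_area)^N.
# Defect density and process-complexity factor N are assumed values.

def die_yield(defects_per_cm2, die_area_cm2, n=10.0, wafer_yield=1.0):
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

DEFECTS = 0.05   # defects per cm^2 (assumed)

big = die_yield(DEFECTS, 4.0)    # one monolithic 4 cm^2 die
small = die_yield(DEFECTS, 1.0)  # one 1 cm^2 chiplet

print(f"4 cm^2 monolithic die yield: {big:.3f}")
print(f"1 cm^2 chiplet die yield:    {small:.3f}")

# Because defective chiplets are discarded *before* assembly, the
# relevant comparison is good silicon per wafer: here the small die
# yields nearly 4x more good area for the same defect density.
print(f"yield advantage: {small / big:.2f}x")
```

The exponential dependence on die area means a single defect scraps far more silicon in a monolithic design, which is the economic argument for chiplets.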

8. Dependability and Security

  • States of Service: Systems alternate between service accomplishment and service interruption.
  • Metrics:
    • Mean Time To Failure (MTTF): A reliability measure of continuous service accomplishment.
    • Mean Time To Repair (MTTR): Time spent in service interruption.
    • Mean Time Between Failures (MTBF): MTBF = MTTF + MTTR.
    • Failures in Time (FIT): Failures per billion hours of operation (FIT = 10⁹ / MTTF, with MTTF in hours).
    • Availability: Availability = MTTF / (MTTF + MTTR).
  • Redundancy: Essential for improving MTTF. Systems implement spatial or temporal redundancy to tolerate independent faults.
  • Silent Data Errors: Faults in functional logic causing incorrect execution without halting the system, requiring hardware/software verification checks.
  • Security Vulnerabilities: Microarchitectural state changes during speculative execution (e.g., Spectre) create timing side channels that leak protected information.

9. Measuring and Summarizing Performance

  • Execution Time: The single most reliable measure of computer performance. CPU time isolates execution from I/O wait times.

  • Benchmarks: Standardized suites (e.g., SPEC, TPC) prevent overfitting to trivial kernels and establish baseline configurations.

  • Summarizing Performance (SPECRatio): Execution times are normalized to a reference machine (SPECRatio = Execution time_reference / Execution time_measured) and summarized with the geometric mean:

    Geometric mean = (SPECRatio_1 × SPECRatio_2 × … × SPECRatio_n)^(1/n)

    The ratio of geometric means equals the geometric mean of the performance ratios, so comparisons are independent of the choice of reference machine.
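
The reference-machine independence of the geometric mean can be checked numerically. The benchmark execution times below are made-up values chosen only for illustration.

```python
# Sketch: the ratio of geometric means of SPECRatios is the same no
# matter which reference machine is used. All times are made-up.
import math

def geomean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def spec_ratios(ref_times, times):
    return [r / t for r, t in zip(ref_times, times)]

# Execution times (seconds) for three benchmarks on machines A and B,
# plus two different candidate reference machines.
times_a = [10.0, 40.0, 5.0]
times_b = [20.0, 10.0, 8.0]
ref1 = [100.0, 100.0, 100.0]
ref2 = [50.0, 200.0, 25.0]

r1 = geomean(spec_ratios(ref1, times_a)) / geomean(spec_ratios(ref1, times_b))
r2 = geomean(spec_ratios(ref2, times_a)) / geomean(spec_ratios(ref2, times_b))

# The reference times cancel, so r1 == r2 == the geometric mean of the
# per-benchmark performance ratios times_b[i] / times_a[i].
print(f"A vs B under ref1: {r1:.4f}")
print(f"A vs B under ref2: {r2:.4f}")
```

An arithmetic mean of the normalized ratios would not have this property, which is why SPEC reports geometric means.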

10. Quantitative Principles of Computer Design

  • Take Advantage of Parallelism: Exploit DLP, TLP, and Instruction-Level Parallelism (pipelining, multiple issue) at all levels of system design.

  • Principle of Locality:

    • Temporal Locality: Recently accessed items will likely be accessed again soon.
    • Spatial Locality: Items near recently accessed items will likely be accessed soon.
  • Focus on the Common Case: Optimize frequent operations over infrequent ones for highest impact.

  • Amdahl’s Law: The speedup from an enhancement is limited by the fraction of time the enhancement can be used: Speedup_overall = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced).
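
Amdahl's Law can be made concrete with a short calculation; the fraction and speedup inputs are illustrative.

```python
# Sketch of Amdahl's Law: overall speedup is limited by the fraction
# of execution time an enhancement applies to. Inputs are illustrative.

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# Enhancing 80% of execution time by 10x gives well under 10x overall:
print(f"{amdahl_speedup(0.8, 10):.3f}x")

# Even an effectively infinite enhancement of that 80% is capped by
# the untouched 20%: the limit is 1 / (1 - 0.8) = 5x.
print(f"{amdahl_speedup(0.8, 1e12):.3f}x")
```

The unenhanced fraction always dominates eventually, which is why "focus on the common case" and Amdahl's Law go hand in hand.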

  • Processor Performance Equation:

    CPU time = Instruction count × Cycles per instruction (CPI) × Clock cycle time

    Requires evaluating hardware implementation (clock cycle time), organization (CPI), and compiler/ISA technology (instruction count).
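
The processor performance equation can be illustrated with a small trade-off calculation; the instruction counts, CPI values, and clock rate below are assumed for the example.

```python
# Sketch of the processor performance equation:
# CPU time = instruction count * CPI * clock cycle time.
# All parameter values are illustrative assumptions.

def cpu_time(instr_count, cpi, clock_ghz):
    cycle_time_s = 1.0 / (clock_ghz * 1e9)
    return instr_count * cpi * cycle_time_s

# A 2 GHz machine running 1e9 instructions at CPI 1.5:
base = cpu_time(1e9, 1.5, 2.0)

# A compiler change cuts instruction count by 20% but raises CPI to 1.6;
# the equation tells us whether the trade is worth it:
opt = cpu_time(0.8e9, 1.6, 2.0)

print(f"base: {base:.3f} s, optimized: {opt:.3f} s, "
      f"speedup {base / opt:.3f}x")
```

Because the three factors multiply, an improvement in one term can be worthwhile even when it slightly worsens another, and no single factor (e.g., clock rate alone) determines performance.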