Fundamentals of Quantitative Design and Analysis
Source: 01 Fundamentals of Quantitative Design and Analysis.pdf
1. Technological and Architectural Evolution
- Historical Growth: Uniprocessor performance improved at roughly 50% per year from 1986 to 2003, driven by scaling and architectural optimizations (pipelining, multiple issue, caches).
- End of Dennard Scaling: Power density is no longer constant as transistors shrink. Voltage and current cannot drop further without compromising integrated circuit dependability.
- Slowing of Moore’s Law: Transistor counts no longer double every 1.5 to 2 years, decelerating the growth of devices per chip.
- Architectural Shift: General-purpose uniprocessor performance growth has slowed significantly. The industry shifted to
- multicore processors (Task-Level Parallelism), and
- Domain-Specific Architectures (DSAs) to improve energy-performance-cost under fixed power budgets.
2. Classes of Computers
- Internet of Things (IoT) / Embedded: Focus on minimizing price and energy. Performance is dictated by application-specific real-time constraints rather than peak speed.
- Personal Mobile Devices (PMDs): Driven by energy efficiency, responsiveness, and media performance. Packaging constraints and battery life strictly limit power consumption.
- Desktop Computing: Optimized for price-performance. Characterized by balanced performance for compute and graphics.
- Servers: Prioritize availability, scalability, and throughput. Cost targets focus on Total Cost of Ownership (TCO), integrating lifetime power and maintenance expenses.
- Clusters / Warehouse-Scale Computers (WSCs): Massive collections of commodity servers acting as a single entity. Designed for extreme price-performance and energy proportionality. Redundancy is managed via software to mask component failures.
3. Classes of Parallelism
- Application Parallelism:
- Data-Level Parallelism (DLP): Simultaneous operations applied to multiple data items.
- Task-Level Parallelism (TLP): Independent tasks created to execute simultaneously.
- Flynn’s Taxonomy (Hardware Parallelism):
- SISD: Traditional uniprocessors.
- SIMD: Exploits DLP. Includes vector architectures and GPUs.
- MISD: No commercial implementations.
- MIMD: Exploits TLP. Includes multicores and clusters.
4. Defining Computer Architecture
Architecture encompasses three distinct components to meet functional requirements within power, cost, and availability constraints.
- Instruction Set Architecture (ISA): The programmer-visible interface.
- Organization (Microarchitecture): High-level design aspects such as the memory system, memory interconnect, and the internal processor (CPU) design.
- Hardware: Detailed logic design and packaging technology.
5. Technology Trends
- Bandwidth vs. Latency: Across microprocessors, memory, networks, and disks, bandwidth (throughput) scales substantially faster than latency (response time).
- Transistor Scaling: Transistor density scales quadratically with a linear reduction in feature size. Transistor performance scales linearly.
- Wire Scaling: Wire signal delay scales poorly. Signal propagation delay consumes increasing fractions of the clock cycle.
6. Power and Energy Trends
- Metrics: Energy (Joules) is the correct metric for completing a fixed workload. Power (Watts) acts as a constraint (Thermal Design Power, TDP) dictating cooling and packaging limits.
- Dynamic Energy and Power: Driven by switching transistors. To first order, energy per switch is proportional to Capacitive load × Voltage², and dynamic power additionally scales with the switching frequency (see the sketch after this list).
- Static Power: Leakage current flowing even when transistors are inactive. Scales with the total number of devices. Leakage limits necessitate power gating.
- Dark Silicon: Transistor budgets exceed thermal dissipation limits; not all areas of a chip can be powered simultaneously.
- Energy Efficiency Techniques:
- Dynamic Voltage-Frequency Scaling (DVFS).
- Clock gating for idle modules.
- Temporary overclocking (Turbo mode) utilizing thermal margins.
- Heterogeneous cores (combining high-performance and high-efficiency cores).
- Race-to-halt: computing quickly to enter deep sleep modes.
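
These techniques all trade on the first-order CMOS relations above. A minimal Python sketch of why DVFS is so effective; the capacitance, voltage, and frequency values below are illustrative assumptions, not measured figures.

```python
# First-order CMOS dynamic energy/power model.
# All constants are illustrative assumptions, not measured values.

def dynamic_energy(cap_load: float, voltage: float) -> float:
    """Energy of one full 0->1->0 switching transition: ~1/2 * C * V^2 (Joules)."""
    return 0.5 * cap_load * voltage ** 2

def dynamic_power(cap_load: float, voltage: float, freq_hz: float) -> float:
    """Dynamic power: ~1/2 * C * V^2 * f (Watts)."""
    return dynamic_energy(cap_load, voltage) * freq_hz

C = 1e-9          # assumed effective switched capacitance (F)
V, F = 1.0, 3e9   # assumed nominal supply voltage (V) and clock (Hz)

p_nominal = dynamic_power(C, V, F)
p_dvfs = dynamic_power(C, 0.85 * V, 0.85 * F)  # scale V and f down 15%

# Power falls roughly with the cube of the scaling factor (0.85^3 ~= 0.61),
# which is why DVFS buys large power savings for modest slowdowns.
print(f"nominal: {p_nominal:.2f} W, scaled: {p_dvfs:.2f} W "
      f"({p_dvfs / p_nominal:.0%} of nominal)")
```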
7. Cost Trends
- Learning Curve: Manufacturing costs decrease over time as yield improves.
- Volume and Commoditization: High manufacturing volume amortizes development costs and increases manufacturing efficiency.
- Integrated Circuit Cost Factors: Cost of die = Cost of wafer / (Dies per wafer × Die yield), where Die yield = Wafer yield × 1 / (1 + Defects per unit area × Die area)^N and N is the process-complexity factor.
- Chiplets: Breaking a large monolithic die into smaller, interconnected dies to increase yield and reduce manufacturing costs (see the sketch after this list).
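
A short Python sketch of the die-cost relations above, including the chiplet intuition that smaller dies yield dramatically better; the wafer cost, defect density, and process-complexity factor below are illustrative assumptions.

```python
import math

# Integrated-circuit cost model from the relations above.
# All input values are illustrative assumptions.

def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Gross dies per wafer (wafer area / die area) minus wafer-edge loss."""
    radius = wafer_diameter_cm / 2
    return (math.pi * radius ** 2 / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

def die_yield(wafer_yield: float, defects_per_cm2: float,
              die_area_cm2: float, n: float) -> float:
    """Die yield; n is the process-complexity factor."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

def cost_per_good_die(wafer_cost: float, wafer_diameter_cm: float,
                      die_area_cm2: float, defects_per_cm2: float,
                      n: float, wafer_yield: float = 1.0) -> float:
    """Cost of die = Cost of wafer / (Dies per wafer * Die yield)."""
    dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2)
    return wafer_cost / (dies * die_yield(wafer_yield, defects_per_cm2,
                                          die_area_cm2, n))

# Chiplet intuition: halving die area more than halves cost per good die,
# because yield improves and wafer-edge loss shrinks.
mono    = cost_per_good_die(10_000, 30, 2.0, 0.1, 10)  # 2 cm^2 monolithic die
chiplet = cost_per_good_die(10_000, 30, 1.0, 0.1, 10)  # 1 cm^2 chiplet
print(f"2 cm^2 die: ${mono:.0f}   1 cm^2 chiplet: ${chiplet:.0f}")
```

With these assumed numbers, two 1 cm² chiplets cost well under half as much as one 2 cm² monolithic die, before accounting for chiplet packaging and interconnect overheads.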
8. Dependability and Security
- States of Service: Systems alternate between service accomplishment and service interruption.
- Metrics:
- Mean Time To Failure (MTTF): A reliability measure of continuous service accomplishment.
- Mean Time To Repair (MTTR): Time spent in service interruption.
- Mean Time Between Failures (MTBF): MTBF = MTTF + MTTR.
- Failures in Time (FIT): Failure rate expressed as failures per billion (10^9) hours of operation; FIT = 10^9 / MTTF for MTTF in hours.
- Availability: MTTF / (MTTF + MTTR) (see the sketch after this list).
- Redundancy: Essential for improving MTTF. Systems implement spatial or temporal redundancy to tolerate independent faults.
- Silent Data Errors: Faults in functional logic causing incorrect execution without halting the system, requiring hardware/software verification checks.
- Security Vulnerabilities: Microarchitectural state changes during speculative execution (e.g., Spectre) create timing side channels that leak protected information.
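
The metric definitions above reduce to a few lines of arithmetic. A minimal Python sketch; the MTTF and MTTR figures are illustrative assumptions.

```python
# Dependability arithmetic from the definitions above.
# The MTTF and MTTR figures are illustrative assumptions.

def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def fit_rate(mttf_hours: float) -> float:
    """Failures in Time: failures per 10^9 device-hours = 10^9 / MTTF."""
    return 1e9 / mttf_hours

MTTF = 1_000_000   # assumed: one failure per million hours
MTTR = 24          # assumed: one day to restore service

print(f"MTBF:         {MTTF + MTTR} hours")             # MTBF = MTTF + MTTR
print(f"FIT:          {fit_rate(MTTF):.0f}")            # 1000 FIT
print(f"availability: {availability(MTTF, MTTR):.6f}")  # ~0.999976
```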
9. Measuring and Summarizing Performance
- Execution Time: The single most reliable measure of computer performance. CPU time isolates processor execution from I/O wait time.
- Benchmarks: Standardized suites (e.g., SPEC, TPC) prevent overfitting to trivial kernels and establish baseline configurations.
- Summarizing Performance (SPECRatio): Comparing execution times normalized to a reference machine requires the geometric mean; the ratio of geometric means equals the geometric mean of the per-benchmark performance ratios (see the sketch after this list).
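
A small Python check of that geometric-mean property: the ratio of two machines' geometric-mean SPECRatios equals the geometric mean of their per-benchmark performance ratios, regardless of the reference machine. All execution times below are made-up illustrative values.

```python
import math

def geometric_mean(values):
    """Geometric mean via logs to avoid overflow on long benchmark lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

ref_times = [100.0, 200.0, 400.0]   # assumed reference-machine times (s)
a_times   = [50.0, 100.0, 100.0]    # assumed machine A times (s)
b_times   = [25.0, 200.0, 200.0]    # assumed machine B times (s)

spec_a = [r / t for r, t in zip(ref_times, a_times)]   # SPECRatios of A
spec_b = [r / t for r, t in zip(ref_times, b_times)]   # SPECRatios of B

# Ratio of geometric means (reference cancels out) ...
lhs = geometric_mean(spec_a) / geometric_mean(spec_b)
# ... equals the geometric mean of per-benchmark ratios time_B / time_A.
rhs = geometric_mean([b / a for a, b in zip(a_times, b_times)])
print(f"A vs. B: {lhs:.4f} == {rhs:.4f}")   # 1.2599 == 1.2599
```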
10. Quantitative Principles of Computer Design
- Take Advantage of Parallelism: Exploit DLP, TLP, and Instruction-Level Parallelism (pipelining, multiple issue) at all levels of system design.
- Principle of Locality:
- Temporal Locality: Recently accessed items will likely be accessed again soon.
- Spatial Locality: Items near recently accessed items will likely be accessed soon.
- Focus on the Common Case: Optimize frequent operations over infrequent ones for the highest impact.
- Amdahl’s Law: Limits the speedup obtained from an enhancement by the fraction of time the enhancement can be used.
- Processor Performance Equation: CPU time = Instruction count × CPI × Clock cycle time. Requires jointly evaluating hardware implementation (clock cycle time), organization (CPI), and compiler technology (instruction count); see the sketch after this list.
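
A minimal Python sketch of the last two principles; the enhancement fraction, enhancement speedup, and CPU parameters are illustrative assumptions.

```python
# Amdahl's Law and the processor performance equation, as defined above.
# The workload fractions and CPU parameters are illustrative assumptions.

def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

def cpu_time(instruction_count: float, cpi: float,
             clock_rate_hz: float) -> float:
    """CPU time = IC * CPI * clock cycle time = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Even a 10x enhancement helps little if only 40% of the time can use it.
print(f"speedup:  {amdahl_speedup(0.4, 10):.2f}x")   # ~1.56x

# 10^9 instructions at CPI 1.5 on an assumed 3 GHz clock.
print(f"CPU time: {cpu_time(1e9, 1.5, 3e9):.3f} s")  # 0.500 s
```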