WSC Power Distribution and Cooling

Warehouse-scale computers (WSCs) concentrate massive computational capacity, frequently exceeding 100 MW of critical power. Supplying this power and removing the resulting heat represent the majority of WSC infrastructure costs and fundamentally limit the scale of deployable IT equipment.

Power Utilization Effectiveness (PUE)

Efficiency in power distribution and cooling is quantified by Power Utilization Effectiveness (PUE), the ratio of total facility power to IT equipment power; it captures the overhead required to support the actual computing hardware:

  • IT equipment power: The energy consumed by the compute, storage, and networking hardware housed within the racks.
  • Total facility power: The IT load plus all power delivery losses (voltage conversions) and cooling loads (fans, air conditioning units, chillers).
  • Ideal baseline: A theoretically perfect facility has a PUE of 1.0. Historically, conventional data centers operated at a PUE of 2.5 (every 1 W of IT load required an additional 1.5 W of overhead), whereas highly optimized modern WSCs maintain PUEs around 1.10.

Because high PUE drives up capital expenditures (CAPEX), operational expenditures (OPEX), and environmental impact, modern WSC design strictly optimizes the two primary sources of overhead: power delivery pathways and heat removal systems.


WSC Power Delivery Architecture

Power routing from the utility grid to the processor die requires a hierarchy of voltage step-downs, each introducing conversion losses.

  • Standard AC Distribution Hierarchy:
    • Utility Grid: Supplies high-voltage lines at 115 kV AC.
    • Utility Substation: Steps down to 10–50 kV AC for intra-site distribution.
    • Unit Substation: Steps down to 400–500 V AC and routes through an Uninterruptible Power Supply (UPS).
    • Power Distribution Units (PDUs): Step down to 110–220 V AC, allocating ~100–200 kW per unit with individual 6 kW circuit breakers.
    • Server Power Supply: Converts 110–220 V AC to 12 V DC.
    • Voltage Regulator Modules (VRMs): Step 12 V DC down to the specific operational voltage of the chips (e.g., 1.1 V DC for a CPU).
  • Uninterruptible Power Supply (UPS): Masks utility failures to maintain 99.99% availability.
    • Generators: Diesel-powered engines that require 10–15 seconds to start and synchronize.
    • Bridging Storage: Batteries, chemical arrays, or mechanical flywheels that supply instantaneous power until the generators assume the load.
    • Redundancy: Deployed in N+1 or N+2 configurations to allow maintenance and tolerate component failures.
  • Conversion Inefficiencies:
    • Centralized UPS Losses: Traditional facility-level UPS systems require a double conversion (AC-to-DC for battery connection, then DC-to-AC for distribution), resulting in 3–6% power loss.
    • Server-Level Losses: Inefficiencies in the server power supply and VRMs introduce another ~10% loss. This specific intra-server overhead is tracked via a localized metric known as Server-Level PUE (sPUE).
  • Power Delivery Optimizations:
    • Distributed UPS: Replaces centralized facility UPS arrays with localized battery modules inside every server rack. This yields 99% efficiency by removing the double-conversion penalty and increases fault tolerance by isolating battery failures.
    • Direct DC Distribution: Modifies unit substations to output 380 V DC, while rack-level power supplies step down directly to 12–48 V DC. This removes all AC-to-DC conversions at the rack level and naturally integrates with the DC nature of distributed battery arrays.
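Because the stage losses multiply, the end-to-end delivery efficiency can be sketched as a product over the conversion chain. The per-stage efficiencies below are illustrative assumptions, chosen only to be consistent with the loss figures quoted above (3–6% for a double-conversion UPS, ~10% combined for the server PSU and VRMs); they are not measured data:

```python
from functools import reduce

# Assumed per-stage efficiencies for a traditional centralized-UPS AC path.
centralized_ac = {
    "substation transformer": 0.99,
    "double-conversion UPS":  0.95,  # ~5% loss, within the 3-6% range above
    "PDU transformer":        0.98,
    "server PSU (AC->12V)":   0.94,  # PSU + VRM together ~10% loss
    "VRM (12V->~1V)":         0.96,
}

# Assumed per-stage efficiencies for a 380 V direct-DC path with
# distributed rack batteries (no double conversion, no server AC PSU).
distributed_dc = {
    "substation rectifier (380V DC)": 0.985,
    "rack supply (380V->12-48V DC)":  0.97,
    "VRM (12V->~1V)":                 0.96,
}

def chain_efficiency(stages: dict) -> float:
    """Fraction of grid power that actually reaches the silicon."""
    return reduce(lambda acc, eff: acc * eff, stages.values(), 1.0)

print(f"Centralized AC chain: {chain_efficiency(centralized_ac):.1%}")
print(f"Distributed DC chain: {chain_efficiency(distributed_dc):.1%}")
```

Under these assumed numbers the AC chain delivers roughly 83% of grid power to the chips versus roughly 92% for the DC chain, which illustrates why removing even one conversion stage is worth substantial engineering effort.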

Just as minimizing voltage conversions recovers stranded energy in the power delivery path, eliminating mechanical refrigeration is the critical path to minimizing the remaining facility overhead: heat removal.


WSC Cooling Systems

All electrical energy consumed by a WSC is ultimately converted into heat, which must be systematically moved from the silicon dies to the external atmosphere.

  • Traditional Computer Room Air-Conditioning (CRAC):
    • Raised-Floor Airflow: Cold air is forced into an underfloor plenum and released through perforated tiles into isolated “cold aisles”.
    • Heat Transfer: Rack-mounted servers draw from the cold aisle, pass the air over internal heatsinks, and exhaust the heated air into “hot aisles”.
    • Chilled Water Loops: The hot air rises to CRAC intakes, where it passes over coils filled with cold water. The warmed water is pumped to external chillers and cooling towers, which use mechanical refrigeration to cool the water and release the heat outdoors.
  • Cooling Optimizations:
    • Thermal Setpoint Elevation: Raising the cold aisle ambient temperature from <20°C (68°F) to 27°C (80°F) significantly reduces the workload on external chillers.
    • Aisle Isolation: Strict physical containment of hot and cold aisles prevents thermal mixing, ensuring the CRAC units receive undiluted hot exhaust, which maximizes thermodynamic efficiency.
    • Holistic Airflow Management: Individual server fans are inherently inefficient (consuming 6–9 W each) and prone to mechanical failure. WSCs synchronize low-power server fans with massive facility-level air impellers. Fan speeds are governed by the pressure differential between hot and cold aisles rather than localized temperature readings.
    • Economization (Free Cooling): Bypasses mechanical refrigeration by utilizing the ambient environment.
      • Air-economization: Large fans pull in low-temperature outside air, filter it, and circulate it directly through the facility.
      • Water-economization: Replaces mechanical chillers with heat exchangers connected to naturally cold local water sources (e.g., lakes, rivers, or the sea).
    • Liquid and Immersion Cooling: As processor thermal design power (TDP) reaches 400 W for CPUs and 800 W for GPUs, forced-air cooling becomes physically insufficient. Advanced WSCs deploy closed-loop, direct-to-chip liquid cooling or submerge equipment entirely into dielectric coolants to drastically improve heat removal rates, enabling higher clock frequencies and denser chip geometries.
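The physical limit on forced-air cooling follows from the heat-transport equation Q = ρ · V̇ · c_p · ΔT: the airflow required grows linearly with rack power. The sketch below uses standard air properties; the 20 kW rack power and 12 K aisle temperature rise are illustrative assumptions, not figures from the text:

```python
# Sketch: airflow needed to carry away a rack's heat by forced air.
# Q = rho * V_dot * c_p * dT  =>  V_dot = Q / (rho * c_p * dT)

RHO_AIR = 1.2    # kg/m^3, air density near the elevated 27 C setpoint
CP_AIR = 1005.0  # J/(kg*K), specific heat of air at constant pressure

def airflow_m3_per_s(heat_w: float, delta_t_k: float) -> float:
    """Volumetric airflow (m^3/s) needed to remove heat_w watts with a
    hot-aisle/cold-aisle temperature rise of delta_t_k kelvin."""
    return heat_w / (RHO_AIR * CP_AIR * delta_t_k)

# An assumed 20 kW rack with a 12 K aisle delta:
flow = airflow_m3_per_s(20_000, 12.0)
print(f"{flow:.2f} m^3/s (~{flow * 2118.9:.0f} CFM)")
```

Doubling rack power doubles the required airflow at a fixed ΔT, and fan power grows faster than linearly with flow rate; this is the quantitative reason dense CPU and GPU racks cross over to direct-to-chip liquid or immersion cooling, since liquids carry orders of magnitude more heat per unit volume than air.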