Custom Silicon Strategic Objectives
- Cloud Infrastructure Scale and Topology: AWS operates a global infrastructure encompassing 30 geographic regions, 96 availability zones, and over 410 points of presence.
- Motivations for Custom Silicon Development:
  - Elimination of Virtualization Overheads: Removing hypervisor loads from host CPUs prevents performance jitter, reduces energy consumption, and provisions 100% of compute cycles to client applications.
  - Deployment Velocity: In-house hardware-software co-design compresses the deployment lifecycle of new technologies from months to weeks, enabling hitless upgrades with zero customer downtime.
  - Edge Expansion: Custom hardware allows core cloud services to be packaged into standalone units (e.g., Outpost racks) for secure, on-premise, or edge deployments.
- Economic and Engineering Advantages:
  - Vertical Specialization: Custom Application-Specific Integrated Circuits (ASICs) target exact warehouse-scale computer (WSC) bottlenecks, bypassing the design compromises inherent in commodity chips.
  - Capital Efficiency: Developing in-house silicon circumvents the 20% to 100% profit margins typically charged by commodity microprocessor vendors.
  - Holistic Security: Controlling the silicon design minimizes the attack surface through a custom hardware root of trust and formally verified firmware.
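The capital-efficiency argument can be made concrete with a toy cost comparison: paying a vendor margin per chip versus amortizing one-time in-house engineering cost across a warehouse-scale fleet. This is a minimal sketch; the manufacturing cost, NRE figure, and fleet size below are hypothetical assumptions, not AWS data.

```python
# Toy cost model (illustrative only): commodity purchase at a vendor margin
# vs. in-house design with NRE (non-recurring engineering) amortized over
# the deployed fleet. All numbers are hypothetical assumptions.

def commodity_cost(manufacturing_cost: float, vendor_margin: float) -> float:
    """Per-chip price from a vendor charging `vendor_margin` (e.g., 0.2 to 1.0)."""
    return manufacturing_cost * (1.0 + vendor_margin)

def in_house_cost(manufacturing_cost: float, nre: float, fleet_size: int) -> float:
    """Per-chip cost when one-time design NRE is spread across the fleet."""
    return manufacturing_cost + nre / fleet_size

mfg = 100.0          # assumed manufacturing cost per chip ($)
nre = 200e6          # assumed one-time design/verification cost ($)
fleet = 5_000_000    # assumed chips deployed fleet-wide

print(commodity_cost(mfg, 0.5))        # vendor price at a 50% margin -> 150.0
print(in_house_cost(mfg, nre, fleet))  # amortized in-house cost -> 140.0
```

At small volumes the NRE dominates and buying commodity silicon wins; only at warehouse scale does amortization undercut the vendor margin, which is why this strategy is specific to hyperscale operators.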
The operational imperative to bypass commodity silicon limitations directly motivated the architectural decoupling of hypervisor functions from the main compute processors.
The Nitro System Architecture
- Hypervisor Decomposition: Traditional hypervisors mediate all I/O operations, consuming critical host CPU cores and introducing significant latency variations under heavy network or storage loads.
- Nitro Hypervisor: A minimalist KVM-based software layer that configures CPU virtualization features (VT-x, VT-d) only during Virtual Machine (VM) launch or resize events.
  - It relies on posted interrupts to remain entirely outside the execution path during standard I/O operations, yielding performance effectively indistinguishable from bare-metal servers.
- Nitro Cards: A family of dedicated, Arm-based PCIe accelerators that completely offload networking, storage, and management tasks from the host CPU.
  - Elastic Network Adapter (ENA): A PCIe Network Interface Card (NIC) utilizing Single-Root I/O Virtualization (SR-IOV) to manage Virtual Private Cloud (VPC) data planes, including encapsulation, rate limiting, and transparent encryption.
  - Elastic Fabric Adapter (EFA): An OS-bypass NIC engineered for HPC and machine learning, utilizing the AWS Scalable Reliable Datagram (SRD) protocol to enable zero-copy, multipath routing across the WSC fabric.
  - Elastic Block Store (EBS) Controller: Exposes NVMe virtual functions for remote block storage, handling network-based zero-copy transfers, snapshots, and storage volume encryption.
  - Instance Storage Controller: Manages directly attached local Flash and Hard Disk Drives (HDDs), applying transparent encryption that permanently destroys cryptographic keys upon VM termination.
- Nitro Controller: Acts as the primary interface to the EC2 control plane, coordinating all local Nitro cards, allocating cores/memory, and managing the VM lifecycle.
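The Instance Storage Controller's key-destruction behavior is an instance of "crypto-erase": once the ephemeral volume key is gone, the ciphertext on disk is permanently unreadable. The sketch below illustrates the idea only; it is not the Nitro implementation, and it substitutes a SHA-256 counter-mode keystream for the hardware AES a real controller would use.

```python
# Toy crypto-erase sketch (illustrative, NOT the Nitro implementation):
# data is transparently encrypted under a per-volume ephemeral key, and
# destroying that key at VM termination renders the ciphertext irrecoverable.
import hashlib
import os

def keystream(key: bytes, length: int) -> bytes:
    """Derive a keystream by hashing key || counter (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

volume_key = os.urandom(32)                       # ephemeral key held only on the card
ciphertext = xor_crypt(volume_key, b"tenant data")
assert xor_crypt(volume_key, ciphertext) == b"tenant data"  # readable while the key exists
volume_key = None   # "destroyed" at VM termination: ciphertext is now unrecoverable
```

Because no key material ever touches the host CPU or the drive itself, physically reclaiming the disk leaks nothing once the card drops the key.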
Offloading virtualization to dedicated PCIe interfaces necessitates an isolated, hardware-enforced mechanism to manage physical device integrity and remote attestation.
Nitro Security and Hardware Root of Trust
- Hardware Interface Monitoring: The Nitro Security chip resides on the server motherboard, actively monitoring localized non-volatile storage buses (SPI and I2C) to detect anomalous behaviors.
- Firmware Write Protection: The security chip structurally prohibits host CPU software from modifying any system firmware; all updates route exclusively through the authenticated Nitro Controller.
- Secure Boot Sequencing:
  - During initialization, the Security chip holds the host CPUs and Baseboard Management Controller (BMC) in reset.
  - The Nitro Controller utilizes a tamper-resistant Trusted Platform Module (TPM) to cryptographically verify the integrity of its boot ROM against known-good signatures.
  - Only upon successful verification of the complete firmware chain are the host processors permitted to execute.
- Ubiquitous Encryption: The architecture enforces transparent encryption across all physical layer interconnects, VPC network traffic, local/remote storage volumes, and management APIs, ensuring that operator access to plaintext user data is physically impossible.
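The secure boot sequencing above can be sketched as a staged integrity check: each firmware image must match a known-good digest before the next stage is released from reset. This is a minimal illustration under stated assumptions; the stage names and images are hypothetical, and a real root of trust verifies signatures with TPM-protected keys rather than bare hashes.

```python
# Minimal verified-boot-chain sketch (illustrative, not AWS firmware).
import hashlib

def digest(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()

def verify_boot_chain(stages, known_good):
    """Release each stage only if its image matches the known-good digest;
    abort on the first mismatch (i.e., hold the host CPUs in reset)."""
    released = []
    for name, image in stages:
        if digest(image) != known_good.get(name):
            return released, False   # chain broken: halt the boot sequence
        released.append(name)
    return released, True            # full chain verified: release host CPUs

stages = [("boot_rom", b"rom-v1"), ("nitro_fw", b"fw-v7"), ("host_bios", b"bios-v3")]
known_good = {name: digest(img) for name, img in stages}

print(verify_boot_chain(stages, known_good))                        # all stages released
print(verify_boot_chain([("boot_rom", b"tampered")], known_good))   # halted at first stage
```

The essential property is ordering: nothing later in the chain executes until everything earlier has been measured and matched, so a tampered boot ROM stops the platform before any host code runs.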
With I/O virtualization and security fully encapsulated by the Nitro subsystem, the primary CPU architecture could be strictly optimized for multi-tenant, cloud-native workload execution.
Graviton Processor Microarchitecture
- Design Philosophy: The Graviton Arm-based CPU family is engineered specifically for cloud cost-performance and high energy proportionality.
- Graviton3 Physical Implementation: Fabricated on a 5 nm process, the processor utilizes a 7-chiplet package containing 64 cores.
  - The topology includes a central compute mesh providing 2 TB/s of bisection bandwidth, flanked by specialized chiplets for memory controllers, PCIe, and I/O offloading.
- Cloud-Native Core Adjustments:
  - Simultaneous Multithreading (SMT) Elimination: The cores omit SMT to prevent cross-thread interference, dedicating entire L1 caches, branch predictors, and Translation Lookaside Buffers (TLBs) to a single thread.
  - Single-Socket Restriction: Hardware is deliberately constrained to single-socket configurations, eliminating Non-Uniform Memory Access (NUMA) software complexities while significantly lowering thermal and power envelopes.
- System-Level Density: The server chassis amortizes base infrastructure costs by packaging three independent 1-socket Graviton3 servers that share a unified power supply, BMC, and Nitro card array.
- Memory Subsystem: Integrates DDR5-4800 memory for a 50% bandwidth improvement over the prior generation, sustaining performance under load and permitting a proportionally smaller Last-Level Cache (LLC).
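The 50% bandwidth claim follows directly from the data rates. A back-of-envelope calculation, assuming eight 64-bit memory channels (the channel count is an illustrative assumption, not an official specification):

```python
# Back-of-envelope peak memory bandwidth from DRAM data rates.
def peak_bandwidth_gbps(data_rate_mt: float, bus_width_bytes: int, channels: int) -> float:
    """Peak bandwidth in GB/s: (MT/s) x (bytes per transfer) x channels / 1000."""
    return data_rate_mt * bus_width_bytes * channels / 1000.0

ddr5 = peak_bandwidth_gbps(4800, 8, 8)   # DDR5-4800, 64-bit channels, 8 channels assumed
ddr4 = peak_bandwidth_gbps(3200, 8, 8)   # DDR4-3200 baseline for comparison
print(ddr5, ddr4, ddr5 / ddr4)           # ~307.2 GB/s vs ~204.8 GB/s, a 1.5x uplift
```

The 1.5x ratio is the stated 50% improvement: extra off-chip bandwidth substitutes for on-chip SRAM, which is why a smaller LLC still sustains throughput.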
The structural simplification of the Graviton processor establishes the foundation for realizing maximum energy proportionality and cost-performance across the data center fleet.
Cloud Efficiency and Performance Scaling
- Load Percentile Optimization: Instead of designing for theoretical peak performance, WSC silicon is modeled to achieve peak energy proportionality at the 50th and 90th percentiles of operational load.
- Workload Heterogeneity: Providing architectural diversity (x86 vs. Arm) enables precise hardware matching; specific microarchitectures outperform others by over 25% depending on whether the workload is integer-heavy or floating-point-intensive.
- Economic and Power Yields: For scale-out workloads, the Graviton3 microarchitecture delivers a substantial improvement in both performance-per-watt and performance-per-cost when benchmarked against contemporary, general-purpose x86 cloud instances.
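Optimizing for the 50th and 90th load percentiles can be sketched numerically: take a utilization trace, find where the fleet actually spends its time, and evaluate performance-per-watt there rather than at peak. The trace and linear power model below are illustrative assumptions, not measured fleet data.

```python
# Sketch of percentile-based efficiency modeling on a synthetic load trace.
import random
import statistics

random.seed(0)
# Synthetic per-interval utilization trace (fraction of peak load), clamped to [0, 1].
trace = [min(1.0, max(0.0, random.gauss(0.45, 0.2))) for _ in range(10_000)]

def power_watts(util: float, idle: float = 100.0, peak: float = 350.0) -> float:
    """Simple linear power model: a fixed idle floor plus a load-proportional term."""
    return idle + (peak - idle) * util

def perf_per_watt(util: float, peak_perf: float = 1000.0) -> float:
    """Delivered performance divided by power drawn at that utilization."""
    return (peak_perf * util) / power_watts(util)

q = statistics.quantiles(trace, n=100)   # 99 percentile cut points
p50, p90 = q[49], q[89]
print(f"p50 load {p50:.2f}: {perf_per_watt(p50):.2f} perf/W")
print(f"p90 load {p90:.2f}: {perf_per_watt(p90):.2f} perf/W")
```

Because the idle power floor is paid regardless of load, perf-per-watt is worst at the low utilizations where servers spend most of their time; silicon tuned for the p50/p90 operating points therefore yields more fleet-wide efficiency than silicon tuned for a peak the fleet rarely reaches.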