AWS Cloud
AWS launched in 2006 as the first IaaS cloud provider and, by early 2023, was the largest public cloud provider, offering more than 200 services with a multibillion-dollar revenue run rate. It operated 30 geographic regions, 96 availability zones, and more than 410 points of presence.
Availability Zones: Each availability zone consists of one or more discrete WSCs with redundant power, networking, and connectivity in separate facilities.
Custom Silicon
- Motivations for Custom Silicon Development:
- Elimination of Virtualization Overheads: Removing hypervisor work from host CPUs prevents performance jitter, reduces energy consumption, and leaves 100% of compute cycles available to client applications.
- Deployment Velocity: In-house hardware-software co-design compresses the deployment lifecycle of new technologies from months to weeks, enabling hitless upgrades with zero customer downtime.
- Availability and Reliability: Cloud infrastructure functions like a critical utility, so custom hardware must support high availability and continuous maintenance.
- Edge Expansion: Custom hardware allows core cloud services to be packaged into standalone units (e.g., Outpost racks) for secure, on-premise, or edge deployments.
- Economic and Engineering Advantages:
- Vertical Specialization: Custom Application-Specific Integrated Circuits (ASICs) target exact warehouse-scale computer (WSC) bottlenecks, bypassing the design compromises inherent in commodity chips.
- Parallel Hardware-Software Development: AWS can develop software before a new chip is deployed, so the system can benefit from the chip immediately.
- Capital Efficiency: Developing in-house silicon circumvents the 20% to 100% profit margins typically charged by commodity microprocessor vendors.
- Holistic Security: Controlling the silicon design minimizes the attack surface through a custom hardware root of trust and formally verified firmware.
- AWS Custom Chips: By early 2023, AWS had deployed Nitro chips, Nitro Security chips, Graviton CPUs, and ML DSAs such as Trainium and Inferentia. AWS acquired Annapurna Labs in 2015 to build this cloud-optimized silicon capability.
The Nitro System
Traditional hypervisors mediate all I/O operations, consuming host CPU cores and introducing latency variation under heavy network or storage load. For VPC networking, the hypervisor handles encapsulation, security rules, rate limiting, and routing.
- Nitro Hypervisor: AWS replaced its Xen-based hypervisor with the minimal Nitro hypervisor in 2017. This KVM-based layer configures CPU virtualization features (VT-x, VT-d) only during VM launch or resize events.
- With posted interrupts, the hypervisor stays out of the normal I/O path. VM and bare-metal instances can perform within about 1% of each other on compute-intensive HPC benchmarks.
- Nitro Cards: A family of dedicated, Arm-based PCIe accelerators that completely offload networking, storage, and management tasks from the host CPU.
- Elastic Network Adapter (ENA): A PCIe NIC using SR-IOV to give VM guests direct access to ENA virtual functions. It implements VPC data-plane tasks including encapsulation/decapsulation, security rules, rate limiting, routing, time synchronization, and traffic encryption.
- Elastic Fabric Adapter (EFA): An OS-bypass NIC for HPC and ML workloads. It uses AWS Scalable Reliable Datagram (SRD) to provide reliable datagrams over multipath WSC fabrics and supports frameworks such as OFI Libfabric and NVIDIA NCCL.
- Elastic Block Store (EBS) Controller: Exposes NVMe virtual functions for remote block storage, supporting dynamic volume attach/detach, resizing, migration, zero-copy transfers, integrity checks, barriers, snapshots, and encrypted storage volumes.
- Instance Storage Controller: Manages directly attached local Flash and HDDs, applying transparent encryption that destroys keys when a VM terminates. It also provides limiters, integrity checks, snapshots, drive monitoring, and Nitro-SSD FTL logic for Flash.
- Nitro Controller: Acts as the primary interface to the EC2 control plane, coordinating all local Nitro cards, allocating cores/memory, and managing the VM lifecycle.
- Nitro Evolution: AWS has implemented five generations of Nitro-card chips. All include Arm cores so functionality can evolve, and different SKUs target EFA, EBS, all-in-one mainstream servers, or simple NIC-only roles.
- Hitless Upgrades and Instance Variety: Nitro cards support firmware updates without stopping host VMs. By moving virtualization into Nitro cards, AWS can add new CPUs, DSAs, storage devices, and operating systems with far less hypervisor porting work.
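EFA's SRD transport sprays the packets of a single flow across many network paths and tolerates out-of-order arrival, restoring order at the receiver. The sketch below is a toy model of that idea only, with invented class names; the real SRD protocol additionally handles retransmission, congestion control, and path selection.

```python
import random

class SrdLikeSender:
    """Toy multipath reliable-datagram sender (illustrative, not real SRD)."""
    def __init__(self, num_paths=4):
        self.num_paths = num_paths
        self.next_seq = 0

    def send(self, payload):
        # Spray each packet over a randomly chosen path; the sequence number
        # lets the receiver restore order regardless of per-path latency.
        pkt = (self.next_seq, random.randrange(self.num_paths), payload)
        self.next_seq += 1
        return pkt

class SrdLikeReceiver:
    """Buffers out-of-order packets and releases them in sequence order."""
    def __init__(self):
        self.expected = 0
        self.buffer = {}
        self.delivered = []

    def receive(self, pkt):
        seq, _path, payload = pkt
        self.buffer[seq] = payload
        while self.expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1

# Packets may arrive out of order because paths have different latencies.
sender = SrdLikeSender()
packets = [sender.send(f"msg-{i}") for i in range(8)]
random.shuffle(packets)          # simulate per-path reordering
rx = SrdLikeReceiver()
for p in packets:
    rx.receive(p)
print(rx.delivered)              # delivered in order: msg-0 .. msg-7
```

The key design point this illustrates: because ordering is recovered at the endpoint, the fabric is free to load-balance every packet independently across the multipath WSC network.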

Figure: the launch process for VM instances on an AWS Nitro system. The Nitro host includes one or more CPUs with directly attached memory. All local storage devices, disk or Flash, are accessed through a Nitro Instance Controller card, and remote storage through a Nitro EBS card. Networking is provided by Nitro ENA and EFA cards. The OS running on the VM instance includes device drivers that access these devices over conventional PCIe and NVMe interfaces.
Nitro Security
- Hardware Interface Monitoring: The Nitro Security chip resides on the server motherboard, actively monitoring localized non-volatile storage buses (SPI and I2C) to detect anomalous behaviors.
- Firmware Write Protection: The security chip structurally prohibits host CPU software from modifying any system firmware; all updates route exclusively through the authenticated Nitro Controller.
- Secure Boot Sequencing:
- During initialization, the Security chip holds the host CPUs and Baseboard Management Controller (BMC) in reset.
- The Nitro Controller utilizes a tamper-resistant Trusted Platform Module (TPM) to cryptographically verify the integrity of its boot ROM against known-good signatures.
- Only after the Nitro Controller verifies the firmware chain are the host processors permitted to execute. If invalid firmware is detected, the server is removed from service.
- Remote Attestation: Before servers outside AWS facilities, such as Outpost racks and edge deployments, are allowed to run customer workloads, Nitro attests that they are running approved firmware and software images.
- Ubiquitous Encryption: The architecture enforces transparent encryption across physical interconnects, VPC network traffic, local storage, persistent volumes, main memory, and management APIs, limiting operator access to plaintext user data.
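The secure boot sequence above can be sketched as a manifest check: the host stays held in reset until every firmware stage matches a known-good digest. This is a minimal illustration using bare SHA-256 digests; the actual Nitro design verifies cryptographic signatures against a TPM-backed root of trust, and all names and images here are invented.

```python
import hashlib

# Toy firmware images; in reality these would be binary blobs read off
# SPI flash, and the manifest would hold signed, known-good measurements.
firmware_images = {
    "boot_rom":  b"boot rom image v1",
    "bmc_fw":    b"bmc firmware v7",
    "host_uefi": b"host uefi v3",
}
trusted_manifest = {name: hashlib.sha256(img).hexdigest()
                    for name, img in firmware_images.items()}

def verify_boot_chain(images, manifest):
    """Release host CPUs from reset only if every stage matches the manifest."""
    for name, image in images.items():
        if hashlib.sha256(image).hexdigest() != manifest.get(name):
            return False  # invalid firmware: server is removed from service
    return True

print(verify_boot_chain(firmware_images, trusted_manifest))   # True

# A tampered image fails verification, so the host is never released.
tampered = dict(firmware_images, bmc_fw=b"bmc firmware v7 (modified)")
print(verify_boot_chain(tampered, trusted_manifest))          # False
```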
Graviton Processor
- Design Philosophy: The Graviton Arm-based CPU family is engineered for EC2 price-performance and energy-efficient servers. Graviton1, launched in 2018, showed that Arm could support IaaS workloads at large cloud scale.
- Generational Scaling: Graviton2 roughly doubled per-core performance over Graviton1. Graviton3 improved integer performance by about 30% and floating-point performance by about 60%, while memory bandwidth rose from roughly 40 GB/s to 300 GB/s per chip across generations.
| Feature | Graviton 1 | Graviton 2 | Graviton 3(E) |
|---|---|---|---|
| Year announced | 2018 | 2019 | 2021 |
| Process technology | 16 nm | 7 nm | 7 nm |
| Transistor count | 5 B | 30 B | 55 B |
| Instruction set | ARMv8.0a | ARMv8.2a + fp16, rcpc, dotprod, crypto | ARMv8.4 + sve, rng, bf16, int8, crypto |
| Clock frequency | 2.3 GHz | 2.5 GHz | 2.6 GHz |
| Cores | 16 | 64 | 64 |
| Core type | Cortex-A72 | Neoverse-N1 | Neoverse-V1 |
| Fetch/decode/issue width | 3 / 3 / 5 | 4-8 / 4 / 8 | 8 / 5-8 / 15 |
| Functional units (Int, FP/SIMD, LD/ST, BR) | 3 / 2 / 2 / 1 | 3 / 2 / 2 / 1 | 4 / 4 / 5 / 2 |
| SIMD FUs | 2x 128b (Neon) | 2x 128b (Neon) | 4x 128b (Neon), 2x 256b (SVE) |
| ROB size | 128 | 128 | 256 |
| L1I/L1D/L2 per core | 48 KB / 32 KB / — | 64 KB / 64 KB / 1 MB | 64 KB / 64 KB / 1 MB |
| LLC size | 8 MB | 32 MB | 32 MB |
| DRAM type | DDR4-2633 | DDR4-3200 | DDR5-4800 |
| DRAM controllers | 2 | 8 | 8 |
| DRAM bandwidth | 40 GB/s | 200 GB/s | 300 GB/s |
| DRAM encryption | No | Yes | Yes |
| PCIe type | Gen4 | Gen4 | Gen5 |
| PCIe lanes | 32 | 48 | 64 |
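The DRAM bandwidth rows in the table follow from simple arithmetic: transfer rate times 8 bytes per transfer times the number of memory controllers. A quick check against the table's figures (the table rounds to 40/200/300 GB/s):

```python
# Peak DRAM bandwidth = transfer rate (MT/s) x 8 bytes per transfer
# x number of memory controllers. Inputs taken from the table above.
def peak_dram_gbs(mts, controllers, bytes_per_transfer=8):
    return mts * bytes_per_transfer * controllers / 1000  # GB/s

for name, mts, ctrls in [("Graviton1", 2633, 2),
                         ("Graviton2", 3200, 8),
                         ("Graviton3", 4800, 8)]:
    print(f"{name}: {peak_dram_gbs(mts, ctrls):.0f} GB/s peak")
# Graviton1 ~42, Graviton2 ~205, Graviton3 ~307 GB/s, consistent with
# the table's rounded 40 / 200 / 300 GB/s entries.
```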
- Graviton3 Physical Implementation: Graviton3 uses a 7-chiplet package containing 64 cores.
- The topology includes a central compute mesh providing 2 TB/s of bisection bandwidth, flanked by specialized chiplets for memory controllers, PCIe, and I/O offloading.
- Cloud-Native Core Adjustments:
- Simultaneous Multithreading (SMT) Elimination: The cores omit SMT to prevent cross-thread interference, dedicating entire L1 caches, branch predictors, and Translation Lookaside Buffers (TLBs) to a single thread.
- Single-Socket Restriction: Hardware is deliberately constrained to single-socket configurations, eliminating Non-Uniform Memory Access (NUMA) software complexities while significantly lowering thermal and power envelopes.
- System-Level Density: The server chassis amortizes base infrastructure costs by packaging three independent 1-socket Graviton3 servers that share a unified power supply, BMC, and Nitro card array.
- Memory and Security: DDR5-4800 provides a 50% memory bandwidth increase, sustaining many-core workloads and allowing a smaller LLC. Graviton3 also implements always-on main-memory encryption, Arm pointer authentication, a true random number generator, and posted interrupts for direct VM interrupt delivery.
Cloud Efficiency
- Load Percentile Optimization: Instead of designing for theoretical peak performance, WSC silicon is tuned so that peak energy proportionality falls at the 50th to 90th percentiles of operational load, where servers actually spend their time.
- Workload Heterogeneity: Providing architectural diversity (x86 vs. Arm) enables precise hardware matching; specific microarchitectures outperform others by over 25% depending on whether the workload is integer-heavy or floating-point intensive.
- Continued Heterogeneity: AWS still deploys x86 CPUs and Apple CPUs because customers have diverse compatibility and performance requirements.
- Economic and Power Yields: For scale-out workloads, the Graviton3 microarchitecture delivers substantial improvements in both performance per watt and performance per cost when benchmarked against contemporary, general-purpose x86 cloud instances.
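The percentile-based design target above amounts to reading design points off a utilization trace rather than assuming 100% load. A minimal sketch with a synthetic, purely illustrative trace:

```python
import statistics

# Synthetic hourly CPU-utilization trace (fraction of peak); illustrative
# only -- real WSC traces come from fleet-wide telemetry.
load_trace = [0.18, 0.22, 0.25, 0.31, 0.35, 0.40, 0.45, 0.50,
              0.55, 0.60, 0.62, 0.65, 0.70, 0.72, 0.75, 0.80,
              0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20]

# statistics.quantiles with n=100 yields the 1st..99th percentiles.
pcts = statistics.quantiles(load_trace, n=100)
p50, p90 = pcts[49], pcts[89]
print(f"design band: p50={p50:.2f} to p90={p90:.2f} of peak load")
# Silicon is tuned so its best energy efficiency lands in this band,
# not at the 100% load level that servers rarely sustain.
```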