3.1 Understanding AI Fabric Architecture
What an AI training fabric IS — the components, the four-fabric model, and what makes it different from the traditional DC network you already know.
3.2 Design Options
The four fabric design patterns — ROD, RUD, Scheduled, Multi-Planar — at a glance. A decision tree, a comparison matrix, and the multi-tenancy question.
3.3 Rail-Optimized Design (ROD)
What "rails" mean in an AI fabric, why each GPU in a server connects to its own dedicated leaf, how this changes blast radius, and pod sizing.
3.4 Switches for AI
The dominant switch vendors in AI fabrics and the network OS each runs — NVIDIA Spectrum-X, Arista, Cisco, Juniper/HPE, and white-box SONiC. Who builds the box, and what AI changes about a switch.
3.5 Switch Silicon
The merchant ASICs inside AI switches — Broadcom Tomahawk vs Jericho, NVIDIA Spectrum-4, Cisco Silicon One, Marvell Teralynx. The shallow-buffer vs deep-buffer divide that defines an AI fabric.
3.6 NICs & DPUs
The RDMA NICs and DPUs at the host edge of an AI fabric — NVIDIA ConnectX/BlueField, Broadcom Thor, Intel E810, AWS EFA, AMD Pollara. What a NIC does in an AI fabric, NIC vs DPU, and the UEC next wave.
3.7 Cluster Sizing & Cabling
Reference fabric designs from 1,024 to 100K GPUs. Switch radix math, transceiver and cable choices (OSFP, AOC, DAC), and the cable-labelling scheme that survives day-1 install.
3.8 Master Reference
The whole chapter in one scroll. An interactive deep-dive covering Clos, fat-tree, ECMP, ECMP-with-RoCEv2, the seven ECMP failure modes, lossless mechanisms, and AI fabric design patterns — animated.