Master Reference — Interactive Walk-Through

You've worked through pages 3.1 through 3.4. This page covers the same ground in a different form: the whole chapter in one scroll, animated, with the protocol mechanics shown in motion rather than described in prose.

It's a reference, not a re-teach. Topics overlap with the prose pages on purpose — sometimes the picture lands the concept faster than the paragraph did.

Best used as:

  • A first read if you're a strong visual learner — scroll through, then come back to the prose pages for the gritty details.
  • A review after you've finished pages 3.1–3.6 — the animation surfaces what you might've skimmed in text.
  • A whiteboard prop when you're explaining AI fabric design to a colleague — open this tab, drive through the relevant section.

What's covered (in order, with animations):

  • Why AI needs a different fabric — synchronized traffic vs the law of large numbers
  • Clos network — the structural foundation
  • Fat-tree — scaling the Clos
  • ECMP — equal-cost multi-path mechanics
  • ECMP with RoCEv2 — why the UDP source port is the whole load-balancing decision
  • The seven ECMP failure modes — animated
  • ECMP solutions — packet spraying, adaptive routing, rail-optimized topology
  • Lossless fabric — PFC, ECN, DCQCN at a glance
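
As a rough illustration of the ECMP with RoCEv2 item above, here is a minimal sketch of hash-based path selection. It assumes a switch that hashes the usual 5-tuple and takes the result modulo the number of equal-cost paths. The function names (`ecmp_pick`, `paths_used`), the IP addresses, and the use of CRC32 as the hash are illustrative assumptions, not any vendor's implementation. The fixed parts are real: RoCEv2 runs over UDP (protocol 17) with destination port 4791, so between a given pair of hosts the source port is the only field left to vary.

```python
import zlib

ROCEV2_DST_PORT = 4791  # IANA-assigned UDP destination port for RoCEv2
UDP_PROTO = 17          # IP protocol number for UDP


def ecmp_pick(src_ip, dst_ip, src_port, n_paths,
              dst_port=ROCEV2_DST_PORT, proto=UDP_PROTO):
    """Pick an uplink index from a 5-tuple hash (CRC32 is a stand-in hash)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_paths


def paths_used(src_ip, dst_ip, src_ports, n_paths):
    """Return the set of uplinks that a group of flows lands on."""
    return {ecmp_pick(src_ip, dst_ip, p, n_paths) for p in src_ports}


if __name__ == "__main__":
    # Hypothetical host pair; four equal-cost uplinks.
    a, b = "10.0.0.1", "10.0.1.1"
    # Every flow reuses one source port: all of them share a single uplink.
    print(paths_used(a, b, [49152] * 8, n_paths=4))
    # Varying the source port is the only way to spread this host pair.
    print(paths_used(a, b, range(49152, 49152 + 256), n_paths=4))
```

With everything else in the tuple pinned, flows that share a source port always hash to the same uplink, which is why the source port carries the whole load-balancing decision in this model.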

Next: HPC Networking → — Phase 3 begins. RDMA, InfiniBand, RoCE v2 — what actually rides on the fabric you just designed.