Show notes & related reading
In this first conversation we cover the three things every network engineer needs to internalize before they look at an AI cluster: (1) why tail latency on one link kills the whole job, (2) why "lossy" Ethernet isn't survivable, and (3) why oversubscription, the design lever you've relied on for two decades, becomes the enemy here. This conversation is aimed at network engineers who've never racked a GPU server but want to.
Mentioned on: