Hello, Lossless Network
TLDR: New site, built for network engineers entering AI. Deep modules + fast blog + zero vendor noise. First module drops soon.
Welcome to Lossless Network — AI networking, distilled for network engineers.
If you've ever tried to design or operate a network fabric for large-scale AI training, you've felt the gap. The standards are public. The vendor whitepapers exist. But nowhere is there a single, opinionated, technically honest walkthrough, written by a network engineer for network engineers, of how the pieces fit together and why some choices that look right on paper fall apart at scale.
This site is my attempt to fix that.
Who this is for
You're a network engineer who:
- Builds and operates production data center fabrics
- Has been told to "support AI workloads", which now means RoCEv2, lossless Ethernet, NCCL collectives, and topologies that look nothing like a classic Clos
- Wants to actually understand what's happening on the wire when 1,024 GPUs run all-reduce simultaneously
- Refuses to pretend a vendor slide deck is a design document
If that's you, you're home.
The promise
Deep when you need depth. Fast when you need speed.
- Got 30 seconds? Read the TLDR at the top of every post. That's all you need.
- Got 5 minutes? Read the blog. Sharp takes, no filler.
- Got an afternoon? Take a module. First principles to production deployment, end to end.
You decide how deep to go. The content respects your time.
What's coming
Six modules, written in order:
- RDMA Fundamentals — verbs, QPs, MRs, and why kernel-bypass exists
- RoCEv2 & Lossless Ethernet — PFC, ECN, DCQCN, and what makes RDMA work over Ethernet
- AI Fabric Architecture — rail-optimized topologies, NCCL, the all-reduce bottleneck
- Congestion Control — the actual tuning, with numbers
- Adaptive Routing — DLB, FLB, and why static ECMP kills GPU jobs
- UEC & The Future — what comes after RoCE
Plus a blog for everything that doesn't fit the module structure — field notes, debugging stories, paper reviews, and "the thing the vendor didn't tell you" posts.
What this site is not
- A vendor pitch
- A reskinned wiki dump
- AI-generated filler
Every word is written by hand, reviewed by hand, and grounded in real engineering experience.
Stay updated
The blog has an RSS feed. New modules drop one at a time. Follow along, push back when I'm wrong, and let's build the resource the AI networking field has been missing.
— Nagarjun