Transport Options
Setup first: if you haven't read Why AI Networks Are Different, start there — it covers what transport is and why AI breaks the everyday baseline.
This page maps the design space: four distinct families, each solving a different problem, all coexisting in 2026. The history of how we got here fits in three eras.
50 years in three eras
Era 1 — TCP rules everything (1973–2000)
TCP/IP was designed for unreliable WANs in the 1970s. Reliability over latency. The world adopted it; everything else (web, mail, RPC) built on top. UDP (1980) and SCTP (2000) filled niches but TCP carried the load. By 2000, the network was a kernel-bound byte-stream with software-only congestion control — fine for everything except low-latency HPC.
Era 2 — HPC and early AI demand kernel-bypass (1999–2015)
The IBTA formed in 1999 to define RDMA and the verbs API, and InfiniBand shipped with credit-based, link-level flow control: bytes moved without the kernel touching them. RoCE v1 (2010) brought RDMA to Ethernet by emulating a lossless fabric with PFC; RoCE v2 (2014) made it routable over UDP/IP. The DCQCN paper (Microsoft Research, SIGCOMM 2015) showed RoCE could run at hyperscale, and RoCE moved from research into Azure, Meta, Tencent, and most cloud RDMA deployments today.
Era 3 — AI at 100K+ GPUs breaks RoCEv2 (2015 → today)
At 100K-GPU scale, both of RoCEv2's crutches fail. PFC causes head-of-line blocking and the deadly PFC storm; ECMP can't load-balance the elephant flows of a synchronized collective. So each hyperscaler built its own transport that drops both crutches: AWS SRD (2018), Alibaba eRDMA (2020), Google Falcon (2023), and MRC (2024, the OpenAI / Microsoft / NVIDIA / AMD / Broadcom / Intel collaboration). The Ultra Ethernet Consortium (founded 2023, roughly 50 members, without NVIDIA) shipped UEC 1.0 in 2025 as the open-standard convergence target. NVIDIA, meanwhile, ships Spectrum-X, its own vertically integrated AI-Ethernet stack.
The network is no longer just packet-forwarding infrastructure — it is part of the distributed compute system itself.
The four families
The transport landscape today sits in four buckets. Each solves a different problem, and they coexist:
| # | Family | Solves |
|---|---|---|
| 1 | Classic IP transports | General networking — internet, applications, control plane |
| 2 | RDMA transports (traditional) | Kernel-bypass for HPC and traditional AI clusters |
| 3 | AI / hyperscaler custom transports | RoCEv2 at 100K+ GPU scale (multipath, no PFC dependence) |
| 4 | Scale-up interconnects | Intra-server / intra-rack GPU-to-GPU communication |
Pick the family you care about; each subsection below is self-contained:
- 1. Classic IP (L4)
- 2. RDMA (traditional)
- 3. Hyperscaler custom
- 4. Scale-up
1. Classic IP (L4)
You know these. Listed for completeness.
| Protocol | Owner / Std | Reliable | Ordered | Multipath | Encryption | Key trait | Used by / for |
|---|---|---|---|---|---|---|---|
| TCP | IETF | Yes | Yes | No | External (TLS) | Byte-stream, AIMD CC, HoL blocking | Web, SSH, SMTP — universal |
| UDP | IETF | No | No | No | External (DTLS) | Connectionless, low overhead | DNS, DHCP, VoIP, gaming, QUIC base |
| SCTP | IETF | Yes | Per-stream | Multi-home failover | External (DTLS) | Multi-streaming + multi-homing | Telecom — SS7/SIGTRAN, Diameter, 5G N2 |
| DCCP | IETF | No | No | No | No | Unreliable + congestion control | Mostly research / abandoned |
| QUIC | IETF (Google origin) | Yes | Per-stream | Connection migration | Built-in (TLS 1.3) | 0/1-RTT setup, user-space | HTTP/3 — Google, Cloudflare, Meta, Apple |
| MPTCP | IETF | Yes | Yes | Yes (subflows) | External | TCP across multiple paths | Apple Siri/iOS, Samsung, Linux |
| UDP-Lite | IETF | No | No | No | External | Partial checksum | Loss-tolerant codecs |
Standout: QUIC. Fully user-space, with TLS 1.3 built in and 0/1-RTT setup, and its connection migration lets a flow survive path and address changes without a new handshake. You'll meet it on the inference and edge path.
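To make the "kernel-bound" trait in the table concrete: with the classic transports, every send and receive is a syscall, and the kernel's stack owns congestion control, retransmission, and ordering. A minimal C sketch of a TCP send follows; the host and port are placeholders.

```c
/* Minimal sketch of a classic kernel-bound TCP send. Every send()/recv()
 * is a syscall; the kernel's stack owns congestion control, retransmission,
 * and ordering. The host and port below are placeholders. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* TCP byte-stream */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(12345);                   /* placeholder port */
    inet_pton(AF_INET, "10.0.0.2", &peer.sin_addr);   /* placeholder host */

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        perror("connect"); close(fd); return 1;
    }

    const char msg[] = "hello";
    /* One syscall per send: user-to-kernel copy, kernel CC decides pacing. */
    if (send(fd, msg, sizeof msg - 1, 0) < 0) perror("send");

    close(fd);
    return 0;
}
```

QUIC keeps a plain UDP socket underneath but moves the reliability, ordering, and congestion-control machinery into user space, which is why it can evolve per application and ship encryption by default.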
2. RDMA (traditional)
The transports that built HPC and early AI. All share the IBTA verbs API, and all are kernel-bypass.
| Protocol | Owner / Std | Substrate | Lossless required | Multipath | Encryption | Key trait | Used by / for |
|---|---|---|---|---|---|---|---|
| InfiniBand | NVIDIA / IBTA | IB fabric | Yes (credit-based FC) | RD mode only | Optional | Native RDMA, sub-μs latency | DGX SuperPOD, TOP500 HPC, Meta RSC |
| RoCE v1 | IBTA (open) | Ethernet L2 | Yes (PFC) | Limited | Optional | IB transport over Ethernet, non-routable | Same-subnet RDMA |
| RoCE v2 | IBTA (open) | UDP/IP | Yes (PFC) | Limited (ECMP) | Optional | IB transport over UDP, routable | Azure, Meta, Tencent, ByteDance, Baidu |
| iWARP | IETF (open) | TCP/IP | No | Via TCP | Optional | RDMA over TCP, no PFC needed | Intel E810, Chelsio (niche) |
| IB Verbs modes | IBTA | — | — | — | — | RC, RD, UC, UD transport modes | RC dominant in production |
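What "the IBTA verbs API" from the table looks like in practice: the application opens the device, allocates a protection domain, registers memory so the NIC can DMA it directly, and creates a queue pair. Below is a minimal local-side sketch in C against libibverbs; it assumes an RDMA-capable NIC is present and omits the out-of-band exchange of QP identifiers and the state transitions needed before traffic can actually flow.

```c
/* Minimal local-side sketch of the IBTA verbs API (libibverbs). Assumes an
 * RDMA-capable NIC is present. The out-of-band exchange of QP identifiers
 * and the INIT->RTR->RTS transitions needed before traffic flows are
 * omitted; most error checks are trimmed for brevity. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) { fprintf(stderr, "no RDMA device\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* NIC context */
    if (!ctx) { fprintf(stderr, "open_device failed\n"); return 1; }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                /* protection domain */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

    /* Register memory so the NIC can DMA it directly: after this, sends and
     * receives on the buffer never involve a syscall (the kernel-bypass path). */
    void *buf = malloc(4096);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);

    struct ibv_qp_init_attr attr = {
        .send_cq = cq, .recv_cq = cq,
        .qp_type = IBV_QPT_RC,              /* RC: the dominant production mode */
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    if (!qp) { fprintf(stderr, "create_qp failed\n"); return 1; }
    printf("QP number %u ready for out-of-band exchange\n", qp->qp_num);

    ibv_destroy_qp(qp); ibv_dereg_mr(mr); free(buf);
    ibv_destroy_cq(cq); ibv_dealloc_pd(pd); ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

Once the queue pair is connected, the data path is just posting work requests and polling completions in user space, and it looks the same whether the wire underneath is InfiniBand, RoCE v2, or iWARP.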
Key takeaway: RDMA at scale (RoCE v2) needs a lossless underlay (PFC) and doesn't multipath naturally. Both constraints break at 100K+ GPU scale — which is why the next family exists.
3. Hyperscaler custom
Each major hyperscaler hit RoCEv2's ceiling and built its own transport. The common pattern: packet spraying for multipath, built-in encryption, out-of-order delivery with hardware reassembly, and microsecond failover when a link or switch breaks.
| Protocol | Owner | Substrate | Lossless? | Multipath | Encryption | Key trait | Used by / for |
|---|---|---|---|---|---|---|---|
| MRC (Multipath Reliable Conn.) | OpenAI + AMD / Microsoft / NVIDIA / Broadcom / Intel (OCP) | Ethernet/IP | No | Yes (packet spray) | Built-in | Evolution of RoCE v2; μs failover; verbs-compatible | OpenAI training, Microsoft Fairwater, Oracle Abilene |
| Falcon | Google — OCP | Ethernet/IP | No | Yes (PLB) | Built-in (PSP/IPSec) | HW transport, multi-ULP (RDMA + NVMe) | Google Cloud, Intel E2100 IPU |
| SRD (Scalable Reliable Datagram) | AWS | Ethernet/IP | No | Yes (packet spray) | Built-in | Out-of-order delivery, hw-offloaded, libfabric API | AWS EFA — EC2 P5, Trn1/Trn2, HPC |
| UET (Ultra Ethernet Transport) | UEC consortium (open) | Ethernet/IP | No | Yes (packet spray) | Built-in | Open standard; ~75% from HPE Slingshot; libfabric 2.0 | Industry target — 1M+ endpoint scale |
| Pony Express | Google (legacy) | Ethernet/IP | No | Limited | Optional | SW-only predecessor to Falcon; ran inside Snap, Google's userspace host-networking system | Older Google datacenters (superseded) |
| eRDMA | Alibaba | VPC/Ethernet | No | Yes | Optional | RDMA for cloud tenants | Alibaba Cloud ECS |
The pattern everywhere: drop the PFC dependency, spray packets across all available paths, hardware-offload reassembly. UET is the open-standard convergence target — expect MRC and Falcon ideas to fold in over time.
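A toy sketch of that shared mechanic, not any vendor's wire format: the sender sprays consecutive packets of one message across several paths instead of hashing the whole flow onto a single ECMP path, and the receiver reassembles by sequence number so out-of-order arrival is harmless. All names, field sizes, and constants below are invented for illustration; the real transports do this in NIC hardware.

```c
/* Toy illustration of the idea shared by SRD / Falcon / MRC / UET:
 * spray packets of one message across many paths, accept out-of-order
 * arrival, and reassemble by sequence number at the receiver. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NPKTS  8
#define NPATHS 4

struct pkt { uint32_t seq; uint8_t path; char payload[16]; };

int main(void) {
    struct pkt wire[NPKTS];

    /* Sender: round-robin "spray" across paths instead of pinning the
     * whole flow to one ECMP-hashed path. */
    for (uint32_t s = 0; s < NPKTS; s++) {
        wire[s].seq  = s;
        wire[s].path = s % NPATHS;
        snprintf(wire[s].payload, sizeof wire[s].payload, "chunk-%u", s);
    }

    /* Pretend the network reorders packets (paths have different latencies). */
    struct pkt tmp = wire[1]; wire[1] = wire[6]; wire[6] = tmp;

    /* Receiver: place each packet by seq, track arrivals, and deliver the
     * message only when every sequence number is present. */
    char assembled[NPKTS][16];
    bool seen[NPKTS] = {false};
    int  remaining = NPKTS;
    for (int i = 0; i < NPKTS; i++) {
        struct pkt *p = &wire[i];
        if (!seen[p->seq]) {
            memcpy(assembled[p->seq], p->payload, sizeof p->payload);
            seen[p->seq] = true;
            remaining--;
        }
    }
    if (remaining == 0)
        for (int s = 0; s < NPKTS; s++)
            printf("seq %d via path %d: %s\n", s, s % NPATHS, assembled[s]);
    return 0;
}
```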
4. Scale-up
Scale-up is intra-server / intra-rack; scale-out (everything above) is inter-server. Scale-up connects GPUs inside one logical box; scale-out connects boxes to other boxes. They are disjoint problems: terabytes per second vs. hundreds of gigabits per second, nanoseconds vs. microseconds, different protocols.
| Protocol | Owner | Domain | Key trait | Used by / for |
|---|---|---|---|---|
| NVLink / NVSwitch / NVLink Fabric | NVIDIA (proprietary) | Scale-up GPU | Up to 1.8 TB/s per GPU; sub-μs | DGX, GB200 NVL72, HGX |
| UALink | AMD / Broadcom / Cisco / Google / HPE / Intel / Meta / Microsoft (open) | Scale-up GPU | Open NVLink alternative; v1.0 in 2025 | Future open AI servers |
| SUE (Scale-Up Ethernet) | Broadcom | Scale-up GPU | Simpler than UET; ≤1.6 Tbps, ~100 ns device latency | Broadcom AI silicon |
| ICI (Inter-Chip Interconnect) | Google | Scale-up TPU | Native TPU pod fabric | Google TPU v4 / v5p / Trillium pods |
| Slingshot / Portals 4 | HPE (Cray) | HPC scale-out | Adaptive routing; UET 1.0 lineage (~75%) | Frontier, El Capitan, Aurora, leadership HPC |
| Omni-Path | Cornelis Networks (ex-Intel) | HPC scale-out | InfiniBand-style fabric | Some HPC sites |
| RDS (Reliable Datagram Sockets) | Oracle | Cluster IPC | Reliable datagrams over IB / RoCE / TCP | Oracle RAC interconnect |
| TIPC | Ericsson / Linux | Cluster IPC | Topology-aware cluster messaging | Telecom clusters |
Worth knowing: GB200 NVL72 puts 72 GPUs on one NVLink Switch fabric — that's one logical machine over scale-up. RDMA only takes over at the rack boundary.
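Inside the scale-up domain there is no network transport at all from the application's point of view: GPUs address each other's memory directly. A minimal sketch follows, C host code against the CUDA runtime's peer-access API, assuming two GPUs in one box; when an NVLink/NVSwitch fabric connects them the copy rides it, otherwise it falls back to PCIe.

```c
/* Minimal sketch: direct GPU-to-GPU copy inside one box via the CUDA
 * runtime's peer-access API (C host code, compile with nvcc and link the
 * CUDA runtime). Assumes GPUs 0 and 1 exist in the same server. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int can = 0;
    cudaDeviceCanAccessPeer(&can, 0, 1);      /* can GPU 0 reach GPU 1? */
    if (!can) { printf("no peer access between GPU 0 and GPU 1\n"); return 0; }

    size_t bytes = 1 << 20;
    void *src = NULL, *dst = NULL;

    cudaSetDevice(1);
    cudaMalloc(&src, bytes);                  /* buffer on GPU 1 */

    cudaSetDevice(0);
    cudaMalloc(&dst, bytes);                  /* buffer on GPU 0 */
    cudaDeviceEnablePeerAccess(1, 0);         /* let GPU 0 address GPU 1 */

    /* Device-to-device copy: no host staging, no network transport. */
    cudaMemcpyPeer(dst, 0, src, 1, bytes);
    cudaDeviceSynchronize();
    printf("copied %zu bytes GPU1 -> GPU0 over the scale-up fabric\n", bytes);

    cudaFree(dst);
    cudaSetDevice(1);
    cudaFree(src);
    return 0;
}
```

RDMA only enters the picture once the copy has to leave the box (or, for NVL72, the rack): that is the scale-out boundary the previous families cover.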
Mental model — the 6-point synthesis
- Classic IP transports cover the internet. TCP, UDP, QUIC. Universal but kernel-bound.
- RDMA family (IB, RoCE v2, iWARP) covers traditional HPC/AI — but needs a lossless fabric (PFC) and doesn't multipath well.
- Each hyperscaler built a custom multipath transport because RoCEv2 doesn't scale to 100K+ GPUs: Google → Falcon, AWS → SRD, OpenAI/Microsoft/NVIDIA/AMD → MRC, Alibaba → eRDMA.
- UET is the open-standard convergence target. Expect MRC and Falcon ideas to fold into it over time.
- Scale-up (NVLink / UALink / SUE / ICI) is intra-server and disjoint from scale-out transports. Different domain, different physics, different protocols.
- Congestion control matters as much as the transport. Most modern AI fabrics combine packet spraying + delay-based CC + ECN/INT signals + microsecond failover (a toy sketch of the delay-based piece follows this list).
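A toy version of the delay-based piece of that recipe, loosely in the spirit of Swift/TIMELY-style algorithms: grow the window while measured RTT stays under a delay target, shrink it in proportion to how far RTT overshoots. The constants, names, and target value below are invented for illustration; real fabrics run this per flow in NIC hardware or firmware.

```c
/* Toy delay-based congestion-window update: additive increase below the
 * delay target, multiplicative decrease scaled by the overshoot above it. */
#include <stdio.h>

static double update_cwnd(double cwnd, double rtt_us, double target_us) {
    if (rtt_us < target_us)
        return cwnd + 1.0;                         /* additive increase */
    double overshoot = (rtt_us - target_us) / rtt_us;
    double next = cwnd * (1.0 - 0.5 * overshoot);  /* decrease scaled by
                                                      how far past target */
    return next < 1.0 ? 1.0 : next;
}

int main(void) {
    double cwnd = 10.0;                  /* packets in flight (illustrative) */
    double target_us = 25.0;             /* fabric delay target (illustrative) */
    double rtts[] = { 18, 20, 24, 40, 60, 30, 22, 19 };  /* measured RTTs */

    for (int i = 0; i < 8; i++) {
        cwnd = update_cwnd(cwnd, rtts[i], target_us);
        printf("rtt=%5.1fus  cwnd=%5.2f\n", rtts[i], cwnd);
    }
    return 0;
}
```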
Who built what — full reference table
| Tech | Owner / Standards body | When |
|---|---|---|
| TCP/IP | IETF (DARPA) | 1973–1980s |
| UDP | IETF | 1980 |
| InfiniBand spec | IBTA consortium | 1999 |
| Mellanox IB silicon | Mellanox Technologies (Israel) | 1999 → acquired by NVIDIA 2019 |
| RDMA Verbs API | IBTA / OpenFabrics Alliance | 2000s |
| SCTP | IETF | 2000 |
| RoCE v1 | IBTA | 2010 |
| QUIC | Google → IETF | 2012 / RFC 9000 in 2021 |
| MPTCP | IETF | 2013 |
| RoCE v2 | IBTA | 2014 |
| DCQCN | Microsoft Research | SIGCOMM 2015 |
| Pony Express | Google (legacy) | ~2014–2023 |
| AWS EFA / SRD | AWS | 2018+ |
| eRDMA | Alibaba | 2020 |
| Spectrum-X | NVIDIA | 2023 |
| Ultra Ethernet Consortium | AMD, Arista, Broadcom, Cisco, HPE, Intel, Meta, Microsoft, +50 others | Founded 2023, spec 1.0 in 2025 |
| Falcon | Google + Intel (E2100 IPU) | 2023 |
| MRC | OpenAI + AMD + Microsoft + NVIDIA + Broadcom + Intel (OCP) | 2024 |
| UALink Consortium | AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, Microsoft | 2024 / v1.0 in 2025 |
| SUE (Scale-Up Ethernet) | Broadcom | 2024 |
| ICI | Google | TPU v4 era (~2018) |
| Slingshot | HPE (Cray) | 2019 |
Next: Congestion Control Options → the algorithms that pair with these transports.