Transport Options
Setup first: if you haven't read Why AI Networks Are Different, start there — it covers what transport is and why AI breaks the everyday baseline.
This page maps the design space: four distinct families, each solving a different problem, all coexisting in 2026. The history of how we got here fits in three eras.
50 years in three eras
Era 1 — TCP rules everything (1973–2000)
TCP/IP was designed for unreliable WANs in the 1970s. Reliability over latency. The world adopted it; everything else (web, mail, RPC) built on top. UDP (1980) and SCTP (2000) filled niches but TCP carried the load. By 2000, the network was a kernel-bound byte-stream with software-only congestion control — fine for everything except low-latency HPC.
Era 2 — HPC and early AI demand kernel-bypass (1999–2015)
The IBTA formed in 1999 to define RDMA and the verbs API, and InfiniBand shipped with credit-based, link-level flow control: bytes moved without the kernel touching them. RoCE v1 (2010) brought RDMA to Ethernet by emulating a lossless fabric with PFC; RoCE v2 (2014) made it routable over UDP/IP. The DCQCN paper (Microsoft Research, SIGCOMM 2015) showed RoCE could run at hyperscale, and RoCE moved from research into Azure, Meta, Tencent, and most cloud RDMA deployments today.
Era 3 — AI at 100K+ GPUs breaks RoCEv2 (2015 → today)
At 100K-GPU scale, both of RoCEv2's crutches fail. PFC causes head-of-line blocking and the deadly PFC storm; ECMP can't load-balance the elephant flows of a synchronized collective. So each hyperscaler built its own transport that drops both crutches: AWS SRD (2018), Alibaba eRDMA (2020), Google Falcon (2023), and MRC (2024, the OpenAI / Microsoft / NVIDIA / AMD / Broadcom / Intel collaboration). The Ultra Ethernet Consortium (founded 2023, roughly 50 members, without NVIDIA) shipped UEC 1.0 in 2025 as the open-standard convergence target. NVIDIA, meanwhile, ships Spectrum-X, its own vertically integrated AI-Ethernet stack.
The network is no longer just packet-forwarding infrastructure — it is part of the distributed compute system itself.
The four families
The transport landscape today sits in four buckets. Each solves a different problem, and they coexist:
| # | Family | Solves |
|---|---|---|
| 1 | Classic IP transports | General networking — internet, applications, control plane |
| 2 | RDMA transports (traditional) | Kernel-bypass for HPC and traditional AI clusters |
| 3 | AI / hyperscaler custom transports | RoCEv2 at 100K+ GPU scale (multipath, no PFC dependence) |
| 4 | Scale-up interconnects | Intra-server / intra-rack GPU-to-GPU communication |
Pick the family you care about; each subsection below is self-contained:
- 1. Classic IP (L4)
- 2. RDMA (traditional)
- 3. Hyperscaler custom
- 4. Scale-up
1. Classic IP (L4)
You know these. Listed for completeness.
| Protocol | Owner / Std | Reliable | Ordered | Multipath | Encryption | Key trait | Used by / for |
|---|---|---|---|---|---|---|---|
| TCP | IETF | Yes | Yes | No | External (TLS) | Byte-stream, AIMD CC, HoL blocking | Web, SSH, SMTP — universal |
| UDP | IETF | No | No | No | External (DTLS) | Connectionless, low overhead | DNS, DHCP, VoIP, gaming, QUIC base |
| SCTP | IETF | Yes | Per-stream | Multi-home failover | External (DTLS) | Multi-streaming + multi-homing | Telecom — SS7/SIGTRAN, Diameter, 5G N2 |
| DCCP | IETF | No | No | No | No | Unreliable + congestion control | Mostly research / abandoned |
| QUIC | IETF (Google origin) | Yes | Per-stream | Connection migration | Built-in (TLS 1.3) | 0/1-RTT setup, user-space | HTTP/3 — Google, Cloudflare, Meta, Apple |
| MPTCP | IETF | Yes | Yes | Yes (subflows) | External | TCP across multiple paths | Apple Siri/iOS, Samsung, Linux |
| UDP-Lite | IETF | No | No | No | External | Partial checksum | Loss-tolerant codecs |
Standout: QUIC. Fully user-space, with TLS 1.3 built in and 0/1-RTT setup, and its connection migration lets a flow survive path and address changes without a new handshake. You'll meet it on the inference and edge path.
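To make the "kernel-bound" trait in the table concrete: with the classic transports, every send and receive is a syscall, and the kernel's stack owns congestion control, retransmission, and ordering. A minimal C sketch of a TCP send follows; the host and port are placeholders.

```c
/* Minimal sketch of a classic kernel-bound TCP send. Every send()/recv()
 * is a syscall; the kernel's stack owns congestion control, retransmission,
 * and ordering. The host and port below are placeholders. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* TCP byte-stream */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(12345);                   /* placeholder port */
    inet_pton(AF_INET, "10.0.0.2", &peer.sin_addr);   /* placeholder host */

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        perror("connect"); close(fd); return 1;
    }

    const char msg[] = "hello";
    /* One syscall per send: user-to-kernel copy, kernel CC decides pacing. */
    if (send(fd, msg, sizeof msg - 1, 0) < 0) perror("send");

    close(fd);
    return 0;
}
```

QUIC keeps a plain UDP socket underneath but moves the reliability, ordering, and congestion-control machinery into user space, which is why it can evolve per application and ship encryption by default.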
2. RDMA (traditional)
The transports that built HPC and early AI. All share the IBTA verbs API, and all are kernel-bypass.
| Protocol | Owner / Std | Substrate | Lossless required | Multipath | Encryption | Key trait | Used by / for |
|---|---|---|---|---|---|---|---|
| InfiniBand | NVIDIA / IBTA | IB fabric | Yes (credit-based FC) | RD mode only | Optional | Native RDMA, sub-μs latency | DGX SuperPOD, TOP500 HPC, Meta RSC |
| RoCE v1 | IBTA (open) | Ethernet L2 | Yes (PFC) | Limited | Optional | IB transport over Ethernet, non-routable | Same-subnet RDMA |
| RoCE v2 | IBTA (open) | UDP/IP | Yes (PFC) | Limited (ECMP) | Optional | IB transport over UDP, routable | Azure, Meta, Tencent, ByteDance, Baidu |
| iWARP | IETF (open) | TCP/IP | No | Via TCP | Optional | RDMA over TCP, no PFC needed | Intel E810, Chelsio (niche) |
| IB Verbs modes | IBTA | — | — | — | — | RC, RD, UC, UD transport modes | RC dominant in production |
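What "the IBTA verbs API" from the table looks like in practice: the application opens the device, allocates a protection domain, registers memory so the NIC can DMA it directly, and creates a queue pair. Below is a minimal local-side sketch in C against libibverbs; it assumes an RDMA-capable NIC is present and omits the out-of-band exchange of QP identifiers and the state transitions needed before traffic can actually flow.

```c
/* Minimal local-side sketch of the IBTA verbs API (libibverbs). Assumes an
 * RDMA-capable NIC is present. The out-of-band exchange of QP identifiers
 * and the INIT->RTR->RTS transitions needed before traffic flows are
 * omitted; most error checks are trimmed for brevity. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) { fprintf(stderr, "no RDMA device\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* NIC context */
    if (!ctx) { fprintf(stderr, "open_device failed\n"); return 1; }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                /* protection domain */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

    /* Register memory so the NIC can DMA it directly: after this, sends and
     * receives on the buffer never involve a syscall (the kernel-bypass path). */
    void *buf = malloc(4096);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);

    struct ibv_qp_init_attr attr = {
        .send_cq = cq, .recv_cq = cq,
        .qp_type = IBV_QPT_RC,              /* RC: the dominant production mode */
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    if (!qp) { fprintf(stderr, "create_qp failed\n"); return 1; }
    printf("QP number %u ready for out-of-band exchange\n", qp->qp_num);

    ibv_destroy_qp(qp); ibv_dereg_mr(mr); free(buf);
    ibv_destroy_cq(cq); ibv_dealloc_pd(pd); ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

Once the queue pair is connected, the data path is just posting work requests and polling completions in user space, and it looks the same whether the wire underneath is InfiniBand, RoCE v2, or iWARP.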
Key takeaway: RDMA at scale (RoCE v2) needs a lossless underlay (PFC) and doesn't multipath naturally. Both constraints break at 100K+ GPU scale — which is why the next family exists.
3. Hyperscaler custom
Each major hyperscaler hit RoCEv2's ceiling and built its own transport. The common pattern: packet spraying for multipath, built-in encryption, out-of-order delivery with hardware reassembly, and microsecond failover when a link or switch breaks.
| Protocol | Owner | Substrate | Lossless? | Multipath | Encryption | Key trait | Used by / for |
|---|---|---|---|---|---|---|---|
| MRC (Multipath Reliable Conn.) | OpenAI + AMD / Microsoft / NVIDIA / Broadcom / Intel (OCP) | Ethernet/IP | No | Yes (packet spray) | Built-in | Evolution of RoCE v2; μs failover; verbs-compatible | OpenAI training, Microsoft Fairwater, Oracle Abilene |
| Falcon | Google — OCP | Ethernet/IP | No | Yes (PLB) | Built-in (PSP/IPSec) | HW transport, multi-ULP (RDMA + NVMe) | Google Cloud, Intel E2100 IPU |
| SRD (Scalable Reliable Datagram) | AWS | Ethernet/IP | No | Yes (packet spray) | Built-in | Out-of-order delivery, hw-offloaded, libfabric API | AWS EFA — EC2 P5, Trn1/Trn2, HPC |
| UET (Ultra Ethernet Transport) | UEC consortium (open) | Ethernet/IP | No | Yes (packet spray) | Built-in | Open standard; ~75% from HPE Slingshot; libfabric 2.0 | Industry target — 1M+ endpoint scale |
| Pony Express | Google (legacy) | Ethernet/IP | No | Limited | Optional | SW-only predecessor to Falcon; ran inside Snap, Google's userspace host-networking system | Older Google datacenters (superseded) |
| eRDMA | Alibaba | VPC/Ethernet | No | Yes | Optional | RDMA for cloud tenants | Alibaba Cloud ECS |
The pattern everywhere: drop the PFC dependency, spray packets across all available paths, hardware-offload reassembly. UET is the open-standard convergence target — expect MRC and Falcon ideas to fold in over time.
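A toy sketch of that shared mechanic, not any vendor's wire format: the sender sprays consecutive packets of one message across several paths instead of hashing the whole flow onto a single ECMP path, and the receiver reassembles by sequence number so out-of-order arrival is harmless. All names, field sizes, and constants below are invented for illustration; the real transports do this in NIC hardware.

```c
/* Toy illustration of the idea shared by SRD / Falcon / MRC / UET:
 * spray packets of one message across many paths, accept out-of-order
 * arrival, and reassemble by sequence number at the receiver. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NPKTS  8
#define NPATHS 4

struct pkt { uint32_t seq; uint8_t path; char payload[16]; };

int main(void) {
    struct pkt wire[NPKTS];

    /* Sender: round-robin "spray" across paths instead of pinning the
     * whole flow to one ECMP-hashed path. */
    for (uint32_t s = 0; s < NPKTS; s++) {
        wire[s].seq  = s;
        wire[s].path = s % NPATHS;
        snprintf(wire[s].payload, sizeof wire[s].payload, "chunk-%u", s);
    }

    /* Pretend the network reorders packets (paths have different latencies). */
    struct pkt tmp = wire[1]; wire[1] = wire[6]; wire[6] = tmp;

    /* Receiver: place each packet by seq, track arrivals, and deliver the
     * message only when every sequence number is present. */
    char assembled[NPKTS][16];
    bool seen[NPKTS] = {false};
    int  remaining = NPKTS;
    for (int i = 0; i < NPKTS; i++) {
        struct pkt *p = &wire[i];
        if (!seen[p->seq]) {
            memcpy(assembled[p->seq], p->payload, sizeof p->payload);
            seen[p->seq] = true;
            remaining--;
        }
    }
    if (remaining == 0)
        for (int s = 0; s < NPKTS; s++)
            printf("seq %d via path %d: %s\n", s, s % NPATHS, assembled[s]);
    return 0;
}
```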
4. Scale-up
Scale-up is intra-server / intra-rack; scale-out (everything above) is inter-server. Scale-up connects GPUs inside one logical box; scale-out connects boxes to other boxes. They are disjoint problems: terabytes per second vs. hundreds of gigabits per second, nanoseconds vs. microseconds, different protocols.
| Protocol | Owner | Domain | Key trait | Used by / for |
|---|---|---|---|---|
| NVLink / NVSwitch / NVLink Fabric | NVIDIA (proprietary) | Scale-up GPU | Up to 1.8 TB/s per GPU; sub-μs | DGX, GB200 NVL72, HGX |
| UALink | AMD / Broadcom / Cisco / Google / HPE / Intel / Meta / Microsoft (open) | Scale-up GPU | Open NVLink alternative; v1.0 in 2025 | Future open AI servers |
| SUE (Scale-Up Ethernet) | Broadcom | Scale-up GPU | Simpler than UET; ≤1.6 Tbps, ~100 ns device latency | Broadcom AI silicon |
| ICI (Inter-Chip Interconnect) | Google | Scale-up TPU | Native TPU pod fabric | Google TPU v4 / v5p / Trillium pods |
| Slingshot / Portals 4 | HPE (Cray) | HPC scale-out | Adaptive routing; UET 1.0 lineage (~75%) | Frontier, El Capitan, Aurora, leadership HPC |
| Omni-Path | Cornelis Networks (ex-Intel) | HPC scale-out | InfiniBand-style fabric | Some HPC sites |
| RDS (Reliable Datagram Sockets) | Oracle | Cluster IPC | Reliable datagrams over IB / RoCE / TCP | Oracle RAC interconnect |
| TIPC | Ericsson / Linux | Cluster IPC | Topology-aware cluster messaging | Telecom clusters |
Worth knowing: GB200 NVL72 puts 72 GPUs on one NVLink Switch fabric — that's one logical machine over scale-up. RDMA only takes over at the rack boundary.
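Inside the scale-up domain there is no network transport at all from the application's point of view: GPUs address each other's memory directly. A minimal sketch follows, C host code against the CUDA runtime's peer-access API, assuming two GPUs in one box; when an NVLink/NVSwitch fabric connects them the copy rides it, otherwise it falls back to PCIe.

```c
/* Minimal sketch: direct GPU-to-GPU copy inside one box via the CUDA
 * runtime's peer-access API (C host code, compile with nvcc and link the
 * CUDA runtime). Assumes GPUs 0 and 1 exist in the same server. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int can = 0;
    cudaDeviceCanAccessPeer(&can, 0, 1);      /* can GPU 0 reach GPU 1? */
    if (!can) { printf("no peer access between GPU 0 and GPU 1\n"); return 0; }

    size_t bytes = 1 << 20;
    void *src = NULL, *dst = NULL;

    cudaSetDevice(1);
    cudaMalloc(&src, bytes);                  /* buffer on GPU 1 */

    cudaSetDevice(0);
    cudaMalloc(&dst, bytes);                  /* buffer on GPU 0 */
    cudaDeviceEnablePeerAccess(1, 0);         /* let GPU 0 address GPU 1 */

    /* Device-to-device copy: no host staging, no network transport. */
    cudaMemcpyPeer(dst, 0, src, 1, bytes);
    cudaDeviceSynchronize();
    printf("copied %zu bytes GPU1 -> GPU0 over the scale-up fabric\n", bytes);

    cudaFree(dst);
    cudaSetDevice(1);
    cudaFree(src);
    return 0;
}
```

RDMA only enters the picture once the copy has to leave the box (or, for NVL72, the rack): that is the scale-out boundary the previous families cover.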
Mental model — the 6-point synthesis
- Classic IP transports cover the internet. TCP, UDP, QUIC. Universal but kernel-bound.
- RDMA family (IB, RoCE v2, iWARP) covers traditional HPC/AI — but needs a lossless fabric (PFC) and doesn't multipath well.
- Each hyperscaler built a custom multipath transport because RoCEv2 doesn't scale to 100K+ GPUs: Google → Falcon, AWS → SRD, OpenAI/Microsoft/NVIDIA/AMD → MRC, Alibaba → eRDMA.
- UET is the open-standard convergence target. Expect MRC and Falcon ideas to fold into it over time.
- Scale-up (NVLink / UALink / SUE / ICI) is intra-server and disjoint from scale-out transports. Different domain, different physics, different protocols.
- Congestion control matters as much as the transport. Most modern AI fabrics combine packet spraying + delay-based CC + ECN/INT signals + microsecond failover (a toy sketch of the delay-based piece follows this list).
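A toy version of the delay-based piece of that recipe, loosely in the spirit of Swift/TIMELY-style algorithms: grow the window while measured RTT stays under a delay target, shrink it in proportion to how far RTT overshoots. The constants, names, and target value below are invented for illustration; real fabrics run this per flow in NIC hardware or firmware.

```c
/* Toy delay-based congestion-window update: additive increase below the
 * delay target, multiplicative decrease scaled by the overshoot above it. */
#include <stdio.h>

static double update_cwnd(double cwnd, double rtt_us, double target_us) {
    if (rtt_us < target_us)
        return cwnd + 1.0;                         /* additive increase */
    double overshoot = (rtt_us - target_us) / rtt_us;
    double next = cwnd * (1.0 - 0.5 * overshoot);  /* decrease scaled by
                                                      how far past target */
    return next < 1.0 ? 1.0 : next;
}

int main(void) {
    double cwnd = 10.0;                  /* packets in flight (illustrative) */
    double target_us = 25.0;             /* fabric delay target (illustrative) */
    double rtts[] = { 18, 20, 24, 40, 60, 30, 22, 19 };  /* measured RTTs */

    for (int i = 0; i < 8; i++) {
        cwnd = update_cwnd(cwnd, rtts[i], target_us);
        printf("rtt=%5.1fus  cwnd=%5.2f\n", rtts[i], cwnd);
    }
    return 0;
}
```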
Who built what — full reference table
| Tech | Owner / Standards body | When |
|---|---|---|
| TCP/IP | IETF (DARPA) | 1973–1980s |
| UDP | IETF | 1980 |
| InfiniBand spec | IBTA consortium | 1999 |
| Mellanox IB silicon | Mellanox Technologies (Israel) | 1999 → acquired by NVIDIA 2019 |
| RDMA Verbs API | IBTA / OpenFabrics Alliance | 2000s |
| SCTP | IETF | 2000 |
| RoCE v1 | IBTA | 2010 |
| QUIC | Google → IETF | 2012 / RFC 9000 in 2021 |
| MPTCP | IETF | 2013 |
| RoCE v2 | IBTA | 2014 |
| DCQCN | Microsoft Research | SIGCOMM 2015 |
| Pony Express | Google (legacy) | ~2014–2023 |
| AWS EFA / SRD | AWS | 2018+ |
| eRDMA | Alibaba | 2020 |
| Spectrum-X | NVIDIA | 2023 |
| Ultra Ethernet Consortium | AMD, Arista, Broadcom, Cisco, HPE, Intel, Meta, Microsoft, +50 others | Founded 2023, spec 1.0 in 2025 |
| Falcon | Google + Intel (E2100 IPU) | 2023 |
| MRC | OpenAI + AMD + Microsoft + NVIDIA + Broadcom + Intel (OCP) | 2024 |
| UALink Consortium | AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, Microsoft | 2024 / v1.0 in 2025 |
| SUE (Scale-Up Ethernet) | Broadcom | 2024 |
| ICI | Google | TPU v4 era (~2018) |
| Slingshot | HPE (Cray) | 2019 |
Next: Congestion Control Options → the algorithms that pair with these transports.