Host Networking
The question this page answers: how does an application running inside a Kubernetes pod get RDMA access to the NIC?
Short answer: SR-IOV creates virtual NICs (VFs), Multus attaches them to pods, and the GPU and Network Operators manage the supporting driver stack. Long answer below.
PF vs VF
Modern RDMA NICs (ConnectX-7, Thor, E810) expose themselves as multiple PCIe functions:
- PF (Physical Function) — the "main" NIC. One PF per physical NIC port. The host OS sees the PF and loads the RDMA driver against it.
- VF (Virtual Function) — a slice of the NIC, hardware-isolated from other VFs. Each VF has its own queue pairs, memory protection, and (often) its own IP / MAC. A modern NIC can expose 64–256 VFs.
When you "give the pod a NIC," you're really giving it a VF. The PF stays on the host.
┌────────── Physical NIC ──────────┐
│ PF (host owns this)              │ ← driver loads here
│  ├── VF 0 (pod A gets this)      │
│  ├── VF 1 (pod B gets this)      │
│  ├── VF 2 (pod C gets this)      │
│  └── ... up to 64–256 VFs        │
└──────────────────────────────────┘
VFs are how multiple pods share one physical NIC without stepping on each other: each VF gets its own isolated hardware queues and isolated DMA, though all VFs on a port still share that port's total bandwidth.
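On a host where VFs have already been created, the PF/VF split is visible directly in the PCIe device list. A quick look, assuming a Mellanox/NVIDIA NIC (PCI vendor ID 15b3) whose PF interface is named ens1f0; the interface name and the commented output are illustrative:

```bash
# List the NIC's PCIe functions (15b3 = Mellanox/NVIDIA vendor ID).
lspci -d 15b3:
#   3b:00.0 Ethernet controller: ... ConnectX-7           <- PF
#   3b:00.2 Ethernet controller: ... Virtual Function     <- VF 0
#   3b:00.3 Ethernet controller: ... Virtual Function     <- VF 1

# How many VFs this PF supports vs. how many are currently enabled.
cat /sys/class/net/ens1f0/device/sriov_totalvfs
cat /sys/class/net/ens1f0/device/sriov_numvfs
```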
SR-IOV
SR-IOV (Single Root I/O Virtualization) is the PCIe spec that lets a device present multiple Virtual Functions. It's been around since 2007 — what's new is using it for RDMA at scale.
The setup chain:
- BIOS — enable Intel VT-d / AMD-Vi (IOMMU). Required for any VF passthrough.
- Kernel — add intel_iommu=on (or amd_iommu=on) to the boot cmdline.
- Driver — load the NIC driver with num_vfs=N to create N VFs per port.
- k8s — install the SR-IOV Network Operator (typically from Red Hat, NVIDIA, or built into the GPU Operator). It manages VF inventory.
- CNI — the SR-IOV CNI plugin attaches a VF to a pod when scheduled.
If any of these steps is wrong, you get cryptic errors. The most common debug pattern: SR-IOV looks configured but VFs don't appear in /sys/class/net/. That's usually the kernel cmdline.
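That debug loop condenses to a few host-side commands. A sketch, assuming the PF is named ens1f0 and you want 8 VFs (both are placeholders):

```bash
# 1. Is the IOMMU actually enabled? No DMAR/IOMMU lines in dmesg usually
#    means the intel_iommu=on / amd_iommu=on cmdline step was missed.
cat /proc/cmdline
dmesg | grep -i -e dmar -e iommu | head

# 2. Create 8 VFs on the PF via sysfs (supported by mlx5, ice, and most
#    modern drivers; some older drivers take a module parameter instead).
echo 8 > /sys/class/net/ens1f0/device/sriov_numvfs

# 3. The VFs should now show up as new interfaces and as entries on the PF.
ls /sys/class/net/
ip link show ens1f0
```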
Multus
Standard k8s gives each pod one network interface (eth0). That's fine for web workloads. AI training needs:
- A "control" interface (for k8s control plane, image pulls, logs)
- One or more "data" interfaces (the RDMA NICs)
Multus is a CNI meta-plugin that lets a pod attach to multiple networks. It chains other CNI plugins (Calico for control, SR-IOV for data) and presents the pod with multiple interfaces.
A typical AI training pod:
Pod ─┬── eth0 (Calico CNI, k8s control plane)
     ├── net1 (SR-IOV CNI, VF on rail 0)
     ├── net2 (SR-IOV CNI, VF on rail 1)
     ├── ...
     └── net8 (SR-IOV CNI, VF on rail 7)
Each netN is a VF on a different rail. With rail-optimized topology, this maps GPU-N to Rail N naturally.
The pod spec includes a k8s.v1.cni.cncf.io/networks annotation that tells Multus which NetworkAttachmentDefinitions (NADs) to attach. NADs are k8s resources that describe each network.
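A sketch of both pieces, using the usual kubectl heredoc pattern. The NAD name rail0-sriov, the resource name nvidia.com/rail0, the whereabouts IPAM range, and the container image are all illustrative choices, not fixed values:

```bash
# One NetworkAttachmentDefinition per rail (rail 0 shown); the resourceName
# annotation ties it to the SR-IOV device plugin's VF pool for that rail.
cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: rail0-sriov
  annotations:
    k8s.v1.cni.cncf.io/resourceName: nvidia.com/rail0
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "sriov",
    "ipam": { "type": "whereabouts", "range": "192.168.0.0/24" }
  }'
EOF

# The pod requests a VF from that pool and names the NAD in the Multus
# annotation; Multus adds net1 alongside the default eth0.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: trainer
  annotations:
    k8s.v1.cni.cncf.io/networks: rail0-sriov
spec:
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:24.01-py3
    resources:
      limits:
        nvidia.com/rail0: "1"
EOF
```

For a full 8-rail pod the annotation becomes a comma-separated list (rail0-sriov,rail1-sriov, ... ,rail7-sriov), one entry per NAD, and the pod requests one VF resource per rail.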
GPU Operator (and Network Operator)
NVIDIA's GPU Operator is a Kubernetes operator that automates the entire stack required to run GPU workloads:
- NVIDIA driver
- Container runtime hook (so containers see the GPU)
- DCGM exporter (telemetry)
- Node Feature Discovery (labels nodes with GPU info)
- Optional: MIG support, vGPU, time-slicing
The Network Operator is the sibling for the NIC side:
- Mellanox OFED driver
- RDMA shared device plugin (so pods can request RDMA resources)
- IB-K8s integration (if InfiniBand)
- SR-IOV Network Operator integration (for VF management)
You install both. Together they bootstrap a node from "bare hardware" to "ready to schedule RDMA + GPU pods" in minutes. Without them, you're managing drivers, CNI configs, and device plugins by hand — error-prone and slow.
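Both ship as Helm charts. An install sketch following NVIDIA's published chart names; the namespaces are conventional choices, and chart values (driver versions, VF counts, NicClusterPolicy details) are omitted here:

```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# GPU side: driver, container toolkit, DCGM exporter, NFD, ...
helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator --create-namespace

# NIC side: OFED driver container, RDMA device plugin, SR-IOV integration.
helm install network-operator nvidia/network-operator \
  -n network-operator --create-namespace
```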
The order that has to be right
Here's the dependency chain. Any link in the wrong order and you'll spend hours debugging:
1. Hardware enabled — BIOS VT-d / AMD-Vi on, all firmware updated
2. OS configured — IOMMU, hugepages, RDMA core packages installed
3. GPU Operator deployed — installs the NVIDIA driver
4. Network Operator deployed — installs Mellanox OFED, sets up VFs
5. Multus installed — meta-CNI plugin
6. NetworkAttachmentDefinitions created — one NAD per rail
7. Pod spec uses the right annotations — Multus reads them and attaches the VFs
For first-time setups: budget a week to get this right end-to-end. For repeat setups with automation: minutes.
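A quick way to sanity-check each link in that chain, roughly one command per step. It assumes the namespaces from the Helm sketch above and the trainer pod from the NAD example, so adjust names to match your cluster:

```bash
# 1-2. IOMMU and hugepages on the node.
grep -o -e intel_iommu=on -e amd_iommu=on /proc/cmdline
grep Huge /proc/meminfo

# 3-4. Operators healthy (driver, OFED, and device-plugin pods all Running).
kubectl get pods -n gpu-operator
kubectl get pods -n network-operator

# 5. Multus daemonset running.
kubectl get pods -n kube-system | grep -i multus

# 6. NADs exist (one per rail).
kubectl get network-attachment-definitions -A

# 7. The pod actually got its extra interfaces (eth0 plus netN).
kubectl exec trainer -- ip -br addr
```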
What you should remember
- PF is the physical NIC (host owns it). VF is a hardware-isolated slice (pod gets it).
- SR-IOV is the PCIe mechanism. Requires BIOS + kernel + driver + Operator + CNI all configured.
- Multus is what lets a pod have multiple network interfaces — needed because RDMA traffic goes through a different NIC than the k8s control plane traffic does.
- GPU Operator + Network Operator automate the driver / VF / plugin stack. Don't try to do this by hand at scale.
- The setup chain has many steps. Most production debugging is "which step was misconfigured?"
Next: What This Curriculum Picks → bare metal as the teaching baseline, k8s + Multus + SR-IOV as the production variant.