Host Networking

The question this page answers: how does an application running inside a Kubernetes pod get RDMA access to the NIC?

Short answer: SR-IOV carves the NIC into Virtual Functions, Multus attaches them to pods as extra interfaces, and the GPU Operator and Network Operator manage the supporting drivers and plugins. Long answer below.

PF vs VF

Modern RDMA NICs (ConnectX-7, Thor, E810) expose themselves as multiple PCIe functions:

PF (Physical Function) — the "main" NIC. One PF per physical NIC port. The host OS sees the PF and loads the RDMA driver against it.
VF (Virtual Function) — a slice of the NIC, hardware-isolated from other VFs. Each VF has its own queue pairs, memory protection, and typically its own MAC and IP address. Depending on model and firmware settings, a modern NIC can expose roughly 64–256 VFs.

When you "give the pod a NIC," you're really giving it a VF. The PF stays on the host.

   ┌────────── Physical NIC ──────────┐
   │  PF (host owns this)             │  ← driver loads here
   │  ├── VF 0 (pod A gets this)      │
   │  ├── VF 1 (pod B gets this)      │
   │  ├── VF 2 (pod C gets this)      │
   │  └── ... up to 64–256 VFs        │
   └──────────────────────────────────┘

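VFs are how multiple pods share one physical NIC without stepping on each other: each VF gets isolated hardware queues and isolated DMA. Isolation is not extra bandwidth, though. All VFs on a port share that port's line rate, so the throughput any one VF can sustain is bounded by the physical NIC.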
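
To make the PF/VF split concrete, here is a minimal sketch that reads a PF's VF counts from Linux sysfs. It assumes the standard PCI attributes sriov_totalvfs (the most VFs the device supports) and sriov_numvfs (how many are currently enabled). The function name vf_inventory and the interface name ens1f0np0 are placeholders, not something this curriculum defines.

```python
from pathlib import Path


def vf_inventory(pf_name: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Return SR-IOV VF counts for one PF, as reported by sysfs."""
    device = Path(sysfs_root) / pf_name / "device"
    total_file = device / "sriov_totalvfs"
    if not total_file.exists():
        # No sriov_totalvfs: the interface is missing, is not SR-IOV
        # capable, or is itself a VF rather than a PF.
        return {"pf": pf_name, "sriov_capable": False,
                "total_vfs": 0, "enabled_vfs": 0}
    return {
        "pf": pf_name,
        "sriov_capable": True,
        "total_vfs": int(total_file.read_text().strip()),
        "enabled_vfs": int((device / "sriov_numvfs").read_text().strip()),
    }


if __name__ == "__main__":
    # "ens1f0np0" is a placeholder; substitute a PF name from `ip link`.
    print(vf_inventory("ens1f0np0"))
```

If enabled_vfs is 0 while total_vfs is non-zero, the hardware supports SR-IOV but no VFs have been created yet.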

SR-IOV

SR-IOV (Single Root I/O Virtualization) is the PCIe spec that lets a device present multiple Virtual Functions. It's been around since 2007 — what's new is using it for RDMA at scale.

The setup chain:

BIOS — enable Intel VT-d / AMD-Vi (IOMMU). Required for any VF passthrough.
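Kernel — on Intel hosts, add intel_iommu=on to the boot cmdline. On AMD hosts the kernel enables the IOMMU by default once it is turned on in firmware. iommu=pt is a common addition on both.
Driver — create N VFs per port by writing N to /sys/class/net/<pf>/device/sriov_numvfs. (Older drivers such as mlx4_core took a num_vfs=N module parameter instead.)
k8s — install the SR-IOV Network Operator (upstream, via Red Hat OpenShift, or deployed alongside the NVIDIA Network Operator). It manages VF inventory and, through the SR-IOV device plugin, advertises VFs to the scheduler.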
CNI — the SR-IOV CNI plugin attaches a VF to a pod when scheduled.

If any of these steps is wrong, you get cryptic errors. The most common debug pattern: SR-IOV looks configured but VFs don't appear in /sys/class/net/. That's usually the kernel cmdline, so start by checking /proc/cmdline and whether /sys/kernel/iommu_groups is populated.
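
The first debugging step for the chain above can be sketched as a small read-only check. It inspects the kernel command line for IOMMU parameters and checks whether /sys/kernel/iommu_groups has any entries, which is how Linux exposes an active IOMMU. The function names are illustrative, and the check does not prove that VFs will work; it only rules out the most common cause.

```python
from pathlib import Path


def iommu_flags(cmdline: str) -> dict:
    """Report which IOMMU-related parameters appear on a kernel cmdline."""
    params = cmdline.split()
    return {
        "intel_iommu_on": "intel_iommu=on" in params,
        "amd_iommu_off": "amd_iommu=off" in params,  # explicitly disabled
        "passthrough": "iommu=pt" in params,
    }


def iommu_active(groups_dir: str = "/sys/kernel/iommu_groups") -> bool:
    """True if the kernel has created at least one IOMMU group."""
    path = Path(groups_dir)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    cmdline = cmdline_file.read_text() if cmdline_file.exists() else ""
    print("cmdline flags:", iommu_flags(cmdline))
    print("IOMMU groups present:", iommu_active())
```

An empty iommu_groups directory with the right cmdline usually points back one step, to the BIOS setting.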

Multus

Standard k8s gives each pod one network interface (eth0). That's fine for web workloads. AI training needs:

A "control" interface (for k8s control plane, image pulls, logs)
One or more "data" interfaces (the RDMA NICs)

Multus is a CNI meta-plugin that lets a pod attach to multiple networks. It delegates to other CNI plugins (Calico for control, SR-IOV for data) and presents the pod with multiple interfaces.

A typical AI training pod:

Pod ─┬── eth0  (Calico CNI, k8s control plane)
     ├── net1  (SR-IOV CNI, VF on rail 0)
     ├── net2  (SR-IOV CNI, VF on rail 1)
     ├── ...
     └── net8  (SR-IOV CNI, VF on rail 7)

Each netN is a VF on a different rail. With rail-optimized topology, this maps GPU-N to Rail N naturally.

The pod spec includes a k8s.v1.cni.cncf.io/networks annotation that tells Multus which NetworkAttachmentDefinitions (NADs) to attach. NADs are k8s resources that describe each network.
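
As a rough illustration of how the annotation and the NADs fit together, the sketch below builds one NAD per rail as a plain Python dict and the matching pod annotation. The NAD names (rail0 to rail7), the resource names (nvidia.com/rail0_vf and so on), the whereabouts IPAM plugin, and the 192.168.N.0/24 ranges are all assumptions made for the example; a real cluster uses whatever its SR-IOV device plugin and network design define.

```python
import json


def rail_nad(rail: int) -> dict:
    """Build a NetworkAttachmentDefinition for one rail (hypothetical names)."""
    name = f"rail{rail}"
    cni_config = {
        "cniVersion": "0.3.1",
        "name": name,
        "type": "sriov",  # handled by the SR-IOV CNI plugin
        # Assumed IPAM choice and address range, for illustration only.
        "ipam": {"type": "whereabouts", "range": f"192.168.{rail}.0/24"},
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {
            "name": name,
            "annotations": {
                # Ties this network to the VF pool the device plugin advertises.
                "k8s.v1.cni.cncf.io/resourceName": f"nvidia.com/rail{rail}_vf",
            },
        },
        # The CNI config is embedded as a JSON string.
        "spec": {"config": json.dumps(cni_config)},
    }


def networks_annotation(rails: int) -> dict:
    """Pod annotation listing one NAD per rail, in rail order."""
    names = ",".join(f"rail{r}" for r in range(rails))
    return {"k8s.v1.cni.cncf.io/networks": names}


def interface_map(rails: int) -> dict:
    """Multus names extra interfaces net1, net2, ... in annotation order."""
    return {f"net{r + 1}": f"rail{r}" for r in range(rails)}


if __name__ == "__main__":
    print(json.dumps(rail_nad(0), indent=2))
    print(networks_annotation(8))
    print(interface_map(8))
```

Note the off-by-one: net1 carries rail 0 and net8 carries rail 7, because eth0 is already taken by the default network. The pod must also request the matching VF resources in its container resource limits so that a VF is allocated to it.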

GPU Operator (and Network Operator)

NVIDIA's GPU Operator is a Kubernetes operator that automates the entire stack required to run GPU workloads:

NVIDIA driver
Container runtime hook (so containers see the GPU)
DCGM exporter (telemetry)
Node Feature Discovery (labels nodes with GPU info)
Optional: MIG support, vGPU, time-slicing

The Network Operator is the sibling for the NIC side:

Mellanox OFED driver
RDMA shared device plugin (so pods can request RDMA resources)
IB-K8s integration (if InfiniBand)
SR-IOV Network Operator integration (for VF management)

You install both. Together they bootstrap a node from "bare hardware" to "ready to schedule RDMA + GPU pods" in minutes. Without them, you're managing drivers, CNI configs, and device plugins by hand — error-prone and slow.
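
One way to confirm that a node has actually reached "ready to schedule RDMA + GPU pods" is to compare what the node advertises as allocatable against what a training pod needs. The sketch below does that comparison on plain dicts. nvidia.com/gpu is the resource name the NVIDIA device plugin advertises; nvidia.com/rail0_vf is a hypothetical VF resource name, and fetching the node's status.allocatable map from the API server is left out.

```python
def missing_resources(allocatable: dict, required: dict) -> dict:
    """Return {resource: shortfall} for each requirement the node cannot meet.

    `allocatable` mirrors a node's status.allocatable map, where extended
    resource counts are strings such as "8".
    """
    missing = {}
    for name, want in required.items():
        have = int(allocatable.get(name, "0"))
        if have < want:
            missing[name] = want - have
    return missing


if __name__ == "__main__":
    # Example values only; a real map comes from the Kubernetes API.
    node = {"cpu": "96", "nvidia.com/gpu": "8"}
    pod_needs = {"nvidia.com/gpu": 8, "nvidia.com/rail0_vf": 1}
    print(missing_resources(node, pod_needs))
```

A missing GPU resource points at the GPU Operator side; a missing VF or RDMA resource points at the Network Operator side.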

The order that has to be right

Here's the dependency chain. Any link in the wrong order and you'll spend hours debugging:

Hardware enabled — BIOS VT-d / AMD-Vi on, all firmware updated
OS configured — IOMMU, hugepages, RDMA core packages installed
GPU Operator deployed — installs NVIDIA driver
Network Operator deployed — installs Mellanox OFED, sets up VFs
Multus installed — meta-CNI plugin
NetworkAttachmentDefinitions created — one NAD per rail
Pod spec uses the right annotations and resource requests — Multus reads the annotations and the SR-IOV CNI attaches the VFs allocated to the pod
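
Because each link depends on the ones before it, a useful debugging habit is to walk the chain in order and stop at the first failure. The sketch below shows that pattern. The step names follow the list above, but the checks are stubs: what each one actually inspects is left as an assumption for you to fill in.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def first_failing_step(steps: List[Step]) -> Optional[str]:
    """Run checks in dependency order; return the first step that fails.

    Later steps are not run, since their results are meaningless
    until every earlier step passes.
    """
    for name, check in steps:
        if not check():
            return name
    return None


if __name__ == "__main__":
    # Stub checks for illustration; replace each lambda with a real probe.
    chain: List[Step] = [
        ("Hardware enabled", lambda: True),
        ("OS configured", lambda: True),
        ("GPU Operator deployed", lambda: True),
        ("Network Operator deployed", lambda: True),
        ("Multus installed", lambda: False),
        ("NetworkAttachmentDefinitions created", lambda: True),
        ("Pod spec annotations", lambda: True),
    ]
    print(first_failing_step(chain))
```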

For first-time setups: budget a week to get this right end-to-end. For repeat setups with automation: minutes.

What you should remember

PF is the physical NIC (host owns it). VF is a hardware-isolated slice (pod gets it).
SR-IOV is the PCIe mechanism. Requires BIOS + kernel + driver + Operator + CNI all configured.
Multus is what lets a pod have multiple network interfaces — needed because RDMA traffic goes through a different NIC than k8s control plane.
GPU Operator + Network Operator automate the driver / VF / plugin stack. Don't try to do this by hand at scale.
The setup chain has many steps. Most production debugging is "which step was misconfigured?"

Next: What This Curriculum Picks → — bare metal as the teaching baseline, k8s + Multus + SR-IOV as the production variant.
