SR-IOV Mechanics
How SR-IOV actually works: Physical Functions, Virtual Functions, and the IOMMU; what the BIOS, kernel, and driver need to agree on; and the most common misconfigurations.
Multus and Multi-NIC Pods
How a Kubernetes pod gets multiple network interfaces: one for the k8s control plane, eight for RDMA rails. NetworkAttachmentDefinitions, pod annotations, and the YAML you'll actually write.
NCCL and GPUDirect Configuration
How NCCL picks NICs, how GPUDirect RDMA makes NIC ↔ GPU memory transfers zero-copy, and the environment variables that decide whether training runs at full speed or half.