Design 3 — Kubernetes + Physical NIC + RoCE
This design keeps Kubernetes orchestration but drops SR-IOV. Pods share the host's physical NIC via the standard CNI. Setup is simpler than Design 1, but the performance ceiling is lower because all pods on a node funnel through one NIC.
Best for: Small-to-mid Kubernetes-native AI clusters that don't need per-pod NIC isolation, or whose hardware doesn't support SR-IOV.
Trade-offs: No per-pod traffic isolation. Multi-pod nodes share NIC bandwidth and queue resources.
After this page, you'll be able to
- Walk the 15-layer stack for this design and name where it breaks: the Kubernetes layers match Design 1, but the SR-IOV and Multus layers are removed and pods funnel through one shared physical NIC.
- See the trade you're making for simplicity — dropping SR-IOV and Multus cuts setup complexity, but every pod on a node shares the same NIC bandwidth and queues, so there's no per-pod isolation.
- Decide when this beats Design 1 — pick it when you want K8s orchestration but never need to slice a NIC, and skip it the moment many small pods must share one interface and start fighting over it.
Architecture
Build steps — the 15 layers
When to pick this design
Pick this when:
- You want Kubernetes orchestration — declarative scheduling, rolling upgrades, self-service — but don't need to partition a NIC across pods.
- Your nodes run one (or a few large) RDMA pods that can own the whole physical NIC, so there's nothing to slice.
- Your hardware doesn't support SR-IOV, or you want a simpler stack with fewer layers than Design 1 to operate and debug.
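
One way to check the SR-IOV point on a Linux node is to read the `sriov_totalvfs` attribute that the kernel exposes under the NIC's PCI device in sysfs. The sketch below assumes that a missing or zero value is a fair proxy for "no SR-IOV available"; the function name, the `sysfs_net` parameter, and the `eth0` interface name are illustrative, not part of this design.

```python
from pathlib import Path


def sriov_total_vfs(iface: str, sysfs_net: str = "/sys/class/net") -> int:
    """Return the maximum number of VFs the NIC advertises, or 0 if none.

    Assumption: a missing sriov_totalvfs file means the kernel sees no
    SR-IOV capability on this interface (this can also happen when SR-IOV
    is disabled in firmware).
    """
    attr = Path(sysfs_net) / iface / "device" / "sriov_totalvfs"
    try:
        return int(attr.read_text().strip())
    except (FileNotFoundError, NotADirectoryError, ValueError):
        return 0


if __name__ == "__main__":
    iface = "eth0"  # hypothetical interface name; substitute your RoCE NIC
    vfs = sriov_total_vfs(iface)
    if vfs > 0:
        print(f"{iface}: SR-IOV available ({vfs} VFs max), Design 1 is an option")
    else:
        print(f"{iface}: no SR-IOV VFs advertised, the shared-NIC design fits")
```

A result of 0 doesn't rule out Design 1 forever, since firmware settings can change it, but it does tell you what the node can do as configured today.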
Avoid it when:
- Many small pods must share one NIC on the same node — without SR-IOV they contend for the same bandwidth and queues, with no isolation. Reach for Design 1.
- You need per-tenant traffic isolation or QoS guarantees between pods on a node.
- You're chasing the absolute performance ceiling for one big job — bare metal (Design 2 or 4) removes the container and CNI layers entirely.
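
The pick/avoid criteria above can be condensed into a small decision helper. This is a sketch of the reasoning, not a sizing tool: the `Requirements` fields, the `recommend` function, and the `MAX_PODS_SHARING_NIC` threshold (standing in for "one or a few large pods") are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: "a few" large RDMA pods per node means at most this many.
MAX_PODS_SHARING_NIC = 2


@dataclass
class Requirements:
    wants_kubernetes: bool
    rdma_pods_per_node: int
    nic_supports_sriov: bool
    needs_per_pod_isolation: bool
    needs_peak_single_job_performance: bool


def recommend(req: Requirements) -> str:
    # Chasing the absolute ceiling, or no need for K8s at all: go bare metal.
    if req.needs_peak_single_job_performance or not req.wants_kubernetes:
        return "Design 2 or 4 (bare metal)"

    crowded = req.rdma_pods_per_node > MAX_PODS_SHARING_NIC
    # Many small pods or isolation needs call for slicing the NIC,
    # which is only possible when the hardware supports SR-IOV.
    if (crowded or req.needs_per_pod_isolation) and req.nic_supports_sriov:
        return "Design 1 (Kubernetes + SR-IOV)"

    # Otherwise the shared physical NIC is the Kubernetes option that remains.
    # Without SR-IOV hardware this includes the contention described above.
    return "Design 3 (Kubernetes + physical NIC)"


if __name__ == "__main__":
    one_big_pod = Requirements(True, 1, True, False, False)
    many_small_pods = Requirements(True, 8, True, False, False)
    print(recommend(one_big_pod))
    print(recommend(many_small_pods))
```

Note the last branch: a crowded node on hardware without SR-IOV still lands on this design, which is exactly the case where you should expect pods to contend.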
💡 What you should remember
| # | | Concept | Why it matters |
|---|---|---|---|
| 1 | 🚪 | One physical NIC per node, shared by all pods | Without SR-IOV there's nothing to slice — pods reach RoCE through the host's single interface, which is simpler to set up. |
| 2 | 🤝 | No per-pod isolation | Multi-pod nodes share NIC bandwidth and queue resources, so a noisy pod can starve its neighbors. |
| 3 | 🪜 | Fewer layers than Design 1 | Dropping the SR-IOV Device Plugin and Multus removes whole layers — less to configure, less to break. |
| 4 | 🎯 | Right when you want K8s but not NIC partitioning | If a node runs one big pod that owns the NIC, slicing buys nothing; the moment many pods share it, you've outgrown this design. |
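
To put rough numbers on the noisy-neighbor point, here is a toy model of pods sharing one NIC. It assumes that an oversubscribed NIC hands out bandwidth in proportion to each pod's offered load, which is a simplification: real behavior depends on NIC queue arbitration and RoCE congestion control. The function name, pod names, and the 100 Gb/s line rate are illustrative.

```python
def shared_nic_allocation(line_rate_gbps: float, demands_gbps: dict) -> dict:
    """Toy model of pods sharing one physical NIC with no isolation.

    Assumption: when total demand exceeds line rate, every pod is scaled
    down by the same factor, so the heaviest sender keeps the largest share.
    """
    total = sum(demands_gbps.values())
    if total <= line_rate_gbps:
        # The NIC is not the bottleneck; every pod gets what it asks for.
        return dict(demands_gbps)
    scale = line_rate_gbps / total
    return {pod: demand * scale for pod, demand in demands_gbps.items()}


if __name__ == "__main__":
    # A modest pod next to a heavy one on a hypothetical 100 Gb/s NIC.
    demands = {"modest-pod": 50.0, "noisy-pod": 150.0}
    for pod, gbps in shared_nic_allocation(100.0, demands).items():
        print(f"{pod}: {gbps:.0f} Gb/s")
```

Under this model the modest pod asked for 50 Gb/s and gets 25, with nothing in the stack to guarantee it more. That missing guarantee is what per-pod isolation in Design 1 is meant to address.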
What's next
- Design 1 — Kubernetes + SR-IOV + RoCE — add SR-IOV when you need per-pod NIC isolation.
- Design 2 — Bare Metal + Slurm + RoCE — drop K8s for classic HPC performance.
- Design 4 — Bare Metal + MPI + RoCE — minimal lab setup.
- Design 5 — Hybrid: K8s + Slurm + RoCE — run K8s and Slurm side by side.