Design 2 — Bare Metal + Slurm + RoCE
The traditional HPC blueprint. No containers, no Kubernetes — Slurm submits jobs directly to bare-metal nodes. Highest performance ceiling and the simplest mental model, at the cost of multi-tenancy and dynamic scheduling.
Best for: Performance-sensitive single-tenant workloads. Research labs, weather, physics, training runs that pin the whole cluster.
Trade-offs: No isolation between users. No container portability. Manual environment management.
After this page, you'll be able to
- Place the bare-metal + Slurm + RoCE pattern — when no containers and no Kubernetes is the right call, and why national labs run it for performance-sensitive single-tenant jobs.
- Name the trade-offs you're accepting — no inter-user isolation, no container portability, and manual environment management in exchange for the highest performance ceiling and simplest mental model.
- Walk the 15-layer build — from bare-metal nodes up through RoCE and Slurm job submission, using the interactive architecture and build-step diagrams.
Architecture
Build steps — the 15 layers
When to pick this design
Pick this when:
- You're a classic HPC shop — research lab, weather, physics — where one big synchronous job pins the whole cluster and raw performance is the goal.
- You want the simplest mental model and the highest performance ceiling, with no container runtime or CNI between your code and the NIC.
- A single team owns the hardware and manages the environment by hand (modules, MPI builds, drivers) without needing self-service.
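To make the "one big job pins the whole cluster" case concrete, here is a minimal sketch that renders a Slurm batch script from Python. The `#SBATCH` options shown (`--nodes`, `--ntasks-per-node`, `--exclusive`, `--time`) and `srun` are standard Slurm; everything else is an assumption for illustration: the `WholeClusterJob` class, the module name `openmpi`, the program `./train`, and the use of UCX's `UCX_NET_DEVICES` with the device name `mlx5_0:1` to steer traffic onto a RoCE NIC. Your site's modules, NIC names, and MPI transport settings will differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WholeClusterJob:
    """Hypothetical description of one job that takes its nodes exclusively."""

    name: str
    nodes: int
    tasks_per_node: int
    time_limit: str  # Slurm time format, e.g. "24:00:00"
    command: str  # program each task runs
    modules: list[str] = field(default_factory=list)  # site-specific module names
    env: dict[str, str] = field(default_factory=dict)  # e.g. NIC selection variables


def render_sbatch(job: WholeClusterJob) -> str:
    """Render a batch script: resource requests, environment setup, then srun."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --nodes={job.nodes}",
        f"#SBATCH --ntasks-per-node={job.tasks_per_node}",
        "#SBATCH --exclusive",  # do not share the allocated nodes with other jobs
        f"#SBATCH --time={job.time_limit}",
        "",
    ]
    # The environment is set up by hand: there is no container image to pull.
    lines += [f"module load {m}" for m in job.modules]
    lines += [f"export {k}={v}" for k, v in sorted(job.env.items())]
    lines += ["", f"srun {job.command}"]
    return "\n".join(lines) + "\n"


job = WholeClusterJob(
    name="train-full",
    nodes=16,
    tasks_per_node=8,
    time_limit="24:00:00",
    command="./train",
    modules=["openmpi"],  # assumed module name
    env={"UCX_NET_DEVICES": "mlx5_0:1"},  # assumed RoCE device name
)
script = render_sbatch(job)
print(script)
```

The rendered text could be written to a file and handed to `sbatch`. Note what is absent: no image reference, no pod spec, no CNI configuration. The script only requests nodes, loads the hand-managed environment, and launches the program.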
Avoid it when:
- You need multi-tenancy or isolation between users. Slurm partitions group nodes and queue jobs, but bare metal gives you no container-level isolation. Reach for Design 1.
- You want portable, reproducible environments — without containers, every node's software stack is yours to keep in sync manually.
- You also run inference, dashboards, or CI that want cloud-native orchestration — that's the hybrid case (Design 5).
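Keeping every node's software stack in sync by hand usually means checking for drift. The sketch below is one possible approach, not a prescribed tool: it assumes you have already collected a per-node manifest of component versions (how you gather it is up to you) and reports any node that disagrees with the majority. The function name `find_drift`, the component keys, and all version strings are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def find_drift(manifests: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """For each component, return the nodes whose version differs from the majority."""
    components = sorted({comp for m in manifests.values() for comp in m})
    drift: dict[str, dict[str, str]] = {}
    for comp in components:
        # A node that does not report a component counts as its own "version".
        versions = {node: m.get(comp, "<missing>") for node, m in manifests.items()}
        majority, _ = Counter(versions.values()).most_common(1)[0]
        outliers = {node: v for node, v in versions.items() if v != majority}
        if outliers:
            drift[comp] = outliers
    return drift


# Illustrative manifests; the keys and versions are made up for this example.
manifests = {
    "node01": {"nic_driver": "5.8", "mpi_build": "4.1.5", "module_tree": "2024a"},
    "node02": {"nic_driver": "5.8", "mpi_build": "4.1.5", "module_tree": "2024a"},
    "node03": {"nic_driver": "5.7", "mpi_build": "4.1.5"},
}
drift = find_drift(manifests)
for comp, outliers in drift.items():
    print(comp, outliers)
```

Here `node03` would be flagged twice: once for an older NIC driver and once for a missing module tree. A container image would make this class of check unnecessary, which is the portability you give up with this design.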
💡 What you should remember
| # | | Concept | Why it matters |
|---|---|---|---|
| 1 | 🏔️ | No containers, no Kubernetes — Slurm schedules straight onto bare metal | Stripping the orchestration layers gives the highest performance ceiling and the fewest things between your job and the RoCE NIC. |
| 2 | 👤 | Single-tenant by design | Slurm queues jobs, but there's no container isolation — one big run pins the cluster, which is exactly the model national labs want. |
| 3 | 🔧 | Environment management is manual | Drivers, MPI builds, and modules are yours to keep consistent across nodes; you trade portability for control. |
| 4 | 📐 | Simplest stack to reason about | Fewer layers means fewer places to break — when peak performance matters more than flexibility, that simplicity is the feature. |
What's next
- Design 1 — Kubernetes + SR-IOV + RoCE — the flexible multi-tenant alternative.
- Design 3 — Kubernetes + Physical NIC + RoCE — simpler K8s, no SR-IOV.
- Design 4 — Bare Metal + MPI + RoCE — drop Slurm for raw mpirun.
- Design 5 — Hybrid: K8s + Slurm + RoCE — keep Slurm, add K8s alongside.