Design 2 — Bare Metal + Slurm + RoCE

The traditional HPC blueprint: no containers and no Kubernetes; Slurm schedules jobs directly onto bare-metal nodes. This design offers the highest performance ceiling and the simplest mental model, at the cost of multi-tenancy and dynamic scheduling.

Best for: performance-sensitive, single-tenant workloads, such as research-lab jobs, weather and physics simulations, and training runs that pin the whole cluster.

Trade-offs: no isolation between users, no container portability, and manual environment management.
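Because there is no container orchestrator in this design, a distributed job typically learns its place in the cluster from the environment variables that Slurm's `srun` exports to each task (`SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`, `SLURM_JOB_NODELIST`). The sketch below maps those variables to the rank, world size, and rendezvous host that a training framework usually expects. It assumes one task per GPU and uses the common convention that the first host in the node list acts as the rendezvous ("master") address. The function names are hypothetical, and the node-list parser is deliberately simplified: it handles only one bracket group per host token, whereas `scontrol show hostnames` is the authoritative expansion.

```python
import os
import re


def split_top_level(nodelist: str) -> list[str]:
    """Split a Slurm node list on commas that are not inside [...] ranges."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(nodelist):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(nodelist[start:i])
            start = i + 1
    parts.append(nodelist[start:])
    return [p for p in parts if p]


def expand_nodelist(nodelist: str) -> list[str]:
    """Simplified expansion of e.g. 'gpu[01-03,07]' (one bracket group per token).

    Assumption: real Slurm node lists can be more complex; prefer
    `scontrol show hostnames` in production.
    """
    hosts = []
    for token in split_top_level(nodelist):
        m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", token)
        if m is None:
            hosts.append(token)  # plain hostname, no range
            continue
        prefix, ranges, suffix = m.groups()
        for part in ranges.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                width = len(lo)  # preserve zero padding, e.g. 01..03
                for n in range(int(lo), int(hi) + 1):
                    hosts.append(f"{prefix}{n:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{part}{suffix}")
    return hosts


def dist_config_from_slurm(env=os.environ) -> dict:
    """Hypothetical helper: derive distributed-training settings from srun's env."""
    hosts = expand_nodelist(env["SLURM_JOB_NODELIST"])
    return {
        "rank": int(env["SLURM_PROCID"]),        # global task index
        "world_size": int(env["SLURM_NTASKS"]),  # total tasks in the step
        "local_rank": int(env["SLURM_LOCALID"]), # task index on this node
        "master_addr": hosts[0],                  # convention: first node hosts rendezvous
    }


if __name__ == "__main__":
    # Example values only; inside a real job these come from srun.
    fake_env = {
        "SLURM_JOB_NODELIST": "gpu[01-03,07]",
        "SLURM_PROCID": "5",
        "SLURM_NTASKS": "16",
        "SLURM_LOCALID": "1",
    }
    print(dist_config_from_slurm(fake_env))
```

Reading configuration from the environment rather than from a container image keeps the launch path thin, which matches this design's emphasis on performance; the cost is that every node must have a consistent software environment managed by hand.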

Architecture

Build steps — the 15 layers