Vendor Stacks & Switch Silicon — A Procurement View

May 24, 2026 · 7 min read

Staff Network Engineer · RDMA & AI Fabric

TLDR: You can build the same logical AI fabric on five different vendor stacks — Spectrum-X, Tomahawk, Jericho3-AI + DDC, Cisco Silicon One G200, and Arista EOS. They are not interchangeable. This is the procurement view: what each vendor sells, who actually buys it, and the side-by-side comparison you'll do at RFP time.

1. NVIDIA Spectrum-X — the integrated stack

Spectrum-X is the only stack on this list where one vendor sells you both ends of the wire. The switch ASIC is Spectrum-4 (51.2 Tbps, 64×800G or 128×400G), and the NIC is ConnectX-7 (400G) or ConnectX-8 (800G). The two were co-designed: the switch tags packets, the NIC reacts, and the closed loop is what NVIDIA markets as "AI-tuned out of the box."

The headline features are real. Per-packet adaptive routing lives in the switch, so a single elephant flow gets sprayed across every spine link instead of pinning one. Congestion control terminates at the NIC — when the switch sees queue buildup, ConnectX reacts in hardware, not the kernel. Hardware INT marks every packet with its hop-by-hop dwell time so you can actually see where 200 µs of tail latency came from.
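To make the "pinning versus spraying" point concrete, here is a minimal sketch comparing classic per-flow ECMP hashing with per-packet spraying for a single elephant flow. It is a toy model, not NVIDIA's algorithm: the function names, the CRC32 hash, and the round-robin spray are assumptions chosen for illustration (real adaptive routing picks links based on observed load).

```python
import zlib
from collections import Counter

def ecmp_link(flow: tuple, n_links: int) -> int:
    """Classic ECMP: hash the 5-tuple once, so every packet of a flow takes the same uplink."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_links

def spray_link(seq: int, n_links: int) -> int:
    """Round-robin stand-in for per-packet spraying (illustrative, not load-aware)."""
    return seq % n_links

def link_load(packets, n_links: int, per_packet: bool) -> list:
    """Return bytes carried by each uplink for a list of (flow, size_bytes) packets."""
    load = Counter()
    for seq, (flow, size) in enumerate(packets):
        link = spray_link(seq, n_links) if per_packet else ecmp_link(flow, n_links)
        load[link] += size
    return [load[i] for i in range(n_links)]

# One elephant flow: 8000 packets of 4096 bytes with the same (hypothetical) 5-tuple.
elephant = ("10.0.0.1", "10.0.1.1", 4791, 4791, "UDP")
packets = [(elephant, 4096)] * 8000

ecmp = link_load(packets, n_links=8, per_packet=False)
spray = link_load(packets, n_links=8, per_packet=True)
print("ECMP :", ecmp)
print("Spray:", spray)
```

Under ECMP the whole flow lands on one of the eight uplinks; with spraying each uplink carries an equal share. The catch is that sprayed packets can arrive out of order, which is why the receiving NIC is part of the design.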

Who buys it: NVIDIA DGX SuperPOD reference designs ship Spectrum-X by default. Hyperscalers running NVIDIA-only racks pick it because the full stack is supported as one SKU. Enterprises buy it because they want one throat to choke when the training run misbehaves.

Trade-off: highest list price on the page (often 2–3× a Tomahawk whitebox per port), hardest to second-source, and you are betting on NVIDIA's silicon roadmap for the life of the cluster. You get the strongest out-of-the-box performance on this list, and you pay for it.

2. Broadcom Tomahawk — the open Ethernet workhorse

Tomahawk is the silicon most hyperscaler AI fabrics actually run on. Broadcom does not sell switches; it sells chips to the whitebox builders (Edgecore, Celestica, UfiSpace) and to the brand-name OEMs (Arista, Dell, Juniper). Generations matter: Tomahawk 4 is 25.6 Tbps, Tomahawk 5 is 51.2 Tbps in a single chip (64×800G), and Tomahawk 6 doubles that again to 102.4 Tbps.
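The generation numbers translate directly into port counts and fabric size. The sketch below does that arithmetic, assuming a textbook non-blocking two-tier leaf-spine in which each leaf uses half its ports for endpoints and half for uplinks. The function names are made up for illustration, and real designs often oversubscribe or add a third tier.

```python
def port_count(chip_gbps: int, port_gbps: int) -> int:
    """Number of ports a chip can expose at a given port speed."""
    return chip_gbps // port_gbps

def two_tier_endpoints(radix: int) -> int:
    """Endpoints in a non-blocking two-tier leaf-spine built from radix-port switches.

    Each leaf splits its ports half down (endpoints) and half up (spines).
    Each spine port connects one leaf, so there can be at most `radix` leaves.
    """
    return (radix // 2) * radix

# Chip capacities from the text, expressed in Gbps to keep the math in integers.
generations = {"Tomahawk 4": 25_600, "Tomahawk 5": 51_200, "Tomahawk 6": 102_400}

for name, gbps in generations.items():
    radix = port_count(gbps, 800)
    print(f"{name}: {radix} x 800G ports, "
          f"{two_tier_endpoints(radix)} endpoints in two tiers")
```

Because endpoint count grows with the square of the radix, each doubling of chip bandwidth quadruples the size of the two-tier fabric you can build at a given port speed.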

The pitch is volume economics and openness. You get the highest port density per RU, standard Ethernet (no proprietary NIC required), ECMP plus dynamic load balancing, and a chip that ships in 10× more switches than any AI-specific silicon. The toolchain assumes you'll bring your own NOS: SONiC, FBOSS, or a commercial NOS such as EOS or Cumulus.

Who buys it: Meta, Microsoft, AWS, Google — most hyperscaler AI fabrics are Tomahawk underneath, even when the front-of-rack badge says someone else. Tier-2 clouds and large enterprises buy it via Arista/Dell when they want the silicon without writing their own NOS.

Trade-off: more configuration work than Spectrum-X — adaptive routing, ECN/PFC tuning, and telemetry are not "on by default" the way they are on the NVIDIA stack. You trade integration polish for lowest $/port and the broadest vendor optionality on the market.

3. Broadcom Jericho3-AI + Ramon3 — the scheduled fabric

This is the other Broadcom stack, and it is architecturally different from Tomahawk. Jericho3-AI is the leaf ASIC, Ramon3 is the fabric ASIC, and together they implement a credit-based scheduled fabric — the chassis-architecture-disaggregated-into-a-rack idea that Broadcom markets as DDC (Distributed Disaggregated Chassis).

The features that matter: 8–16 GB of HBM per chip (vs ~80 MB of on-chip buffer on Tomahawk), virtual-output-queue (VOQ) scheduling so traffic queued for a congested egress port can never head-of-line (HOL) block traffic bound for another, and cell-based spraying across the fabric so every flow uses every link uniformly. The headline consequence: you do not need PFC. The fabric is lossless because it is scheduled, not because it backpressures.
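The head-of-line blocking that VOQs avoid is easy to show with a toy model. The sketch below compares one ingress port using a single FIFO against the same port using one queue per destination. It models only ingress queueing: the `ready` function is a hypothetical stand-in for the egress granting credit, and cell spraying is not modeled.

```python
from collections import deque

def run_fifo(arrivals, ready, ticks):
    """Single FIFO: only the head packet may be sent, one packet per tick."""
    q = deque(arrivals)
    delivered = []
    for t in range(ticks):
        if q and ready(q[0], t):
            delivered.append((t, q.popleft()))
    return delivered

def run_voq(arrivals, ready, ticks):
    """One queue per destination: send from any queue whose egress is ready."""
    voqs = {}
    for dst in arrivals:
        voqs.setdefault(dst, deque()).append(dst)
    delivered = []
    for t in range(ticks):
        for dst, q in voqs.items():
            if q and ready(dst, t):
                delivered.append((t, q.popleft()))
                break  # the ingress still sends at most one packet per tick
    return delivered

def ready(dst, t):
    """Egress A is always free; egress B is congested until tick 5 (illustrative)."""
    return dst == "A" or t >= 5

arrivals = ["B", "A", "A", "A"]  # a packet for congested B arrives first
fifo = run_fifo(arrivals, ready, ticks=10)
voq = run_voq(arrivals, ready, ticks=10)
print("FIFO:", fifo)
print("VOQ :", voq)
```

With the FIFO, the three packets for A wait behind the packet for B until tick 5. With VOQs they leave at ticks 0 to 2, and only the packet for B waits.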

Who buys it: Meta's Mistral / DSF AI clusters, and hyperscalers building 32K+ GPU single-fabric pods where Tomahawk's ECMP starts to lose efficiency. Anyone whose primary pain is "I cannot tune PFC at this scale" considers Jericho3-AI.

Trade-off: scheduling adds latency. Expect an extra ~1–5 µs per hop vs a plain Ethernet switch, and a narrower ecosystem — you are committed to Broadcom DDC end-to-end, with fewer NOS choices than Tomahawk.
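To put the per-hop figure in context, the sketch below multiplies it out over a path. The three-hop path (leaf, fabric, leaf) is an assumption about a two-tier DDC, and the function name is illustrative.

```python
def added_latency_us(hops: int, per_hop_us=(1.0, 5.0)):
    """Return the (low, high) extra latency in microseconds over a path of `hops` ASICs."""
    low, high = per_hop_us
    return hops * low, hops * high

# Assumed path through a two-tier scheduled fabric: leaf -> fabric -> leaf.
low, high = added_latency_us(hops=3)
print(f"Extra one-way latency: {low:.0f} to {high:.0f} microseconds")
```

Whether 3 to 15 µs one way matters depends on the workload: it is small next to a large collective's transfer time, but it is paid on every small message.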

4. Cisco Silicon One G200 — the hybrid player

Cisco's answer is the Silicon One family. G200 is the AI-fabric flagship at 51.2 Tbps, and the architectural pitch is "one ASIC, both modes": the same chip can run as an ECMP-routed Ethernet switch or as a credit-scheduled (DDC-style) fabric, chosen per deployment. A P4-programmable pipeline, integrated 112G PAM4 SerDes, and INT-XD for hop-by-hop telemetry round out the feature set.

Who buys it: customers who want one silicon family across both their general DC and their AI back-end fabric — operational consistency matters more than squeezing the last 5% of $/port. Large enterprises with deep Cisco footprints buy it through Nexus 9000-series boxes; some service providers buy it via 8000-series for the routing flexibility.

Trade-off: smaller AI deployment footprint than Spectrum-X or Tomahawk, which means the AI-tuning playbooks are younger. The silicon is competitive on paper; the operational mileage at 16K+ GPU scale is less public than the Broadcom or NVIDIA stories.

5. Arista EOS — the software layer

Arista doesn't make ASICs. They build switches around Broadcom Tomahawk (7060X, 7368X) and Broadcom Jericho (7280R, 7800R), and the differentiator is EOS — the network operating system, the CLI, the telemetry pipeline, and CloudVision for fleet management.

The features that matter to operators: a Linux-underneath EOS where every state object is queryable via eAPI (JSON-RPC) or OpenConfig, streaming telemetry as a first-class citizen, multi-agent NOS architecture so a routing crash doesn't take the box down, and a CLI that is genuinely best-in-class for debugging at 3 AM.
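As an illustration of what "queryable via eAPI" looks like, the sketch below builds the JSON-RPC 2.0 body for eAPI's runCmds method, which is sent as an HTTP POST to the switch's /command-api endpoint. The helper name and the chosen show commands are illustrative, and the block only builds the request. Sending it requires eAPI to be enabled on the switch and credentials for it.

```python
import json

def eapi_request(cmds, request_id="1", fmt="json"):
    """Build a JSON-RPC 2.0 body for Arista eAPI's runCmds method."""
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": list(cmds), "format": fmt},
        "id": request_id,
    }

# Ask for two show commands in one round trip; results come back as a list,
# one structured JSON object per command, in the same order.
payload = eapi_request(["show version", "show interfaces counters"])
body = json.dumps(payload)
print(body)
```

Because the response is structured JSON rather than screen-scraped CLI text, the same call works from a collector, a test harness, or a one-off script.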

Who buys it: most enterprise and tier-2 cloud AI fabrics. If your team already runs Arista in the front-end DC, you'll run Arista in the back-end fabric too. They are the "Cisco of AI fabrics" for operations — you don't buy them for unique hardware, you buy them for the software and the support contract.

Trade-off: you pay a meaningful software premium over the same Tomahawk silicon in a whitebox running SONiC. The math is: how much is your NetOps team's time worth?
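One way to frame that question is as a break-even in engineer hours. The sketch below is a back-of-the-envelope calculation in which every number is a placeholder, not a quoted price. Substitute your own port count, per-port premium, and loaded hourly cost.

```python
def breakeven_hours(ports: int, premium_per_port: float, loaded_hourly_rate: float) -> float:
    """Engineer hours the commercial NOS must save to pay back its premium."""
    return ports * premium_per_port / loaded_hourly_rate

# Placeholder inputs for illustration only.
hours = breakeven_hours(ports=2048, premium_per_port=300.0, loaded_hourly_rate=150.0)
print(f"Break-even: {hours:.0f} engineer hours over the life of the fabric")
```

If you expect the in-house NOS work (build, test, upgrade, on-call) to cost more hours than the break-even figure, the premium pays for itself. If not, the whitebox wins.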

6. Side-by-side

Vendor stack	Scale sweet spot	LB approach	Telemetry	Hyperscaler footprint	$/port (relative)
NVIDIA Spectrum-X	1K–16K GPUs, NVIDIA-only	Per-packet adaptive (switch) + NIC CC	Hardware INT, AI-tuned	Low (NVIDIA-aligned shops)	High
Broadcom Tomahawk 5	1K–32K GPUs, BYO-NOS	ECMP + DLB	In-band telemetry, BYO collector	Very High (Meta, MSFT, AWS, GOOG)	Low
Broadcom Jericho3-AI + DDC	16K–100K+ GPUs, single fabric	Credit-scheduled, cell spray, no PFC needed	VOQ-aware, deep counters	High (Meta Mistral)	Medium-High
Cisco Silicon One G200	1K–16K GPUs, Cisco shops	Hybrid (ECMP or scheduled)	INT-XD, P4-programmable	Low–Medium	Medium
Arista EOS (on Tomahawk/Jericho)	1K–16K GPUs, enterprise + tier-2	Whatever the underlying ASIC supports	Streaming telemetry, CloudVision	Medium	Medium-High

The anti-pattern: single-vendor lock-in across the back-end

Betting your entire back-end fabric on one vendor's roadmap is a 24-month risk. The AI silicon market is consolidating, and chip, NOS, and NIC roadmaps all slip. Multi-plane lets you mix vendors per plane: rail-plane A on Spectrum-X, rail-plane B on Tomahawk, storage plane on Arista/Jericho. Different blast radius, different procurement lever, same logical fabric. Use it.
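A quick way to sanity-check a multi-plane plan is to count exposure per plane, both by stack and by the silicon vendor underneath. The sketch below uses the three-plane example from the paragraph above and the stack-to-silicon mapping from the vendor sections. The names are illustrative, and weighting each plane equally is an assumption.

```python
from collections import Counter

# Plane -> vendor stack, taken from the example in the text.
PLANES = {
    "rail-plane-A": "Spectrum-X",
    "rail-plane-B": "Tomahawk",
    "storage": "Arista/Jericho",
}

# Stack -> silicon vendor, per the vendor sections (Arista builds on Broadcom chips).
SILICON = {
    "Spectrum-X": "NVIDIA",
    "Tomahawk": "Broadcom",
    "Arista/Jericho": "Broadcom",
}

def exposure(planes, key=lambda stack: stack):
    """Fraction of planes that depend on each name, with planes weighted equally."""
    counts = Counter(key(stack) for stack in planes.values())
    return {name: n / len(planes) for name, n in counts.items()}

by_stack = exposure(PLANES)
by_silicon = exposure(PLANES, key=SILICON.get)
print("By stack  :", by_stack)
print("By silicon:", by_silicon)
```

Counted by stack, no vendor holds more than a third of the planes. Counted by silicon, two of the three planes still ride on Broadcom, which is worth knowing before you call the design diversified.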

What to remember

	Concept	One-liner
🥇	Spectrum-X is integrated	Highest performance out of the box, highest price, hardest to second-source.
🏗️	Tomahawk runs the hyperscalers	51.2 Tbps per chip, lowest $/port, most NOS options — but you tune it.
📦	Jericho3-AI + DDC = scheduled fabric	Deep buffers, no PFC, huge pods — pay ~1–5 µs per hop and lock to Broadcom.
🔄	Cisco G200 is the hybrid bet	One silicon for DC + AI; younger AI playbooks; good if you're already Cisco.
💻	Arista is software, not silicon	Best operator experience on someone else's chips — you pay an EOS premium.
⚠️	Don't pick one stack for everything	Multi-plane = multi-vendor = lower procurement and roadmap risk.
