How Inference Differs from Training
Inference is not training: it has a different traffic shape, different latency requirements, a different topology, and different scaling characteristics. Here's what changes when you go from a training fabric to an inference fabric.
Prefill, Decode, and KV-Cache
The two phases of inference, what flows on the wire during each, and why KV-cache movement is the only place inference looks like training.
RAG, MCP, and Inference Fabric Design
Retrieval-Augmented Generation, the Model Context Protocol, and how to actually design an inference fabric: what to share with training, what to keep separate, and what to outsource.