Inference Networking

AI inference is a different network problem from training. This chapter covers latency-critical request/response traffic, KV-cache movement, RAG and MCP patterns, and how an inference fabric's design differs from a training fabric's.

→17.1 How Inference Differs from Training

Inference is not training: the traffic shape, latency requirements, topology, and scaling characteristics all differ. Here's what changes when you go from a training fabric to an inference fabric.

→17.2 Prefill, Decode, and KV-Cache

The two phases of inference, what flows on the wire during each, and why KV-cache movement is the only place inference looks like training.

→17.3 RAG, MCP, and Inference Fabric Design

Retrieval-Augmented Generation, the Model Context Protocol, and how to actually design an inference fabric — what to overlap with training, what to keep separate, what to outsource.