8.0 Understanding RDMA
Visual walk-through of RDMA for TCP network engineers — kernel bypass, memory registration, the three operations, queue pairs, and hardware ACKs.
8.1 What RDMA Actually Does
RDMA is a technique, not a protocol. Compares the TCP path with the RDMA path, introduces the three operations (SEND, READ, WRITE), and shows what the CPU actually does.
8.2 Verbs, Queue Pairs, Memory Regions
The RDMA API the app actually programs against. Queue pairs, memory regions, work requests, completion queues — and how they all fit together.
8.3 RDMA in Production — Reliability, Setup, Errors
How RC (Reliable Connection) reliability works in NIC hardware (PSN, ACK/NAK, retransmit, RNR). Which librdmacm events you'll see in logs. The WR features that show up in NCCL traces (SGE, inline, signaled/unsignaled, WRITE_WITH_IMM). And the completion error codes operators triage in production.
8.4 ibv_devinfo Decoded — Your RDMA Device Inventory
Every field of ibv_devinfo explained, the vendor_part_id table (ConnectX-5 through ConnectX-8), why RDMA MTU is not Ethernet MTU, GID/GUID derivation, and what sm_lid=0 actually means on a RoCE box. The page you'll bookmark for the next time someone asks "what NIC is in that host?"