AMD Instinct MI350X — MRC + SRv6 Fabric Connectivity

AMD Pensando Pollara 400 NIC · 8× 400G per node · 8-plane MRC fabric · Arista 7060XE7 leaf+spine · OCP MRC Rev 1.0

AMD MI350X Node — Internal topology
AMD Instinct MI350X Platform node
CPUs: EPYC Turin CPU 1 · EPYC Turin CPU 2
PCIe Gen5 switch fabric
GPUs: 8× MI350X (HBM3e)
Infinity Fabric: 1,075 GB/s all-to-all intra-node
AMD Pensando Pollara 400 (8× NICs, 1:1 GPU:NIC): Pollara 400G NIC 1 to NIC 8 · PCIe G5 x16 each
MRC engine in hardware: Pollara P4 ASIC, SW-programmable · 32-bit EV (UDP src port + IPv6 flow label) · 128–256 EVs/QP at startup
8 × 400G uplinks → 8 plane leaf switches (7060XE7) · NIC n → plane n
Intra-node: Infinity Fabric (all-to-all, 1,075 GB/s). RCCL uses IF for same-node GPU collectives, so the fabric never sees intra-node traffic.
ROCm / RCCL collective framework: topology-aware (IF for intra-node · MRC over Pollara for inter-node)
Pollara 400 port configs: 1×400G | 2×200G | 4×100G | 4×50G
Multi-plane: 8×100G (4 ports × 2 NICs) or 4×200G (2 ports × 2 NICs) also supported
Memory per node: MI350X: 192 GB HBM3e per GPU → 1.5 TB total · MI355X: 288 GB per GPU → 2.3 TB total · 8 TB/s BW per GPU
Node fabric bandwidth: 8 × 400G = 3.2 Tbps
Paper: NCCL over MRC at 42K GPUs = 92 GB/s per NIC (96% peak)
MI400 + Vulcano 800G → 2.4 Tbps/GPU
Vulcano: PCIe Gen6 · 4×200G or 8×100G configs · qualifying for MI400 now
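The bandwidth figures above follow from simple multiplication. A minimal sketch of that arithmetic, using only the NIC counts and line rates quoted in this section (the function name is illustrative, not part of any AMD tool):

```python
GBPS_PER_TBPS = 1000  # decimal units, as used for Ethernet line rates


def aggregate_tbps(nic_count: int, gbps_per_nic: int) -> float:
    """Aggregate scale-out bandwidth in Tbps for a group of identical NICs."""
    return nic_count * gbps_per_nic / GBPS_PER_TBPS


# MI350X node: 8 Pollara 400 NICs at 400 Gbps each.
mi350x_node_tbps = aggregate_tbps(nic_count=8, gbps_per_nic=400)

# MI400 GPU: 3 Vulcano NICs at 800 Gbps each (per GPU, not per node).
mi400_gpu_tbps = aggregate_tbps(nic_count=3, gbps_per_nic=800)

print(f"MI350X node fabric bandwidth: {mi350x_node_tbps} Tbps")
print(f"MI400 per-GPU bandwidth with Vulcano: {mi400_gpu_tbps} Tbps")
```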
AMD vs Nvidia — MRC node comparison
Feature | AMD MI350X | Nvidia B300
GPU | 8× MI350X (CDNA4) | 8× B300 (Blackwell)
Scale-up fabric | Infinity Fabric, 1,075 GB/s | NVLink 5, 1.8 TB/s
Scale-up silicon | On-package / PCIe switch | Dedicated NVSwitch ×4
Scale-out NIC | Pollara 400, 400 Gbps | ConnectX-8, 800 Gbps
NIC:GPU ratio | 1:1 (8 NICs/node) | 1:1 (8 NICs/node)
NIC host bus | PCIe Gen5 x16 | PCIe Gen5 x16
MRC support | Yes (P4 ASIC FW) | Yes (HW native)
SRv6 uN SID | Yes | Yes
Packet spray | Yes (Pollara HW) | Yes (CX-8 HW)
Coll. lib. | RCCL + MRC shim | NCCL + MRC plugin
Fabric BW/node | 3.2 Tbps | 6.4 Tbps
Arista 7060XE7 | Compatible | Compatible
Next-gen NIC | Vulcano 800G (MI400, PCIe Gen6) | CX-9 (future)
Key diff | PCIe switch connects GPUs + NICs. No dedicated NVSwitch-equivalent. | NVSwitch dedicated silicon for GPU mesh.
MRC is fabric-agnostic. Both implement OCP MRC Rev 1.0: 32-bit EV striped across UDP src port + IPv6 flow label; 128–256 EVs per QP split equally across planes; SRv6 uN SID static source routing; PFC disabled (lossy Ethernet); dynamic routing disabled. The Arista 7060XE7 sees identical traffic from both. Mixed AMD+Nvidia clusters on the same MRC fabric are architecturally valid.
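To make the EV striping concrete, here is a minimal sketch of how a 32-bit EV could be carried in the two header fields and how a QP's EVs could be split evenly across planes. The bit layout (low 16 bits in the UDP source port, high 16 bits in the low bits of the 20-bit flow label) and the round-robin split are assumptions for illustration only; this section does not state the actual layout used by OCP MRC Rev 1.0. All function names are hypothetical.

```python
EV_BITS = 32
UDP_PORT_BITS = 16  # UDP source port is a 16-bit field


def stripe_ev(ev: int) -> tuple[int, int]:
    """Split a 32-bit EV into (udp_src_port, ipv6_flow_label).

    Assumed layout: low 16 bits -> UDP source port,
    high 16 bits -> low 16 bits of the 20-bit IPv6 flow label.
    """
    if not 0 <= ev < (1 << EV_BITS):
        raise ValueError("EV must fit in 32 bits")
    udp_src_port = ev & 0xFFFF
    flow_label = ev >> UDP_PORT_BITS  # at most 0xFFFF, so it fits in 20 bits
    return udp_src_port, flow_label


def unstripe_ev(udp_src_port: int, flow_label: int) -> int:
    """Inverse of stripe_ev under the same assumed layout."""
    return ((flow_label & 0xFFFF) << UDP_PORT_BITS) | (udp_src_port & 0xFFFF)


def split_evs_across_planes(evs: list[int], num_planes: int = 8) -> dict[int, list[int]]:
    """Assign EVs to planes 1..num_planes so every plane gets the same count."""
    if len(evs) % num_planes != 0:
        raise ValueError("EV count must be a multiple of the plane count")
    planes: dict[int, list[int]] = {p: [] for p in range(1, num_planes + 1)}
    for i, ev in enumerate(evs):
        planes[i % num_planes + 1].append(ev)
    return planes


port, label = stripe_ev(0xDEADBEEF)
print(hex(port), hex(label))

planes = split_evs_across_planes(list(range(128)))
print({plane: len(evs) for plane, evs in planes.items()})
```

With 128 EVs and 8 planes each plane receives 16 EVs; with 256 EVs each receives 32.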
MRC SRv6 fabric — AMD MI350X node → 8 planes → Arista 7060XE7
MI350X Node
Pollara NIC n → Pn (plane n · 400G · SRv6 uN), for n = 1 to 8
MRC: 32-bit EV · 128–256/QP · PFC off · lossy · no dyn. routing
Leaf P1 to Leaf P8 (one per plane): 7060XE7 · 800G ports · ECN mark · stateless · reads uN SID · forwards ↑ to the spine of the same plane (spine P1 to spine P8)
No cross-plane links · planes fully independent
400G QSFP per plane
NACK → re-steer (swap EV)
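The "reads uN SID, forwards" step at each leaf can be sketched with Python's standard ipaddress module. This is an illustration under assumptions, not the fabric's actual addressing plan: it assumes the common uSID carrier layout of a 32-bit block followed by 16-bit micro-SIDs, uses fcbb:bb00::/32 as a placeholder block, and invents a trivial EV-to-spine mapping (usids_for_ev) purely to show the idea of an algorithmic mapping.

```python
import ipaddress

USID_BLOCK = 0xFCBBBB00  # placeholder 32-bit uSID block (assumption)
USID_BITS = 16
CARRIER_BITS = 96        # 128-bit address minus the 32-bit block
MAX_USIDS = CARRIER_BITS // USID_BITS


def build_usid_address(usids: list[int]) -> ipaddress.IPv6Address:
    """Pack a list of 16-bit uSIDs after the block, first hop first."""
    if not 1 <= len(usids) <= MAX_USIDS:
        raise ValueError("need between 1 and 6 uSIDs")
    value = USID_BLOCK << CARRIER_BITS
    for i, usid in enumerate(usids):
        if not 0 < usid < (1 << USID_BITS):
            raise ValueError("each uSID must be a non-zero 16-bit value")
        value |= usid << (CARRIER_BITS - USID_BITS * (i + 1))
    return ipaddress.IPv6Address(value)


def shift_and_forward(addr: ipaddress.IPv6Address) -> tuple[int, ipaddress.IPv6Address]:
    """uN behavior: consume the active uSID and shift the rest left by 16 bits."""
    value = int(addr)
    block = value >> CARRIER_BITS
    carrier = value & ((1 << CARRIER_BITS) - 1)
    active = carrier >> (CARRIER_BITS - USID_BITS)
    remaining = (carrier << USID_BITS) & ((1 << CARRIER_BITS) - 1)
    return active, ipaddress.IPv6Address((block << CARRIER_BITS) | remaining)


def usids_for_ev(ev: int, num_spines: int, dst_leaf_usid: int) -> list[int]:
    """Hypothetical mapping: the EV selects a spine, then the destination leaf."""
    spine_usid = 0x1000 + (ev % num_spines)
    return [spine_usid, dst_leaf_usid]


dst = build_usid_address(usids_for_ev(ev=5, num_spines=4, dst_leaf_usid=0x0207))
print(dst)
active, next_dst = shift_and_forward(dst)
print(hex(active), next_dst)
```

Because the whole path sits in the destination address, each switch only needs a static match on the block plus the active uSID, which is consistent with the stateless, no-dynamic-routing behavior shown in the diagram.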
Pollara 400 — MRC capabilities
Intelligent packet spray — distributes packets across all 8 planes per NIC, 32-bit EV striped across UDP src port + IPv6 flow label · 128–256 EVs per QP, equally split across planes
In-order delivery — every packet carries RDMA virtual address + remote key → NIC writes directly to correct GPU memory location regardless of arrival order
Selective retransmission — SACK indicates precisely which packets arrived · selective retransmit only missing packets · packet trimming: payload stripped, header forwarded as loss signal
Path-aware congestion avoidance — ECN CE bit at leaf (not last hop) → CNP/NACK → Pollara swaps EV to cleaner plane · PFC disabled, runs lossy
Failover — packet loss (not trim) = path failure → EV retired immediately · background probes resurrect recovered paths
SW-programmable P4 ASIC — MRC transport runs in NIC firmware, updatable without hardware swap as spec evolves
SRv6 uN SID encoding — 32-bit EV maps algorithmically to SRv6 address template → specific switch path encoded in uN uSIDs · no dynamic routing in fabric
RCCL + MRC shim — OCP MRC Rev 1.0 ibverbs shim; RCCL workloads run over MRC with no source changes
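The in-order delivery and selective retransmission items above can be illustrated with a small receiver model. This is a simplified sketch, not the Pollara implementation: the Packet fields, the Receiver class, and the use of a buffer offset in place of a real RDMA virtual address are all assumptions made for the example. A packet with an empty payload stands in for a trimmed packet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    psn: int        # packet sequence number
    vaddr: int      # destination offset (stands in for the RDMA virtual address)
    rkey: int       # remote key authorizing the write
    payload: bytes  # empty payload models a trimmed packet (header only)


class Receiver:
    def __init__(self, size: int, rkey: int, total_packets: int) -> None:
        self.memory = bytearray(size)
        self.rkey = rkey
        self.total_packets = total_packets
        self.received: set[int] = set()

    def on_packet(self, pkt: Packet) -> bool:
        """Place the payload at its own address; arrival order does not matter."""
        if pkt.rkey != self.rkey:
            raise PermissionError("rkey mismatch")
        if not pkt.payload:
            return False  # trimmed: counts as a loss signal, nothing is written
        end = pkt.vaddr + len(pkt.payload)
        if end > len(self.memory):
            raise ValueError("write outside registered buffer")
        self.memory[pkt.vaddr:end] = pkt.payload
        self.received.add(pkt.psn)
        return True

    def missing_psns(self) -> list[int]:
        """What a SACK lets the sender infer: exactly which PSNs to resend."""
        return [p for p in range(self.total_packets) if p not in self.received]


def split_message(message: bytes, mtu: int, rkey: int) -> list[Packet]:
    return [
        Packet(psn=i, vaddr=off, rkey=rkey, payload=message[off:off + mtu])
        for i, off in enumerate(range(0, len(message), mtu))
    ]


message = b"ABCDEFGHIJKL"
packets = split_message(message, mtu=4, rkey=7)
rx = Receiver(size=len(message), rkey=7, total_packets=len(packets))

# Packets arrive out of order, and packet 1 arrives trimmed.
rx.on_packet(packets[2])
rx.on_packet(packets[0])
rx.on_packet(Packet(psn=1, vaddr=4, rkey=7, payload=b""))
print(rx.missing_psns())

# The sender retransmits only the missing packet.
for psn in rx.missing_psns():
    rx.on_packet(packets[psn])
print(bytes(rx.memory) == message)
```

Because every packet names its own destination, the receiver needs no reorder buffer, and the sender resends only the packets reported missing rather than everything after the first gap.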
Port config flexibility: Each Pollara 400 supports 1×400G, 2×200G, or 4×100G. In a multi-plane setup, a single NIC can cover 4 planes at 100G each, or 2 NICs cover all 8 planes at 100G — enabling MRC without a full 8-NIC configuration where cost is a concern.
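A short sketch of the plane-coverage arithmetic behind these port configurations, assuming one NIC port is cabled to one plane. The mode table is taken from the three configurations listed in the paragraph above; the function and key names are illustrative.

```python
# Port modes from the text: mode name -> (ports per NIC, Gbps per port)
PORT_MODES = {
    "1x400G": (1, 400),
    "2x200G": (2, 200),
    "4x100G": (4, 100),
}


def plane_coverage(mode: str, nics: int, planes: int = 8) -> dict:
    """How many planes a NIC count can reach, assuming one port per plane."""
    ports_per_nic, gbps_per_port = PORT_MODES[mode]
    total_ports = ports_per_nic * nics
    return {
        "planes_covered": min(total_ports, planes),
        "gbps_per_plane": gbps_per_port,
        "covers_all_planes": total_ports >= planes,
        "node_gbps": total_ports * gbps_per_port,
    }


for mode, nics in [("4x100G", 1), ("4x100G", 2), ("1x400G", 8)]:
    print(mode, "x", nics, "NIC(s):", plane_coverage(mode, nics))
```

Two NICs in 4×100G mode reach all 8 planes at 800 Gbps per node, a quarter of the 3.2 Tbps of the full 8-NIC, 1×400G configuration.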
Vulcano 800G (MI400 — upcoming)
PCIe Gen6 host interface — removes PCIe5 bandwidth bottleneck
800 Gbps per NIC — matches Nvidia CX-8
4×200G or 8×100G multi-plane configs
Up to 2.4 Tbps per GPU (3 NICs per GPU)
With 3× Vulcano NICs per MI400 GPU, AMD moves to a different plane-to-NIC mapping than Nvidia's 1:1. The Arista 7060XE7 fabric topology is unchanged — only the server-side cabling discipline differs.
8 Pollara NICs/node
3.2 Tbps fabric BW
8 MRC planes
400G per uplink
P4 ASIC (SW prog.)
0 ECMP · no dyn. routing