Architecture Locked · February 28, 2026

Kedios B300 Full Network & Topology Report

32-Node NVIDIA Blackwell Ultra B300 GPU Training Farm — Complete Planning, Architecture & Topology

Organization: Kedios
Date: February 28, 2026
Document status: Final
Server model: ASUS XA NB3I-E12
GPU: NVIDIA B300 (HGX)
Total racks: 34 (32 compute + 2 network)

Executive Summary

The Kedios B300 farm is a 32-node NVIDIA Blackwell Ultra B300 GPU training cluster deployed within a 20 MW data center, consuming approximately 530 kW peak out of a 1 MW allocated budget (53%). The farm is purpose-built for large-scale distributed AI training — specifically AllReduce/AllGather collective operations — backed by a 1:1 non-blocking InfiniBand NDR800 fabric delivering 204.8 Tb/s bisection bandwidth.

| Metric | Value |
| --- | --- |
| Total GPUs | 256 × NVIDIA B300 (HGX Blackwell Ultra) |
| Peak compute (FP8) | 1,152 PFLOPS dense |
| Peak compute (NVFP4) | 7,680 PFLOPS sparse |
| GPU memory | 73.73 TB HBM3e (256 × 288 GB) |
| IB bisection BW | 204.8 Tb/s — 1:1 non-blocking |
| IB switches | 12 × Q3400-RA (8 leaf + 4 spine) |
| Farm peak power | ~530 kW (~53% of 1 MW budget) |
| Power headroom | ~470 kW remaining in 1 MW budget (47%) |
| Burst safety margin | 4.6 kW below 20 kW/rack hard limit |

Data Center Infrastructure & Power

Facility Overview

| Parameter | Value |
| --- | --- |
| Total DC power capacity | 20 MW |
| Kedios allocation | 1 MW (1,000 kW) |
| Per-cabinet hard limit | 20 kW (updated wall-output hard ceiling) |
| Cooling units | CRAC #1 and CRAC #2 + A/C condensers |
| Power distribution | PDU-A and PDU-B (dual-feed per row) |
| UPS infrastructure | UPS Room #1, #2, #3 + UPS 3 & UPS 4 |
| Power control | PCU 1, PCU 2 |

Per-Server Power Audit (Component-Level)

| Component | Spec | Qty | Each | Subtotal |
| --- | --- | --- | --- | --- |
| NVIDIA B300 GPU | HGX tray, TDP 1,100 W | 8 | 1,100 W | 8,800 W |
| Intel Xeon 6776P CPU | 56-core, 350 W TDP | 2 | 350 W | 700 W |
| DDR5-6400 128 GB RDIMM | Samsung M321RAJA0MB2-CCP | 32 | ~7 W | 224 W |
| Samsung PM9D3a NVMe U.2 | Gen5 1.92/3.84 TB | 10 | ~10 W | 100 W |
| ConnectX-8 NIC (on-board) | 800 Gb/s NDR per port | 8 | ~20 W | 160 W |
| BlueField-3 3220 DPU | 400 Gb/s NDR400 | 2 | ~40 W | 80 W |
| Intel X710-AT2 mgmt NIC | Dual 10 GbE | 1 | ~12 W | 12 W |
| System fans (80×80 mm) | High-RPM | 15 | ~18 W | 270 W |
| CPU fans (60×56 mm) | — | 6 | ~7 W | 42 W |
| PCIe switch PEX89144 | — | 1 | ~12 W | 12 W |
| VRMs / motherboard / misc | — | — | — | 175 W |
| Component sum | | | | 10,575 W |
| PSU loss (80 PLUS Titanium ~95.5%) | | | | +494 W |
| Server at wall — sustained | | | | ~14,500 W ≈ 14.5 kW |
| In-rack PDU cabling loss (~1%) | | | | Included in wall draw |
| Total server rack draw — sustained peak | | | | ~14.5 kW |
| GPU instantaneous burst overshoot (~+6%) | | | | ~15.4 kW absolute max |

Rack Power vs 20 kW Hard Limit

| Item | Value |
| --- | --- |
| 20 kW hard limit per rack | 20.0 kW |
| Server sustained peak (at wall, incl. PDU) | ~14.5 kW |
| Absolute instantaneous burst | ~15.4 kW |
| Safety margin (sustained) | ~5.5 kW ✅ |
| Safety margin (burst) | ~4.6 kW ✅ |

✅ Within limit. The server fits the 20 kW hard ceiling in all operational modes; even at absolute burst peak (~15.4 kW) the margin is 4.6 kW, so there is no risk of breaker trips.
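The margin arithmetic above can be sanity-checked in a few lines. Below is a minimal Python sketch using only the report's own figures (~14.5 kW sustained wall draw, ~6% burst factor, 20 kW ceiling); the variable names are illustrative.

```python
# Minimal sketch of the rack-margin arithmetic, using the report's figures.
SUSTAINED_WALL_KW = 14.5   # per-server draw at wall, incl. PDU cabling loss
BURST_FACTOR      = 1.06   # GPU instantaneous overshoot (~ +6%)
RACK_LIMIT_KW     = 20.0   # per-cabinet hard ceiling
N_COMPUTE_RACKS   = 32     # C01..C32, one server each

burst_kw = SUSTAINED_WALL_KW * BURST_FACTOR
print(f"burst peak       : {burst_kw:.1f} kW")                          # ~15.4
print(f"sustained margin : {RACK_LIMIT_KW - SUSTAINED_WALL_KW:.1f} kW") # ~5.5
print(f"burst margin     : {RACK_LIMIT_KW - burst_kw:.1f} kW")          # ~4.6

# Farm-level roll-up for the compute zone (switch racks and cooling/UPS
# overhead are added separately in the zone summary below):
print(f"zone sustained   : {N_COMPUTE_RACKS * SUSTAINED_WALL_KW:.0f} kW")  # 464
print(f"zone burst       : {N_COMPUTE_RACKS * burst_kw:.0f} kW")           # ~492
```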

Farm Zone Power Summary

| Zone | Rack count | Sustained peak | Burst peak | Hard limit |
| --- | --- | --- | --- | --- |
| 32 compute racks (C01–C32) | 32 | ~464 kW | ~492 kW | 640 kW |
| 2 switch/fabric racks (N1–N2) | 2 | ~17.6 kW | ~17.6 kW | 40 kW |
| Cooling / UPS overhead | — | ~20 kW | ~20 kW | — |
| Total farm peak | 34 | ~501 kW | ~530 kW | 1,000 kW |
| Remaining headroom | — | ~498 kW | ~470 kW (47%) | — |

Physical Layout & Zone Structure

| Zone | Capacity | Allocation | Spare |
| --- | --- | --- | --- |
| 32-rack compute zone | 32 racks | 32× ASUS XA NB3I-E12 B300 servers (C01–C32) | 0 |
| 6-rack network zone | 6 racks | N1 (IB fabric) + N2 (Eth/UFM/OOB) | 4 (N3–N6 reserved) |

Compute Zone — 32-Rack Grid (C01–C32)

  • Alternating cold aisle / hot aisle layout — conditioned air from CRAC #1 and CRAC #2
  • Dual-feed from PDU-A and PDU-B across all rows — N+1 power path redundancy
  • Each rack: server at U1–U9; patch panel at U11; cable management U12–U13; upper 29U empty
  • Each rack exits 12 cables: 8× IB AOC + 2× Ethernet AOC + 2× Cat6A → network tower

Network Tower — N1 & N2 Rack Layout

| Rack | U position | Hardware | U used | U spare |
| --- | --- | --- | --- | --- |
| N1 | U1–U48 | 12× Q3400-RA IB switches (4U each = 48U) | 48U | — |
| N2 | U1–U2 | Spectrum-4 #1 (2U) | 6U | 36U |
| | U3–U4 | Spectrum-4 #2 (2U) | | |
| | U5 | UFM Appliance (1U) | | |
| | U6 | OOB 10 GbE Management Switch (1U) | | |
| N3–N6 | — | Empty — reserved for future scale-out | 0U | 4 × 42U |
Design principle: All leaf-to-spine cables (inter-switch, intra-tower) are short passive copper DAC ≤3 m — all switches co-located in N1. Long AOC runs only cross the compute-to-network inter-zone gap (5–10 m).

Server Specification — ASUS XA NB3I-E12

Form factor: 9U, air-cooled (front-to-back), direct airflow  |  Quantity: 32 units (one per rack, C01–C32)

Per-Server Hardware BOM

| Component | Model / Spec | Qty |
| --- | --- | --- |
| GPU | NVIDIA Blackwell Ultra B300 (HGX tray), TDP 1,100 W | 8 |
| GPU memory | 288 GB HBM3e per GPU (12-high stacks) = 2.304 TB / server | — |
| CPU | Intel Xeon Platinum 6776P, 56 cores, 350 W TDP | 2 |
| System RAM | Samsung M321RAJA0MB2-CCP 128 GB DDR5-6400 RDIMM = 4 TB | 32 |
| Boot SSD | Samsung PM9D3a U.2 Gen5 NVMe 1.92 TB | 2 |
| Data SSD | Samsung PM9D3a U.2 Gen5 NVMe 3.84 TB | 8 |
| IB NIC (GPU fabric) | ConnectX-8 (CX8), soldered on HGX tray, 800 Gb/s NDR IB | 8 |
| DPU | NVIDIA BlueField-3 3220, 400 Gb/s NDR400 Ethernet | 2 |
| Management NIC | Intel X710-AT2, dual-port 10 GbE RJ45 | 1 |
| TPM | TPM security module | 1 |
| NVLink interconnect | NVLink Gen 5 (HGX tray) — 1.8 TB/s per GPU, 14.4 TB/s total | — |
| PSU | 80 PLUS Titanium, 5+5 array (N+5 redundancy + dual-feed) | 10 |
| UFM Agent | Software only — installed on server OS | 1 |

Per-Server Performance

| Metric | Value |
| --- | --- |
| GPU compute (FP8 dense, 8-GPU) | ~36 PFLOPS |
| GPU compute (NVFP4 sparse, 8-GPU) | ~240 PFLOPS |
| Intra-server NVLink bandwidth | 14.4 TB/s |
| GPU-to-GPU latency (intra-server) | < 100 ns |
| System memory bandwidth | ~600 GB/s |
| Total NVMe storage | 34.56 TB |
| Operating power | ~14.5 kW |
| Peak sustained wall draw | ~14.5 kW |
| Absolute burst peak | ~15.4 kW |
| Per-rack hard limit | 20 kW |

32-Server Farm Aggregates

| Metric | Value |
| --- | --- |
| Total GPUs | 256 × NVIDIA B300 |
| Total GPU memory | 73.73 TB HBM3e |
| Total system RAM | 128 TB DDR5 |
| Total NVMe storage | ~1,106 TB |
| Peak compute (FP8 dense) | ~1,152 PFLOPS |
| Peak compute (NVFP4 sparse) | ~7,680 PFLOPS |
| NVLink aggregate (farm) | 32 × 14.4 TB/s = 460.8 TB/s |
| Peak power (sustained wall) | ~464 kW |
| Absolute burst peak | ~492 kW |
| Thermal output | ~355 kW ≈ ~1,211,000 BTU/hr |
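Every farm figure in this table is the per-server value scaled by 32. A small Python sketch makes the roll-up explicit; the dictionary keys are illustrative labels.

```python
# Farm totals as straight 32x multiples of the per-server figures.
N_SERVERS = 32

per_server = {
    "gpus":         8,
    "hbm3e_tb":     8 * 0.288,           # 8 GPUs x 288 GB HBM3e
    "ram_tb":       4,
    "nvme_tb":      2 * 1.92 + 8 * 3.84, # boot + data SSDs = 34.56 TB
    "fp8_pflops":   36,
    "nvfp4_pflops": 240,
    "nvlink_tbps":  14.4,
}

farm = {metric: value * N_SERVERS for metric, value in per_server.items()}
# -> 256 GPUs, ~73.73 TB HBM3e, 128 TB RAM, ~1,106 TB NVMe,
#    1,152 PFLOPS FP8 dense, 7,680 PFLOPS NVFP4 sparse, 460.8 TB/s NVLink
```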

Network Architecture — InfiniBand Compute Fabric

Network Ports Per Server

| Port type | Component | Count | Speed | Target |
| --- | --- | --- | --- | --- |
| CX8 IB (GPU fabric) | HGX tray on-board (1 per GPU) | 8 | 800 Gb/s NDR | Q3400-RA leaf switches |
| BF3 Ethernet (storage) | BlueField-3 3220 add-in card | 2 | 400 Gb/s NDR400 | Spectrum-4 (active-active) |
| X710 management | Intel X710-AT2 dual-port | 2 | 10 GbE | OOB management switch |
| Total per server | | 12 | | |

32-Server Farm Port Totals

| Fabric | Per server | × 32 servers | Speed | Target switch |
| --- | --- | --- | --- | --- |
| CX8 IB (compute) | 8 | 256 ports | 800 Gb/s NDR | Q3400-RA leaf (L0–L7) |
| BF3 Ethernet (storage) | 2 | 64 ports | 400 Gb/s NDR400 | Spectrum-4 #1 + #2 (active-active) |
| OOB management | 2 | 64 server + 16 infra = 80 total | 10 GbE | OOB switch (96-port) |

InfiniBand GPU Rail Assignment

Each server's 8 CX8 ports map to 8 independent GPU rails. All GPU-i ports across all 32 servers land on the same leaf switch Li, isolating GPU-to-GPU communication by rail.

Rail 0 → Leaf L0 | Rail 1 → Leaf L1 | Rail 2 → Leaf L2 | Rail 3 → Leaf L3
Rail 4 → Leaf L4 | Rail 5 → Leaf L5 | Rail 6 → Leaf L6 | Rail 7 → Leaf L7
Rail isolation principle: AllReduce within a rail (the dominant training pattern) traverses zero spine hops — leaf-only, sub-100 ns latency. Only cross-rail AllToAll requires the 2-hop leaf→spine→leaf path.
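The rail mapping is mechanical enough to express directly. Here is a minimal Python sketch; port labels such as C01-CX8[3] are illustrative, not a vendor convention.

```python
# Rail-optimized mapping: the CX8 port of GPU i on every server lands on leaf Li.
N_SERVERS, N_RAILS = 32, 8

def leaf_for(server: int, gpu: int) -> str:
    """Leaf switch for a given server's GPU-indexed CX8 port."""
    assert 1 <= server <= N_SERVERS and 0 <= gpu < N_RAILS
    return f"L{gpu}"                     # independent of the server index

def path_type(gpu_a: int, gpu_b: int) -> str:
    """Same-rail traffic stays on one leaf; only cross-rail traffic crosses a spine."""
    return "leaf-only, 0 spine hops" if gpu_a == gpu_b else "leaf -> spine -> leaf"

# Each leaf therefore sees exactly one downlink from every server:
ports = {f"L{r}": [f"C{s:02d}-CX8[{r}]" for s in range(1, N_SERVERS + 1)]
         for r in range(N_RAILS)}
assert all(len(downlinks) == 32 for downlinks in ports.values())
```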

Switch Topology — 2-Tier Rail-Optimized Fat-Tree

         ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
SPINES   │  Spine S0  │ │  Spine S1  │ │  Spine S2  │ │  Spine S3  │
(×4)     │ Q3400-RA   │ │ Q3400-RA   │ │ Q3400-RA   │ │ Q3400-RA   │
         │ 144p NDR   │ │ 144p NDR   │ │ 144p NDR   │ │ 144p NDR   │
         └──────┬─────┘ └──┬─────────┘ └────────┬───┘ └─────┬──────┘
                │  (32 uplinks per leaf × 8 leaves = 256 spine ports)
   ┌────────────┼──────────┼────────────────────┼───────────┼─────┐
   │            │          │                    │           │     │
 ┌─┴───────┐ ┌──┴──────┐  ...  ┌───────────┐  ... ┌────────┴──┐  │
 │ Leaf L0 │ │ Leaf L1 │       │  Leaf L6  │      │  Leaf L7  │  │
 │ Rail 0  │ │ Rail 1  │       │  Rail 6   │      │  Rail 7   │  │
 │Q3400-RA │ │Q3400-RA │       │  Q3400-RA │      │ Q3400-RA  │  │
 └────┬────┘ └────┬────┘       └─────┬─────┘      └─────┬─────┘  │
      │           │                  │                   │
 (32 downlinks per leaf — one CX8[n] from every server)
      │           │                  │                   │
 C01..C32    C01..C32           C01..C32             C01..C32
 CX8[0]      CX8[1]             CX8[6]               CX8[7]

NVIDIA Quantum-X800 Q3400-RA Specifications

| Parameter | Value |
| --- | --- |
| Form factor | 4U chassis |
| Port count | 144 × 800 Gb/s NDR IB (72 OSFP, bifurcated) |
| Switching capacity | 115.2 Tb/s non-blocking |
| UFM management port | Dedicated OSFP 400 Gb/s (isolated from data ports) |
| SHARP version | Gen 4 — in-network AllReduce acceleration |
| Routing | Adaptive routing + telemetry congestion control |
| SerDes | 200 Gb/s |
| Protocol | InfiniBand NDR, RDMA |

Leaf Switch Design (×8 — L0 through L7)

| Parameter | Value |
| --- | --- |
| Count | 8 leaf switches (one per GPU rail) |
| Server downlinks | 32 (one CX8 port from each server on that rail) |
| Spine uplinks | 32 (8 parallel links to each of 4 spines) |
| Port utilization | 64 of 144 (44%) — room for ~32 more servers (up to ~64 total) |
| Downlink cable type | AOC 5–10 m (server → leaf, cross-zone) |
| Uplink cable type | DAC ≤3 m (leaf → spine, intra-N1) |

Spine Switch Design (×4 — S0 through S3)

| Parameter | Value |
| --- | --- |
| Count | 4 spine switches |
| Leaf-facing ports | 64 (8 parallel uplinks × 8 leaf switches) |
| Server connections | None — pure leaf-to-leaf forwarding |
| Port utilization | 64 of 144 (44%) |
| Cable type | DAC ≤3 m (all intra-N1) |

Fat-Tree Bandwidth & Latency

| Metric | Value |
| --- | --- |
| Server-side aggregate BW | 256 × 800 Gb/s = 204.8 Tb/s |
| Spine bisection BW | 204.8 Tb/s |
| Bisection ratio | 1:1 — fully non-blocking ✅ |
| Max hops server-to-server | 2 (leaf → spine → leaf) |
| IB latency server-to-server | < 200 ns |
| Intra-rail latency (same leaf) | < 100 ns |
| SHARP Gen 4 AllReduce gain | Up to ~50% BW reduction for typical training batch sizes |
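The 1:1 claim reduces to a port-count identity: each leaf must provision exactly as many spine uplinks as server downlinks. A small Python sketch of that check, with all constants taken from the tables above:

```python
# Port accounting for the 2-tier fat-tree; asserts the 1:1 non-blocking claim.
PORT_GBPS, N_LEAVES, N_SPINES, N_SERVERS = 800, 8, 4, 32
LINKS_PER_LEAF_SPINE_PAIR = 8

downlinks_per_leaf = N_SERVERS                              # one CX8 per server
uplinks_per_leaf   = N_SPINES * LINKS_PER_LEAF_SPINE_PAIR   # 4 x 8 = 32

# 1:1 non-blocking holds iff uplink capacity equals downlink capacity per leaf.
assert uplinks_per_leaf == downlinks_per_leaf

server_side = N_LEAVES * downlinks_per_leaf * PORT_GBPS / 1000   # 204.8 Tb/s
bisection   = N_LEAVES * uplinks_per_leaf   * PORT_GBPS / 1000   # 204.8 Tb/s
spine_ports = N_LEAVES * uplinks_per_leaf                        # 256 total
ports_used_per_switch = downlinks_per_leaf + uplinks_per_leaf    # 64 of 144 (44%)
```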

Network Tower — Rack N1 (InfiniBand Fabric)

Rack N1 houses all 12 Q3400-RA switches. Co-locating them lets every inter-switch link use short passive DAC — no active components in the spine-to-leaf forwarding path.

Internal Wiring (Leaf → Spine)

| Link | Cable type | Length | Count |
| --- | --- | --- | --- |
| Each leaf → each spine (8 parallel links per pair) | Passive copper DAC | ≤ 3 m | 256 DAC cables |
| Unique leaf-spine pairs | — | — | 32 pairs (8 leaves × 4 spines) |

SHARP Gen 4 In-Network Computation

  • All 12 Q3400-RA support SHARP Gen 4
  • Partial gradient sums computed inside switches during AllReduce — reduces fabric BW by up to ~50%
  • Particularly impactful for LLM pretraining (GPT/LLaMA scale) with large gradient tensors
  • Orchestrated automatically by UFM Appliance — no application code changes required
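Because SHARP is orchestrated by UFM and the switches, enabling it on the job side is typically a matter of environment configuration rather than code changes. The Python sketch below shows representative NCCL/SHARP knobs; the exact variable set depends on the deployed HPC-X and nccl-rdma-sharp plugin versions, and train.py is a placeholder, so treat this as an assumption to validate against the installed stack rather than a verified configuration.

```python
# Hedged sketch: representative environment knobs for SHARP-offloaded NCCL
# collectives. Validate against the installed HPC-X / plugin documentation.
import os
import subprocess

env = dict(os.environ)
env.update({
    "NCCL_COLLNET_ENABLE": "1",    # let NCCL use CollNet (SHARP) collectives
    "SHARP_COLL_ENABLE_SAT": "1",  # streaming aggregation for large tensors
    "NCCL_IB_HCA": "mlx5",         # keep NCCL traffic on the CX8 IB rails
})

# Placeholder launcher -- train.py stands in for the actual training entrypoint.
subprocess.run(["python", "train.py"], env=env, check=True)
```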

Network Tower — Rack N2 (Ethernet, UFM & OOB)

Physical Layout

| U position | Hardware | Role |
| --- | --- | --- |
| U1–U2 | NVIDIA Spectrum-4 #1 | AI Ethernet — 128× 400 GbE, 51.2 Tb/s |
| U3–U4 | NVIDIA Spectrum-4 #2 | AI Ethernet — active-active pair |
| U5 | NVIDIA UFM Appliance | IB fabric manager (hardware, 1U) |
| U6 | 10 GbE OOB Switch | Out-of-band management (96+ ports) |
| U7–U42 | Empty (36U) | Patch panels, cable trays, future expansion |

Spectrum-4 AI Ethernet Fabric

| Parameter | Value |
| --- | --- |
| ASIC process | TSMC 4N |
| Switching capacity | 51.2 Tb/s per switch |
| Port configuration | 128 × 400 GbE (or 64 × 800 GbE) |
| Protocol | Ethernet + RoCE (Spectrum-X AI Ethernet) |
| BF3 ports per switch | 32 (64 total ÷ 2 switches) |
| Redundancy model | Active-active — each BF3 connects one port to each Spectrum-4 |
| Inter-switch spine link | 2× 400 GbE direct link between Sp4 #1 and #2 |

UFM Appliance

| Parameter | Value |
| --- | --- |
| Form factor | 1U hardware appliance |
| Management connection | Via OOB switch (10 GbE) — not on IB data fabric |
| Manages | 12× Q3400-RA switches + 256× CX8 endpoints |
| UFM hardware capacity | Up to 648 ports — current usage well within range |
| Scale headroom | Supports doubling cluster to 64 servers without new UFM hardware |
| UFM Agent (SW) | 32 instances — per-server OS, no extra hardware |

UFM Appliance functions: topology discovery, routing table computation, adaptive routing, SHARP orchestration, fault detection/recovery, per-port telemetry aggregation.


Cabling Plan

AOC requirement: Passive copper DAC only works reliably ≤3 m at NDR speeds. The inter-zone gap between compute racks and network tower is 5–15 m, requiring Active Optical Cable (AOC) on all server-to-switch links.
| Link | Cable type | Length | Count | Notes |
| --- | --- | --- | --- | --- |
| Server CX8 → Q3400-RA leaf (NDR800) | AOC | 5–10 m | 256 | 8 per server × 32 servers |
| Server BF3 → Spectrum-4 (400 GbE NDR400) | AOC | 5–10 m | 64 | 2 per server × 32 servers |
| Server X710 → OOB switch (OS mgmt + BMC) | Cat6A copper | 5–10 m | 64 | 2 per server × 32 servers |
| Q3400-RA leaf → Q3400-RA spine (NDR800) | Passive DAC | ≤ 3 m | 256 | All intra-N1 |
| Spectrum-4 #1 ↔ #2 (inter-switch spine) | DAC or AOC | ≤ 3 m | 2 | Intra-N2 |
| UFM Appliance → OOB switch | Cat6A | ≤ 1 m | 1 | Intra-N2 |
| Q3400-RA UFM mgmt port → OOB switch | DAC / SFP+ | ≤ 3 m | 12 | Intra-N1/N2 |

Cable Totals

| Type | Count |
| --- | --- |
| AOC (OSFP, NDR800, 800 Gb/s) | 256 |
| AOC (QSFP-DD, NDR400, 400 Gb/s) | 64 |
| Cat6A copper (10 GbE) | 64 |
| Passive DAC (intra-network-tower) | ~270 |
| Total cable assemblies | ~654 |
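The totals above follow directly from the topology counts. A minimal Python sketch reproducing them (the lone UFM → OOB Cat6A run sits outside the per-type rows, which is why the grand total is quoted as approximate):

```python
# Cable BOM derived from the topology counts in the cabling plan.
N_SERVERS, N_LEAVES, N_SPINES, LINKS_PER_PAIR = 32, 8, 4, 8

aoc_ndr800 = N_SERVERS * 8                          # server CX8 -> leaf       = 256
aoc_ndr400 = N_SERVERS * 2                          # server BF3 -> Spectrum-4 = 64
cat6a      = N_SERVERS * 2                          # server X710 -> OOB       = 64
dac        = (N_LEAVES * N_SPINES * LINKS_PER_PAIR  # leaf -> spine            = 256
              + 2                                   # Spectrum-4 inter-link
              + 12)                                 # Q3400-RA mgmt ports
assert dac == 270

total = aoc_ndr800 + aoc_ndr400 + cat6a + dac       # 654 (+1 UFM Cat6A => "~654")
```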

Complete Hardware Bill of Materials

Networking Hardware

| Component | Count | Form factor | Role |
| --- | --- | --- | --- |
| NVIDIA Q3400-RA — Leaf | 8 | 4U | 32 CX8 downlinks + 32 spine uplinks per switch; 1:1 non-blocking |
| NVIDIA Q3400-RA — Spine | 4 | 4U | 64 leaf-facing ports; pure any-to-any forwarding plane |
| Total Q3400-RA | 12 | — | 115.2 Tb/s each, 144 NDR800 ports each |
| NVIDIA Spectrum-4 (Ethernet) | 2 | 2U | 128× 400 GbE active-active, 51.2 Tb/s each |
| 10 GbE OOB Management Switch | 1 | 1U | 96-port; 32 OS mgmt + 32 BMC/IPMI + 12 switch mgmt + 2 Spectrum-4 + 1 UFM = 80 connections |
| NVIDIA UFM Appliance | 1 | 1U | Centralized IB fabric manager |
| UFM Agent (software) | 32 | SW only | Per-server OS agent — no extra hardware |
| Grand total (hardware) | 16 units | — | 12 Q3400-RA + 2 Spectrum-4 + 1 OOB + 1 UFM Appliance |

Rack Infrastructure

| Item | Count |
| --- | --- |
| Compute racks (42U) — C01–C32 | 32 |
| Network/switch racks (42U) — N1–N2 | 2 |
| Spare network positions — N3–N6 | 4 (reserved) |
| Total rack positions occupied | 34 |

Aggregate Bandwidth Summary

| Fabric layer | Calculation | Total bandwidth |
| --- | --- | --- |
| IB compute — server side | 256 CX8 × 800 Gb/s | 204.8 Tb/s |
| IB compute — spine bisection | 8 leaves × 32 uplinks × 800 Gb/s | 204.8 Tb/s (1:1 non-blocking) |
| NVLink (per server) | NVLink Gen 5, 1.8 TB/s × 8 GPUs | 14.4 TB/s |
| NVLink (farm aggregate) | 32 × 14.4 TB/s | 460.8 TB/s |
| BF3 Ethernet fabric (farm) | 64 × 400 Gb/s | 25.6 Tb/s |
| Spectrum-4 switching (combined) | 2 × 51.2 Tb/s | 102.4 Tb/s |
| OOB management | 80 × 10 GbE | 800 Gb/s |

Farm Compute Performance

| Metric | Per server | Farm total (×32) |
| --- | --- | --- |
| FP8 dense (GPU peak) | ~36 PFLOPS | ~1,152 PFLOPS |
| NVFP4 sparse (GPU peak) | ~240 PFLOPS | ~7,680 PFLOPS |
| GPU memory (HBM3e) | 2.304 TB | 73.73 TB |
| HBM3e bandwidth | ~8 TB/s per GPU (~64 TB/s per server) | ~2,048 TB/s |
| System RAM (DDR5) | 4 TB | 128 TB |
| NVMe storage | 34.56 TB | ~1,106 TB |
| IB network BW | 6.4 Tb/s (8 × 800 Gb/s) | 204.8 Tb/s bisection |
| Ethernet BW (BF3) | 800 Gb/s | 25.6 Tb/s |

Topology Diagrams

Three diagrams were produced, available in .drawio (editable) and .png (rendered) in network-topology-diagrams/.

Diagram 1 — IDC Farm Planning & Connectivity
farm-idc-topology.drawio / .png
End-to-end connectivity block flow — shows all major components and their interconnections:
  • External uplink path: Internet → Router → Firewall → Border Leaf
  • Compute fabric: Spectrum-4 active-active → Q3400-RA IB fabric (leaves + spines) → 32× B300 servers
  • Management path: OOB switch → all servers (BMC + OS) + UFM Appliance → IB fabric managers
  • Zone separation clearly indicated: compute zone (right) vs. network tower (center)

Diagram 2 — Farm InfiniBand Topology (Rail View)
farm-ib-topology.drawio / .png
NVIDIA-style full IB topology — 3-row layout matching NVIDIA reference diagrams:
  • Row 1 (spines): 4× Q3400-RA S0–S3 (orange)
  • Row 2 (leaves): 8× Q3400-RA L0–L7, each color-coded by rail (matching the 8 GPU rail colors)
  • Row 3 (servers): 32× compute nodes C01–C32 grouped in 4 sets of 8
  • 256 rail-colored CX8 → leaf connections + 32 orange leaf-to-spine connections (one per leaf-spine pair)
  • Ethernet section: Spectrum-4 #1/#2, UFM, OOB — 64 BF3 → Spectrum-4 connections

Diagram 3 — Single Rack Leaf/Spine Connection Detail
single-rack-topology.drawio / .png
Per-port detail for a single representative server rack (C01); C02–C32 are identical:
  • 8× CX8 NDR800 AOC connections → 8 leaf switches (color-coded by rail, 2.5 px)
  • 8× leaf → 4× spine links (orange, 8 × 4 = 32 total leaf-spine connections)
  • BF3-0 → Spectrum-4 #1 | BF3-1 → Spectrum-4 #2 (purple AOC)
  • X710 Port 0 + Port 1 → OOB switch (yellow dashed = management)
  • UFM Appliance → server (blue dashed = UFM Agent SW) + UFM → Leaf L0 (blue dashed = IB fabric mgmt)
  • Connection color legend at bottom

Key Design Decisions & Rationale

| Decision | Choice | Rationale |
| --- | --- | --- |
| Servers per rack | 1 server / rack (9U, ~14.5 kW) | 20 kW/rack hard limit; 5.5 kW sustained margin, 4.6 kW burst margin |
| IB switch model | Q3400-RA (144p NDR800) | NVIDIA reference for HGX B300 training; 44% port utilization = scale-out headroom |
| IB topology | 2-tier rail-optimized fat-tree | GPU rail isolation: AllReduce = 0 spine hops; 1:1 non-blocking BW guarantee |
| Leaf count | 8 leaf switches | One per GPU rail (GPU-0…GPU-7 across 32 servers) |
| Spine count | 4 spine switches | Sets bisection = server BW (1:1 non-blocking guarantee) |
| Total Q3400-RA | 12 units | 8 leaf + 4 spine; NVIDIA validated configuration for 32-node B300 clusters |
| Max hops | 2 hops | Critical for NCCL; keeps server-to-server latency < 200 ns |
| Dual BF3 / server | Active-active → 2× Spectrum-4 | Full 800 Gb/s Ethernet per server; single-switch failure resilience |
| Spectrum-4 count | 2 (active-active) | Each BF3 connects to both; per-server BW = 800 Gb/s with redundancy |
| UFM architecture | 1 appliance + 32 SW agents | UFM is centralized by design; 648-port capacity covers current + scale-out |
| Network tower isolation | Dedicated racks N1–N2 | Predictable compute rack power envelopes; independent power feeds and cooling for switches |
| Cross-zone cables | AOC 5–10 m | DAC only works ≤3 m at NDR; inter-zone runs are 5–15 m |
| Intra-tower cables | DAC ≤3 m | All switches co-located; cheaper, lower latency than AOC for short runs |
| SHARP Gen 4 | Enabled (via Q3400-RA) | Up to 50% AllReduce BW reduction at 32-node scale for LLM gradient sync |
| Air cooling | Front-to-back (per ASUS XA design) | CRAC #1 & #2 support hot/cold aisle containment; no liquid cooling required |
| Spare network positions | N3–N6 reserved | Future scale-out: additional spines/Spectrum-4 when cluster exceeds 32 servers |

Risk & Mitigation Notes

| Severity | Risk | Mitigation |
| --- | --- | --- |
| HIGH | Leaf switch failure | UFM reroutes within seconds; training checkpointing limits job loss; cluster continues at 7/8 rail capacity |
| HIGH | Thermal runaway / CRAC failure | CRAC #1 + #2 redundancy; BMC thermal throttling at GPU junction limit; B300 has independent per-chip thermal shutdown |
| MED | GPU burst overshoot trips breaker | 4.6 kW burst margin at 20 kW limit; PDU per-outlet current monitoring; BMC IPMI power capping |
| MED | Single AOC cable failure | CX8 port fails over via UFM adaptive routing; per-GPU rail isolation — 1 cable = 1 rail affected, not all GPUs |
| MED | UFM Appliance failure | IB fabric continues with stale routing tables; hot-standby UFM VM on management server recommended for production |
| LOW | Spine switch failure | 4 spines provide redundant paths; ECMP distributes across remaining 3 spines automatically |
| LOW | Spectrum-4 failure | Active-active pair; BF3 bonding fails over to surviving switch; storage BW halved but no outage |
| LOW | PDU-A failure | Dual-feed PDU-A/PDU-B; 5+5 PSU pairs each bank to one feed; single PDU failure does not crash a server |
| PLAN | Scale beyond 32 servers | Q3400-RA at 44% utilization supports ~32 more servers per leaf (up to ~64 total); 4 spare N3–N6 rack positions available |
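For the spine-failure row, the degraded capacity is easy to quantify. A back-of-envelope Python sketch, assuming ideal ECMP rebalancing across the surviving spines:

```python
# Worst-case bisection after spine failures, assuming ideal ECMP rebalancing.
N_SPINES, BISECTION_TBPS = 4, 204.8

for failed in range(3):
    surviving = N_SPINES - failed
    capacity = BISECTION_TBPS * surviving / N_SPINES
    print(f"{failed} spine(s) down -> {capacity:5.1f} Tb/s "
          f"({surviving}/{N_SPINES} of nominal)")
# 0 down -> 204.8 Tb/s; 1 down -> 153.6 Tb/s; 2 down -> 102.4 Tb/s
```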