Kedios Infrastructure Report

Kedios B300 — Full Network & Topology Report


Organization: Kedios

Date: March 30, 2026

Scope: 32-Node NVIDIA Blackwell Ultra B300 GPU Farm with 72-Node-Standard-Aligned Network Baseline

Status: Source Architecture Refresh Locked · Diagram Redraw Pending


1. Executive Summary

The Kedios B300 package remains a 32-node NVIDIA Blackwell Ultra B300 GPU training cluster inside a 20 MW facility, but the network architecture, BOM framing, and management-plane narrative are now refreshed to align with the 72-node B300 standard baseline. The compute population stays fixed at 32 servers / 256 GPUs. The standards-aligned layer now governs how the report presents InfiniBand control, Ethernet side fabric, border, storage-network, and management components.

This means the report intentionally carries two truths at once:

  1. the current deployed compute scope is still a 32-node farm
  2. the accepted network baseline is now a 72-node-standard-aligned design pattern

The compute fabric remains a 1:1 non-blocking Q3400-RA InfiniBand fat-tree delivering 204.8 Tb/s of server-side bisection bandwidth for the currently deployed 32-node farm. At the same time, the Ethernet, storage-network, management, and border layers are no longer described as a minimal custom add-on. They are now expressed as the standard-aligned design basis for the current package.

The platform remains optimized for collective-heavy distributed training and post-training workflows. Selective inference and checkpoint-adjacent workloads remain supportable, but they are not the sizing basis for the fabric or rack-power architecture.

Farm at a Glance

Parameter | Value
Server model | ASUS XA NB3I-E12 (HGX Blackwell Ultra B300, 9U air-cooled)
Current deployed servers | 32
Current deployed GPUs | 256 × NVIDIA B300
Standard network reference capacity | 72 nodes / 576 GPUs
Total GPU memory (current deployment) | 73.73 TB HBM3e (256 × 288 GB)
Peak compute (FP8 dense, current deployment) | ~1,152 PFLOPS
Peak compute (NVFP4 sparse, current deployment) | ~7,680 PFLOPS
IB fabric bisection BW | 204.8 Tb/s (1:1 non-blocking)
IB fabric core | 8 × Q3400-RA leaf + 4 × Q3400-RA spine
Retained Ethernet side fabric | 2 × NVIDIA Spectrum-4 (active-active)
Standard-aligned added layers | SN5610 ×6, SN4700 ×4, SN2201 ×17, UFM ×2
Contracted rack package | 34 occupied racks
Per-rack hard limit | 20 kW
Current 34-rack hard ceiling | 680 kW
Facility allocation | 1.5 MW

The earlier ~530 kW all-in peak remains the last fully validated package-level figure from the pre-refresh network stack. Because the refreshed baseline adds SN5610, SN4700, SN2201, and a second UFM node, this revision separates validated compute-side power anchors from the refreshed network-side BOM instead of overstating a new all-in watt total before the final vendor power sheets are attached.


2. Data Center Infrastructure & Power

Facility Overview

Parameter | Value
Total DC power capacity | 20 MW
Kedios allocation | 1.5 MW (1,500 kW)
Per-cabinet hard limit | 20 kW (wall-output hard ceiling)
Current 34-rack hard ceiling | 680 kW
Building levels | Multi-level (Level 1, 2, 3)
Cooling units | CRAC / CRAH fleet with N+1 target coverage
Power distribution | PDU-A and PDU-B (dual-feed per row)
UPS | UPS Room #1, #2, #3 + UPS 3 & UPS 4
Power control | PCU 1, PCU 2

Per-Server Power Audit (Component-Level)

Component | Model / Spec | Qty | Power each | Subtotal
NVIDIA B300 GPU | HGX tray, TDP 1,100 W | 8 | 1,100 W | 8,800 W
Intel Xeon 6776P CPU | 56-core, 660 W TDP | 2 | 350 W | 700 W
DDR5-6400 128 GB RDIMM | Samsung M321RAJA0MB2-CCP | 32 | ~7 W | 224 W
Samsung PM9D3a NVMe U.2 | Gen5, 1.92/3.84 TB | 10 | ~10 W | 100 W
ConnectX-8 NIC (HGX on-board) | 800 Gb/s NDR | 8 | ~20 W | 160 W
BlueField-3 3220 DPU | 400 Gb/s NDR400 | 2 | ~40 W | 80 W
Intel X710-AT2 mgmt NIC | Dual 10 GbE | 1 | ~12 W | 12 W
System fans (80×80 mm) | High-RPM | 15 | ~18 W | 270 W
CPU fans (60×56 mm) | — | 6 | ~7 W | 42 W
PCIe switch | PEX89144 | 1 | ~12 W | 12 W
VRMs / motherboard / misc | — | — | — | 175 W
Component sum | — | — | — | 10,575 W
PSU loss (80 PLUS Titanium, ~95.5% efficiency) | — | — | +4.5% | +494 W
Server draw at wall (sustained) | — | — | — | ~14,500 W ≈ 14.5 kW
In-rack PDU cabling loss (~1%) | — | — | — | +111 W
Total server rack draw — sustained peak | — | — | — | ~14.5 kW
GPU instantaneous burst overshoot (~+6%) | — | — | — | ~15.4 kW absolute max

Server Rack vs 20 kW Hard Limit

Item | Value
20 kW hard limit per rack | 20.0 kW
Server sustained peak (at wall, incl. PDU loss) | ~14.5 kW
Absolute instantaneous burst | ~15.4 kW
Safety margin (sustained) | ~5.5 kW ✅
Safety margin (burst) | ~4.6 kW ✅

Each server fits within the 20 kW hard limit: the sustained training peak of ~14.5 kW leaves ~5.5 kW of margin, and even at the absolute burst level (~15.4 kW) the margin is still ~4.6 kW, so breaker trips are not expected.

Validated Power Anchors

Layer | Basis | Value
32 compute racks — sustained | 32 × ~14.5 kW | ~464 kW
32 compute racks — burst peak | 32 × ~15.4 kW | ~492 kW
Compute-rack hard ceiling | 32 × 20 kW | 640 kW
Current 34-rack package hard ceiling | 34 × 20 kW | 680 kW
Facility allocation envelope | Project allocation | 1.5 MW
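As a quick cross-check, the anchors above follow directly from the per-rack figures. A minimal Python sketch (all inputs come from the tables in this section; the variable names are illustrative, not project tooling):

```python
# Cross-check of the rack- and farm-level power anchors quoted above.
RACK_LIMIT_KW = 20.0        # per-rack hard limit
SERVER_SUSTAINED_KW = 14.5  # sustained draw at wall, incl. PSU/PDU losses
SERVER_BURST_KW = 15.4      # absolute instantaneous burst
COMPUTE_RACKS = 32          # C01-C32, one server per rack
PACKAGE_RACKS = 34          # contracted occupied racks

anchors = {
    "32 compute racks, sustained (kW)": COMPUTE_RACKS * SERVER_SUSTAINED_KW,  # ~464
    "32 compute racks, burst peak (kW)": COMPUTE_RACKS * SERVER_BURST_KW,     # ~492.8
    "compute-rack hard ceiling (kW)":    COMPUTE_RACKS * RACK_LIMIT_KW,       # 640
    "34-rack package hard ceiling (kW)": PACKAGE_RACKS * RACK_LIMIT_KW,       # 680
}

for label, value in anchors.items():
    print(f"{label}: {value:.1f}")
print(f"per-rack margin: {RACK_LIMIT_KW - SERVER_SUSTAINED_KW:.1f} kW sustained, "
      f"{RACK_LIMIT_KW - SERVER_BURST_KW:.1f} kW burst")
```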

Network-Side Power Refresh Status

The refreshed network baseline adds hardware families that were not part of the earlier all-in package roll-up. The current source set supports counts, roles, and topology placement intent for these elements, but it does not yet support a final consolidated watt total for every added model.

Component family | Count now in scope | Power status in this source revision
Q3400-RA | 12 | Existing repo-backed estimate remains available
Spectrum-4 | 2 | Existing repo-backed estimate remains available
UFM | 2 | Count locked; refreshed dual-node roll-up pending final BOM watt sheet
SN5610 | 6 | Count locked; per-device wattage pending final model sheet
SN4700 | 4 | Count locked; per-device wattage pending final model sheet
SN2201 | 17 | Count locked; per-device wattage pending final model sheet

Use 1.5 MW as the facility-allocation answer and 680 kW as the current rack-constrained ceiling under the unchanged 20 kW/rack rule. Do not promote a new all-network aggregate watt number until the refreshed procurement BOM carries the missing per-model power entries.
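One way to keep that rule mechanical is to refuse to compute an aggregate while any per-model wattage is still pending. A minimal sketch of that guard (counts come from the table above; the None placeholders and the function name are illustrative, not project BOM tooling):

```python
from typing import Optional

# Refreshed network-layer BOM: counts are locked, but some per-model watt
# figures still wait on final vendor sheets (None = pending in this sketch).
network_bom: dict[str, tuple[int, Optional[float]]] = {
    "Q3400-RA":   (12, None),  # repo-backed estimate exists; not restated here
    "Spectrum-4": (2,  None),  # repo-backed estimate exists; not restated here
    "UFM":        (2,  None),  # dual-node roll-up pending final BOM watt sheet
    "SN5610":     (6,  None),  # per-device wattage pending final model sheet
    "SN4700":     (4,  None),  # per-device wattage pending final model sheet
    "SN2201":     (17, None),  # per-device wattage pending final model sheet
}

def network_power_rollup(bom: dict[str, tuple[int, Optional[float]]]) -> Optional[float]:
    """Return an all-network wattage only once every model has a locked figure."""
    if any(watts is None for _, watts in bom.values()):
        return None  # refuse to publish a partial aggregate
    return sum(count * watts for count, watts in bom.values())

total = network_power_rollup(network_bom)
print("network aggregate:", f"{total:.0f} W" if total is not None else "pending watt sheets")
```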


3. Physical Layout & Zone Structure

Zone Allocation

Zone | Current rule
32-rack compute zone | Unchanged — 32 × ASUS XA NB3I-E12 B300 servers (C01–C32)
Network / services envelope | The old "N1 occupied + N2 occupied + N3–N6 empty" description is no longer authoritative. The refreshed topology uses the broader network/services placement envelope for the IB, Spectrum-4, SN5610, SN4700, SN2201, and dual-UFM layers.

The project remains a 34-rack package. The updated draw.io sources must finalize the physical placement of the refreshed network layers instead of reusing the previous assumption that only N1 and N2 are populated.

Compute Zone — 32-Rack Grid (C01–C32)

  • Racks numbered C01 through C32 — pure compute, 1 server per rack
  • Alternating cold aisle / hot aisle layout — conditioned air from CRAC #1 and CRAC #2
  • Dual-feed from PDU-A and PDU-B across all rows (N+1 power path redundancy)
  • Each rack: server occupies bottom U1–U9 (9U); patch panel at U11; cable management U12–U13; upper 29U intentionally empty
  • Each rack exits 12 cables: 8× IB AOC + 2× Ethernet AOC + 2× Cat6A copper to network tower

Network Zone — Standard-Aligned Placement Envelope

The compute-side physical rule still holds: long server-to-network runs cross the inter-zone gap and therefore use AOC or copper management cabling rather than passive DAC. What changes in this revision is the network-layer occupancy narrative:

  • the Q3400-RA core remains the live compute-fabric anchor
  • the Spectrum-4 pair remains the live BF3 side fabric
  • the management, border, and storage-network layers are now part of the accepted standard-aligned design basis
  • the prior statement that N3–N6 are simply empty spare racks is withdrawn

Until the diagram refresh is completed, treat N1–N6 as the logical network/services placement envelope rather than as a final U-by-U rack map.


4. Server Specification — ASUS XA NB3I-E12

Model: ASUS XA NB3I-E12

Form factor: 9U, air-cooled (front-to-back), direct airflow

Quantity deployed: 32 units (one per rack, C01–C32)

Per-Server Hardware Bill of Materials

Component | Model / Spec | Qty per server
GPU | NVIDIA Blackwell Ultra B300 (HGX tray), TDP 1,100 W each | 8
GPU memory | 288 GB HBM3e per GPU (12-high HBM3e stacks) = 2.304 TB per server | —
CPU | Intel Xeon Platinum 6776P (56 cores, 660 W TDP) | 2
System RAM | Samsung M321RAJA0MB2-CCP 128 GB DDR5-6400 RDIMM | 32 (= 4 TB)
Boot SSD | Samsung PM9D3a U.2 Gen5 NVMe 1.92 TB | 2
Data SSD | Samsung PM9D3a U.2 Gen5 NVMe 3.84 TB | 8
IB NIC (GPU fabric) | ConnectX-8 (CX8) soldered on HGX tray, 800 Gb/s NDR IB | 8
DPU | NVIDIA BlueField-3 3220, 400 Gb/s NDR400 Ethernet | 2
Management NIC | Intel X710-AT2, dual-port 10 GbE RJ45 | 1
UFM Agent | Software only — installed on OS | 1 (SW only)
Interconnect fabric | NVLink 5 (intra-GPU), 14.4 TB/s total | HGX tray
PSU | 80 PLUS Titanium, 5+5 array (N+5 redundancy) | 10

Per-Server Performance

Metric | Value
GPU compute (FP8 dense, 8-GPU) | ~36 PFLOPS
GPU compute (NVFP4 sparse, 8-GPU) | ~240 PFLOPS
Intra-server NVLink BW | 14.4 TB/s (NVLink Gen 5, 1.8 TB/s × 8 GPUs)
GPU-to-GPU latency (intra-server) | < 100 ns
System memory bandwidth | ~600 GB/s
Total NVMe storage | 34.56 TB (2 × 1.92 TB boot + 8 × 3.84 TB data)
Operating power draw | ~14.5 kW
Peak draw at wall (sustained) | ~14.5 kW
Absolute instantaneous burst | ~15.4 kW

32-Server Farm Aggregate

Metric | Value
Total GPUs | 256 × NVIDIA B300
Total GPU memory | 73.73 TB HBM3e
Total system RAM | 128 TB DDR5
Total NVMe storage | ~1,106 TB
Peak compute (FP8 dense) | ~1,152 PFLOPS
Peak compute (NVFP4 sparse) | ~7,680 PFLOPS
Intra-node NVLink aggregate | 32 × 14.4 TB/s = 460.8 TB/s
Operating power | ~355 kW
Peak power (sustained wall) | ~464 kW
Absolute burst peak | ~492 kW
Thermal output per server | ~14.5 kW = ~49,500 BTU/hr
Farm thermal output | ~355 kW = ~1,211,000 BTU/hr
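The farm totals above are straight multiples of the per-server figures in Section 4. A minimal sketch of that scaling (per-server inputs quoted from the tables above; names are illustrative):

```python
# Scaling the per-server figures from Section 4 to the 32-server farm.
SERVERS = 32
per_server = {
    "GPUs": 8,
    "HBM3e (TB)": 2.304,             # 8 x 288 GB
    "FP8 dense (PFLOPS)": 36,
    "NVFP4 sparse (PFLOPS)": 240,
    "DDR5 (TB)": 4,
    "NVMe (TB)": 34.56,
    "NVLink aggregate (TB/s)": 14.4,
    "Sustained wall power (kW)": 14.5,
}

farm = {metric: value * SERVERS for metric, value in per_server.items()}
for metric, total in farm.items():
    print(f"{metric}: {total:,.2f}")  # e.g. HBM3e ~73.73 TB, NVMe ~1,105.92 TB
```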

5. Network Architecture — InfiniBand Compute Fabric

Network Ports Per Server

Port type | Component | Count | Speed | Fabric
CX8 IB (GPU fabric) | HGX tray on-board | 8 | 800 Gb/s NDR | IB compute fabric → Q3400-RA leaf
BlueField-3 DPU | Add-in card | 2 | 400 Gb/s NDR400 | Storage/security Ethernet → Spectrum-4
Intel X710-AT2 | Management NIC | 2 | 10 GbE | OOB management → OOB switch
Total per server | — | 12 | — | —

32-Server Farm Port Totals

Fabric | Per server | × 32 servers | Speed | Target layer
CX8 IB (compute) | 8 | 256 ports | 800 Gb/s NDR | Q3400-RA leaf (×8)
BF3 Ethernet (storage/DPU) | 2 | 64 ports | 400 Gb/s NDR400 | Spectrum-4 pair and standard-aligned Ethernet layer
Server management | 2 | 64 server-facing ports | 10 GbE | SN2201-based management layer

The old "80 total ports on one generic OOB switch" figure is retired in this revision. Management, border, and control-plane connectivity now follow the standard-aligned SN2201 + SN4700 + dual-UFM model rather than the earlier single generic 96-port switch narrative.

InfiniBand Rail Assignment

Each server's 8 CX8 ports are assigned to 8 independent GPU rails:

CX8 port | Rail | Leaf switch | GPU
CX8[0] | Rail 0 | Leaf L0 | GPU-0
CX8[1] | Rail 1 | Leaf L1 | GPU-1
CX8[2] | Rail 2 | Leaf L2 | GPU-2
CX8[3] | Rail 3 | Leaf L3 | GPU-3
CX8[4] | Rail 4 | Leaf L4 | GPU-4
CX8[5] | Rail 5 | Leaf L5 | GPU-5
CX8[6] | Rail 6 | Leaf L6 | GPU-6
CX8[7] | Rail 7 | Leaf L7 | GPU-7

Rail isolation principle: GPU-i on server X and GPU-i on server Y share the same leaf switch. This means AllReduce within a rail (the dominant training communication pattern) traverses zero spine hops — leaf-only latency. Only cross-rail operations (AllToAll) require the 2-hop leaf→spine→leaf path.
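The rail rule can be written down directly: the GPU index fixes the CX8 port and the leaf, and spine hops are only needed when the two endpoints sit on different rails. A minimal sketch of that mapping (illustrative only, not project tooling):

```python
# Rail-optimized mapping: GPU-i on any server uses CX8[i] and lands on leaf Li.
def leaf_for(server: int, gpu: int) -> str:
    """Leaf switch serving a given (server, GPU) endpoint."""
    assert 1 <= server <= 32 and 0 <= gpu <= 7
    return f"L{gpu}"  # rail index == GPU index == CX8 port index

def fabric_hops(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """IB switch hops between two (server, gpu) endpoints."""
    if src[0] == dst[0]:
        return 0  # same server: NVLink only, never touches the IB fabric
    if leaf_for(*src) == leaf_for(*dst):
        return 1  # same rail: leaf-only path, zero spine hops
    return 2      # cross-rail: leaf -> spine -> leaf

print(fabric_hops((1, 3), (17, 3)))  # 1 — GPU-3 on C01 to GPU-3 on C17 (same rail)
print(fabric_hops((1, 3), (17, 5)))  # 2 — cross-rail, routed via a spine
```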


6. Switch Topology — 2-Tier Rail-Optimized Fat-Tree

Topology Design

                    ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
  SPINES (×4)       │  Spine   │ │  Spine   │ │  Spine   │ │  Spine   │
                    │   S0     │ │   S1     │ │   S2     │ │   S3     │
                    │Q3400-RA  │ │Q3400-RA  │ │Q3400-RA  │ │Q3400-RA  │
                    │144p NDR  │ │144p NDR  │ │144p NDR  │ │144p NDR  │
                    └──┬───────┘ └──┬───────┘ └──┬───────┘ └──┬───────┘
                       │ (64 links per spine, 8 per leaf)      │
        ┌──────┬────────┼──────┬─────┬──────┬────┼────┬────────┼──────┐
        │      │        │      │     │      │    │    │        │      │
  ┌─────┴──┐ ┌─┴──────┐ ... ┌──┴─────┴──┐ ... ┌─┴────┴──┐ ┌──┴─────┐
  │ Leaf L0│ │ Leaf L1│     │  Leaf L6  │     │  Leaf L7│ │        │
  │Rail 0  │ │Rail 1  │     │  Rail 6   │     │  Rail 7 │ │        │
  │Q3400-RA│ │Q3400-RA│     │  Q3400-RA │     │Q3400-RA │ │        │
  └────┬───┘ └────┬───┘     └─────┬─────┘     └────┬────┘ └────────┘
       │          │               │                 │
  (32 downlinks per leaf — one CX8[n] port from each server)
       │          │               │                 │
  C01..C32   C01..C32        C01..C32           C01..C32
  CX8[0]     CX8[1]          CX8[6]             CX8[7]

Q3400-RA Specifications

Parameter | Value
Form factor | 4U chassis
Port count | 144 × 800 Gb/s NDR IB (72 OSFP connectors, each bifurcated)
Switching capacity | 115.2 Tb/s non-blocking
UFM management port | Dedicated in-band OSFP 400 Gb/s (isolated from data ports)
Adaptive features | SHARP Gen 4, adaptive routing, telemetry congestion control
SerDes | 200 Gb/s
RDMA support | Yes

Leaf Switch Design (×8 — L0 through L7)

Parameter | Value
Count | 8 leaf switches
Each serves | 1 GPU rail
Server downlinks | 32 (one CX8 port from each server on that rail)
Spine uplinks | 32 (8 parallel links to each of 4 spines)
Port utilization | 64 of 144 (44%) — headroom for up to 40 more servers per rail at 1:1, i.e. up to the 72-node standard reference capacity
Cable type (down) | AOC 5–10 m (server → leaf, cross-zone)
Cable type (up) | DAC ≤ 3 m (leaf → spine, intra-N1)

Spine Switch Design (×4 — S0 through S3)

Parameter | Value
Count | 4 spine switches
Leaf-facing ports | 64 (8 parallel uplinks × 8 leaf switches)
Server connections | None (pure leaf-to-leaf forwarding plane)
Port utilization | 64 of 144 (44%)
Cable type | DAC ≤ 3 m (all intra-N1)

Fat-Tree Bandwidth & Latency

Metric | Value
Server-side aggregate BW | 256 × 800 Gb/s = 204.8 Tb/s
Spine bisection BW | 8 leaf × 32 uplinks × 800 Gb/s = 204.8 Tb/s
Bisection ratio | 1:1 — fully non-blocking
Max hops server-to-server | 2 (leaf → spine → leaf)
IB latency server-to-server | < 200 ns (2-hop NDR)
Intra-rail latency (same leaf) | < 100 ns (single hop)
SHARP Gen 4 AllReduce gain | Up to ~50% BW reduction for typical batch sizes

The 8-leaf / 4-spine Q3400 topology remains the live compute fabric for the current 32-node farm. The 72-node standard alignment changes the surrounding Ethernet, management, border, and storage-network layers rather than the rail mapping or the 32-node compute-population count.
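The 1:1 claim follows from the port accounting alone: every leaf dedicates as many uplink ports as server downlinks. A minimal check under the figures above (illustrative):

```python
# Non-blocking check for the 8-leaf / 4-spine Q3400-RA fat-tree.
LEAVES, SPINES = 8, 4
SERVERS, PORT_GBPS = 32, 800
DOWNLINKS_PER_LEAF = SERVERS       # one CX8 port per server on that rail
UPLINKS_PER_LEAF = 8 * SPINES      # 8 parallel links to each of 4 spines = 32

server_side_tbps = LEAVES * DOWNLINKS_PER_LEAF * PORT_GBPS / 1000    # 204.8
spine_bisection_tbps = LEAVES * UPLINKS_PER_LEAF * PORT_GBPS / 1000  # 204.8

print(f"server side: {server_side_tbps} Tb/s, spine bisection: {spine_bisection_tbps} Tb/s")
print("1:1 non-blocking:", UPLINKS_PER_LEAF >= DOWNLINKS_PER_LEAF)
```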


7. Network Tower — Rack N1 (InfiniBand Fabric)

Physical Content — 12 × Q3400-RA (48U total)

All 12 Q3400-RA switches are co-located in the network tower (Rack N1, or split across N1 IB bay and N2 upper bay as needed for physical depth). Co-location enables all inter-switch links to be short passive DAC, eliminating active components from the spine-to-leaf path.

Leaf-to-Spine Internal Wiring

Link | Cable type | Length | Count
Each leaf → each spine (8 parallel) | Passive copper DAC | ≤ 3 m | 8 × 4 × 8 = 256 DAC cables
Total unique leaf–spine pairs | — | — | 8 leaves × 4 spines = 32 pairs

SHARP In-Network Computation

  • All 12 Q3400-RA support SHARP Gen 4
  • AllReduce operations partially executed inside switches — partial gradient sums computed cooperatively by leaf and spine
  • Reduces fabric bandwidth consumed per AllReduce iteration by up to ~50% for training batch sizes common in LLM pretraining
  • Orchestrated automatically by UFM Appliance; no application code changes required
  • Particularly impactful for gradient synchronization in large transformer models (GPT, LLaMA scale)

8. Standard-Aligned Ethernet, Management & UFM Layers

Layering Rule

This revision separates the non-compute network stack into four explicit layers:

  1. Retained Spectrum-4 side fabric for the current 32-node BF3 path
  2. SN5610 standard-aligned converged / storage-network layer
  3. SN4700 border and control layer
  4. SN2201 management layer with dual UFM nodes

These are now treated as accepted topology components for the Kedios package. The report therefore stops describing management and border networking as a single generic custom switch and instead presents them as the standard-aligned design basis serving the current 32-node farm.

Spectrum-4 AI Ethernet Fabric (Retained)

The Spectrum-4 pair remains in the topology by explicit design decision. It continues to carry the BF3-facing Ethernet path for the current 32-node farm.

Parameter | Value
Count | 2
Switching capacity | 51.2 Tb/s per switch
Port configuration | 128 × 400 GbE (or 64 × 800 GbE)
Protocol | Ethernet + RoCE (Spectrum-X AI Ethernet platform)
BF3 port allocation | 32 BF3 ports per switch
Redundancy model | Active-active — each BF3 DPU connects one port to each Spectrum-4
Inter-switch link | Local ISL for east-west forwarding symmetry and resiliency

Traffic carried by the retained Spectrum-4 fabric:

  • BF3-based storage and service traffic for the 32 deployed servers
  • secondary compute-adjacent Ethernet traffic
  • orchestration and control-plane services that are intentionally kept off the IB data fabric

SN5610 Converged / Storage-Network Layer

The refreshed standard baseline introduces an SN5610 layer with a locked count of 6 total units, using the owner-confirmed 2 + (2 + 2) formula. In this report revision, the count is locked and the role is locked, while the final sub-role placement is deferred to the updated diagrams.

Attribute | Working position
Count | 6 total
Role | Standard-aligned converged / storage-network switching layer
Scope rule | Included in the current package narrative even though the 32-node farm uses only a subset of the available endpoint capacity
Presentation rule | Do not describe these units as mere future reserve; they are part of the accepted topology basis

UFM High-Availability Pair

The refreshed baseline upgrades UFM from a single-appliance default to a dual-node base architecture.

Parameter | Value
Count | 2
Management connection | Via the management plane, not on the IB data fabric
Managed scope today | 12 × Q3400-RA + 256 × CX8 endpoints
Design position | Production + standby / HA pair

UFM responsibilities remain unchanged in type:

  • topology discovery
  • routing computation
  • adaptive routing and congestion response
  • SHARP orchestration
  • fault detection and recovery
  • telemetry aggregation

The reason for moving to 2 × UFM is operational resilience and standards alignment, not raw port-count exhaustion. One UFM node is still likely sufficient for raw capacity, but it is no longer the preferred base-design statement for the standards-aligned network package.

SN2201 Management Layer

The management plane now moves from a generic 96-port 10 GbE OOB switch narrative to an SN2201-based management layer.

Attribute | Working position
Count | 17 total
Count formula | 9 + 8
Role | Server management, BMC / IPMI, switch management, and control-plane fan-out
Replacement rule | Replaces the previous generic single-switch OOB base-design wording

The current management-port mapping rule remains unchanged until final server integration says otherwise:

  • X710 Port 0 = OS management
  • X710 Port 1 = BMC / IPMI path

SN4700 Border / Control Layer

The refreshed baseline also adds a fixed SN4700 border and control layer:

Layer role | Count
Border leaf | 2
C-spine / OOB-related layer | 2
Total SN4700 | 4

These units should now be treated as part of the IDC-facing and control-layer architecture rather than as optional future add-ons.

Placement Rule

This source revision locks the counts and roles above, but it does not claim a final U-by-U rack drawing for them yet. The next draw.io refresh must convert these locked counts into the final physical placement view.


9. Cabling Plan

Locked Cable Counts From The Current 32-Node Compute Population

Link | Cable type | Length | Count | Notes
Server CX8 → Q3400-RA leaf (IB NDR800) | AOC | 5–10 m | 256 | 8 per server × 32 servers
Server BF3 → Spectrum-4 (400 GbE NDR400) | AOC | 5–10 m | 64 | 2 per server × 32 servers
Server X710 management links | Cat6A copper | 5–10 m | 64 | 2 per server × 32 servers
Q3400-RA leaf → Q3400-RA spine (NDR800) | Passive DAC | ≤ 3 m | 256 | All intra-network envelope
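These locked counts follow mechanically from the per-server port map and the leaf/spine wiring rule. A minimal roll-up sketch (compute-side only; illustrative, not the procurement BOM):

```python
# Locked compute-side cable counts for the current 32-node population.
SERVERS = 32
per_server_links = {
    "Server CX8 -> Q3400-RA leaf (IB NDR800, AOC)": 8,
    "Server BF3 -> Spectrum-4 (400 GbE, AOC)": 2,
    "Server X710 management (Cat6A)": 2,
}
compute_side = {link: n * SERVERS for link, n in per_server_links.items()}

# Leaf-to-spine DAC: 8 leaves x 4 spines x 8 parallel links each.
compute_side["Q3400-RA leaf -> spine (NDR800, DAC)"] = 8 * 4 * 8  # 256

for link, count in compute_side.items():
    print(f"{count:4d}  {link}")
# SN5610 / SN4700 / SN2201 / second-UFM cabling is deliberately excluded:
# those counts wait on the diagram redraw and the refreshed procurement BOM.
```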

Standard-Aligned Network-Layer Additions

The refreshed baseline adds cable families that were not present in the old ~654 total cable assemblies roll-up:

  • SN5610 interconnects
  • SN4700 border and control-layer patching
  • SN2201 management fan-out and uplinks
  • second-UFM management links

Those counts should be finalized in the diagram redraw and the refreshed procurement BOM, because the exact values depend on the final physical placement and link map. For that reason:

  • keep the compute-side cable counts above as locked
  • do not reuse the old ~654 total as a final all-in cable total for the refreshed topology

10. Complete Hardware Bill of Materials

Compute Hardware — Current Deployment

Component | Qty per server | Current farm total
ASUS XA NB3I-E12 server | 1 | 32
NVIDIA B300 GPU (on HGX tray) | 8 | 256
Intel Xeon Platinum 6776P CPU | 2 | 64
Samsung DDR5-6400 128 GB RDIMM | 32 | 1,024
Samsung PM9D3a 1.92 TB NVMe U.2 | 2 | 64
Samsung PM9D3a 3.84 TB NVMe U.2 | 8 | 256
TPM security module | 1 | 32

Split Baseline vs Current Deployment BOM

Component family | 72-node standard baseline | Current 32-node populated deployment | Note
Compute nodes | 72 | 32 | Standard reference capacity versus live deployment
GPUs | 576 | 256 | Same rule as compute nodes
Q3400-RA leaf | 8 | 8 | Live compute-fabric count remains unchanged
Q3400-RA spine | 4 | 4 | Live compute-fabric count remains unchanged
Spectrum-4 | 2 | 2 | Retained by explicit owner direction
UFM nodes | 2 | 2 | Production + standby / HA pair
SN5610 total | 6 | 6 | Locked formula: 2 + (2 + 2)
SN4700 border leaf | 2 | 2 | Border layer
SN4700 C-spine / OOB | 2 | 2 | Control / OOB layer
SN2201 management layer | 17 | 17 | Locked formula: 9 + 8
UFM Agent (software) | 72 | 32 | One software agent per populated compute server

Rack Infrastructure

Item | Count | Rule
Compute racks (C01–C32) | 32 | Fixed current deployment
Contracted package total | 34 occupied racks | Fixed project package
Network / services placement envelope | N1–N6 logical envelope | Physical placement refreshed in diagrams rather than assumed from the old "2 occupied + 4 spare" model

11. Aggregate Bandwidth Summary

Fabric layer | Calculation | Total bandwidth
IB compute — server side | 256 CX8 × 800 Gb/s | 204.8 Tb/s
IB compute — spine bisection | 8 leaf × 32 uplinks × 800 Gb/s | 204.8 Tb/s (1:1 non-blocking)
NVLink (intra-server, each) | NVLink Gen 5, 1.8 TB/s × 8 GPUs | 14.4 TB/s per server
NVLink (farm aggregate) | 32 × 14.4 TB/s | 460.8 TB/s
BF3 side-fabric Ethernet | 64 BF3 ports × 400 Gb/s | 25.6 Tb/s
Spectrum-4 switching capacity | 2 × 51.2 Tb/s | 102.4 Tb/s
Management / border / storage-network layers | Topology-specific in this revision | Presented by count and role, not collapsed into one aggregate number

12. Farm Compute Performance

Metric | Per server | Farm total (× 32)
FP8 dense (GPU peak) | ~36 PFLOPS | ~1,152 PFLOPS
NVFP4 sparse (GPU peak) | ~240 PFLOPS | ~7,680 PFLOPS
GPU memory | 2.304 TB HBM3e | 73.73 TB HBM3e
HBM3e bandwidth | ~64 TB/s (8 × ~8 TB/s per GPU) | ~2,048 TB/s farm
System RAM | 4 TB DDR5 | 128 TB
NVMe storage | 34.56 TB | ~1,106 TB
IB network bandwidth | 6.4 Tb/s (8 × 800 Gb/s) | 204.8 Tb/s bisection
Ethernet BW (BF3) | 800 Gb/s (2 × 400 Gb/s) | 25.6 Tb/s

13. Topology Diagrams

Three rendered SVG topology views are published through the shared diagram viewer. Each one opens in a centered review window with fit, 100%, zoom, and scroll controls for detailed inspection.

Diagram 1 — IDC Farm Planning & Connectivity

Open rendered SVG viewer

Block-flow diagram showing the full end-to-end connectivity path:

Internet → Router → Firewall → Border Leaf → Spectrum-4 (active-active) → Q3400-RA IB Fabric → 32× B300 Compute Servers

Also shows: OOB Management Switch, UFM Appliance, Management networking path, zone separation, fabric management connections.


Diagram 2 — Farm InfiniBand Topology (NVIDIA-Style Rail View)

Open rendered SVG viewer

Full IB topology in the NVIDIA reference style:

  • Row 1: 4× Q3400-RA Spine switches (S0–S3)
  • Row 2: 8× Q3400-RA Leaf switches (L0–L7), each rail color-coded
  • Row 3: 32× server nodes (C01–C32), grouped in 4 sets of 8
  • 256 rail-colored CX8→Leaf connections (8 colors, one per rail)
  • 32 orange spine-to-leaf connections
  • Ethernet section: Spectrum-4 #1/#2, UFM, OOB, Storage (future)
  • 64 BF3→Spectrum-4 connections

Diagram 3 — Single Rack Leaf/Spine Connection Detail

Open rendered SVG viewer

Per-port detail for a single representative server rack (C01):

  • All 8 CX8 ports to 8 leaf switches (color-coded by rail)
  • All 8 leaf switches to 4 spine switches (orange, 8×4 = 32 links)
  • BF3-0 → Spectrum-4 #1 (purple AOC)
  • BF3-1 → Spectrum-4 #2 (purple AOC)
  • X710 Port 0 + Port 1 → OOB Management Switch (yellow dashed)
  • UFM Agent (SW) ← UFM Appliance (blue dashed)
  • UFM Appliance → Leaf switches for IB fabric management (blue dashed)
  • Connection legend (color key for all edge types)

Racks C02–C32 are identical — same 12-port topology, same cable types, same switch targets.


14. Key Design Decisions & Rationale

Decision | Choice | Rationale
Server per rack | 1 server per rack (9U, ~14.5 kW sustained) | 20 kW/rack hard limit still governs the package
Compute population | 32 deployed nodes | Customer scope remains 32 nodes / 256 GPUs
Network baseline | 72-node-standard-aligned | Required for BOM, management, border, and OOB framing
IB switch model | Q3400-RA (8 leaf + 4 spine) | Live compute fabric remains rail-optimized and 1:1 non-blocking
Spectrum-4 role | Retained active-active pair | Explicit owner direction to keep the side fabric in place
UFM architecture | 2 nodes + per-server software agents | Standards alignment and HA posture outweigh the old single-node default
Management layer | SN2201-based | Replaces the old single generic OOB-switch story
Border / control layer | SN4700-based | Required to reflect the new standard-aligned network baseline
Storage-network layer | SN5610-based | Required to align the package with the accepted standard reference pattern
BOM presentation | Split baseline vs current deployment | Prevents accidental implication that 72 compute nodes are already live
Power narrative | 1.5 MW facility envelope + 680 kW rack ceiling | Separates facility allocation from rack-constrained usable envelope
Physical placement rule | Refresh via draw.io | Counts and roles are locked first; final placement follows in the updated diagrams

15. Risk & Mitigation Notes

Risk | Severity | Mitigation
GPU burst overshoot trips breaker | Medium | ~4.6 kW burst margin remains at the 20 kW/rack ceiling; retain monitoring and BMC power controls
Single AOC cable failure (server → leaf) | Medium | Per-rail isolation limits blast radius; UFM rerouting and job checkpointing remain the primary control
Leaf switch failure | High | Rail isolation plus UFM rerouting still apply
Spectrum-4 failure | Low | Active-active pair retained; surviving switch preserves service at reduced aggregate Ethernet bandwidth
Single UFM-node failure | Low | Base design is now a dual-node UFM architecture
Documentation ambiguity between 32 deployed and 72 standard baseline | High | Every externally facing table must label standard baseline and current deployment explicitly
Refreshed network-side power or cable roll-up published too early | Medium | Do not claim a new all-in network watt or all-cable total until the refreshed BOM and draw.io topology lock the missing entries
Future scale-out language reuses the old ~64-node story | Medium | Reframe future growth against the 72-node-standard network baseline while keeping the live compute population at 32

Report generated: February 28, 2026

Architecture status: Locked

Diagrams: ✅ farm-idc-topology · ✅ farm-ib-topology · ✅ single-rack-topology

Next actions: Commit to repository · Stakeholder review · Procurement kick-off

Glossary

NDR
Next Data Rate — InfiniBand generation at 400 Gb/s (NDR400) or 800 Gb/s (NDR800) per physical port.
NDR400
InfiniBand NDR at 400 Gb/s per port, used by the BlueField-3 DPU for side-fabric connections.
NDR800
InfiniBand NDR at 800 Gb/s per port, used by ConnectX-8 HCAs on the HGX B300 GPU-to-fabric links.
ConnectX-8
NVIDIA ConnectX-8 NDR800 InfiniBand HCA integrated on the HGX B300 tray — 8 per server, one per GPU rail.
BlueField-3
NVIDIA BF-3220 DPU — 400 Gb/s NDR400-class ports; provides the Spectrum-4-facing side-fabric connectivity and in-network compute offload.
Q3400-RA
NVIDIA Quantum-X800 Q3400 Rail-Accelerated InfiniBand switch — 144 NDR ports; deployed as 8 leaf + 4 spine.
Spectrum-4
NVIDIA Spectrum-4 400/800 GbE Ethernet switch — 51.2 Tb/s; retained as the active-active side-fabric pair.
SN5610
NVIDIA Spectrum-SN5610 converged 400G Ethernet switch — 6 units in the storage/converged service plane.
SN4700
NVIDIA Spectrum-SN4700 400G Ethernet switch — 4 units for border/WAN handoff and control-plane.
SN2201
NVIDIA Spectrum-SN2201 1G/10G management switch — 17 units covering the full OOB management layer.
UFM
Unified Fabric Manager — NVIDIA IB fabric management; deployed as 2-node HA pair (production + standby).
SHARP
Scalable Hierarchical Aggregation and Reduction Protocol — in-network collective offload on Q3400-RA.
HGX B300
NVIDIA HGX Blackwell Ultra B300 — 8-GPU tray with NVLink Gen 5 at 1.8 TB/s per GPU, 14.4 TB/s aggregate across the tray.
B300 GPU
NVIDIA Blackwell Ultra B300 — 288 GB HBM3e, 1.1 kW TDP; current report basis uses ~4.5 PFLOPS FP8 dense / ~9 PFLOPS FP8 sparse and ~15 PFLOPS NVFP4 dense / ~30 PFLOPS NVFP4 sparse per GPU.
NVLink
NVIDIA direct GPU interconnect — Gen 5 on Blackwell at 1.8 TB/s per GPU, yielding 14.4 TB/s across an 8-GPU HGX B300 tray.
HBM3e
High Bandwidth Memory 3e — stacked DRAM in B300 GPUs at 288 GB per GPU, 8 TB/s peak bandwidth.
Fat-Tree
Network topology providing non-blocking bisection bandwidth; IB compute fabric is a 2-tier rail-optimised fat-tree.
Rail-Optimised
IB fabric layout: each GPU rail maps to a dedicated leaf switch, keeping AllReduce traffic rail-local.
AOC
Active Optical Cable — fibre-based cable with integrated E/O conversion, used for all NDR800 IB inter-rack links.
IPMI / BMC
Intelligent Platform Management Interface / Baseboard Management Controller — out-of-band server management.
PDU-A / PDU-B
Dual-feed power distribution: each PSU bank pairs with one PDU, giving N+5 PSU + dual-feed facility redundancy.
CRAC / CRAH
Computer Room Air Conditioner / Air Handler — precision cooling units, N+1 target coverage in the Kedios facility.
DPU
Data Processing Unit — BlueField-3 Smart NIC providing network/storage offload and security isolation.
XA NB3I-E12
ASUS server SKU: 9U air-cooled, dual Xeon 6776P, 32 × 128 GB DDR5 (4 TB total), 10× NVMe, HGX B300 ×8, CX-8 ×8, BF-3 ×2.
Xeon 6776P
Intel Xeon 6 Granite Rapids-SP — 56-core, PCIe 5.0 host CPU in the XA NB3I-E12 server; current server power tables in this repo model ~350 W per socket.
NVFP4
NVIDIA FP4 format — current report basis uses ~15 PFLOPS dense / ~30 PFLOPS sparse per B300 GPU, reported in this repo as ~240 PFLOPS sparse per 8-GPU server.
FP8
8-bit float — current report basis uses ~4.5 PFLOPS dense / ~9 PFLOPS sparse per B300 GPU, with the report itself citing ~36 PFLOPS dense per 8-GPU server.
AllReduce
Distributed-training collective operation across all GPUs; accelerated by IB fat-tree fabric and SHARP.
Fat-Tree Bisection BW
204.8 Tb/s across the full 32-server farm — 1:1 non-blocking, no fabric oversubscription.
20 kW Rack Limit
Hard power cap per rack in the Kedios facility; servers draw ~14.5 kW sustained, leaving 5.5 kW margin.