Kedios Infrastructure Report
Kedios B300 — Full Network & Topology Report
Organization: Kedios
Date: March 30, 2026
Scope: 32-Node NVIDIA Blackwell Ultra B300 GPU Farm with 72-Node-Standard-Aligned Network Baseline
Status: Source Architecture Refresh Locked · Diagram Redraw Pending
1. Executive Summary
The Kedios B300 package remains a 32-node NVIDIA Blackwell Ultra B300 GPU training cluster inside a 20 MW facility, but the network architecture, BOM framing, and management-plane narrative are now refreshed to align with the 72-node B300 standard baseline. The compute population stays fixed at 32 servers / 256 GPUs. The standards-aligned layer now governs how the report presents InfiniBand control, Ethernet side fabric, border, storage-network, and management components.
This means the report intentionally carries two truths at once:
- the current deployed compute scope is still a 32-node farm
- the accepted network baseline is now a 72-node-standard-aligned design pattern
The compute fabric remains a 1:1 non-blocking Q3400-RA InfiniBand fat-tree delivering 204.8 Tb/s of server-side bisection bandwidth for the currently deployed 32-node farm. At the same time, the Ethernet, storage-network, management, and border layers are no longer described as a minimal custom add-on. They are now expressed as the standard-aligned design basis for the current package.
The platform remains optimized for collective-heavy distributed training and post-training workflows. Selective inference and checkpoint-adjacent workloads remain supportable, but they are not the sizing basis for the fabric or rack-power architecture.
Farm at a Glance
| Parameter | Value |
|---|---|
| Server model | ASUS XA NB3I-E12 (HGX Blackwell Ultra B300, 9U air-cooled) |
| Current deployed servers | 32 |
| Current deployed GPUs | 256 × NVIDIA B300 |
| Standard network reference capacity | 72 nodes / 576 GPUs |
| Total GPU memory (current deployment) | 73.73 TB HBM3e (256 × 288 GB) |
| Peak compute (FP8 dense, current deployment) | ~1,152 PFLOPS |
| Peak compute (NVFP4 sparse, current deployment) | ~7,680 PFLOPS |
| IB fabric bisection BW | 204.8 Tb/s (1:1 non-blocking) |
| IB fabric core | 8 × Q3400-RA leaf + 4 × Q3400-RA spine |
| Retained Ethernet side fabric | 2 × NVIDIA Spectrum-4 (active-active) |
| Standard-aligned added layers | SN5610 ×6, SN4700 ×4, SN2201 ×17, UFM ×2 |
| Contracted rack package | 34 occupied racks |
| Per-rack hard limit | 20 kW |
| Current 34-rack hard ceiling | 680 kW |
| Facility allocation | 1.5 MW |
The earlier ~530 kW all-in peak remains the last fully validated package-level figure from the pre-refresh network stack. Because the refreshed baseline adds SN5610, SN4700, SN2201, and a second UFM node, this revision separates validated compute-side power anchors from the refreshed network-side BOM instead of overstating a new all-in watt total before the final vendor power sheets are attached.
2. Data Center Infrastructure & Power
Facility Overview
| Parameter | Value |
|---|---|
| Total DC power capacity | 20 MW |
| Kedios allocation | 1.5 MW (1,500 kW) |
| Per-cabinet hard limit | 20 kW (wall-output hard ceiling) |
| Current 34-rack hard ceiling | 680 kW |
| Building levels | Multi-level (Level 1, 2, 3) |
| Cooling units | CRAC / CRAH fleet with N+1 target coverage |
| Power distribution | PDU-A and PDU-B (dual-feed per row) |
| UPS | UPS Room #1, #2, #3 + UPS 3 & UPS 4 |
| Power control | PCU 1, PCU 2 |
Per-Server Power Audit (Component-Level)
| Component | Model / Spec | Qty | Power each | Subtotal |
|---|---|---|---|---|
| NVIDIA B300 GPU | HGX tray, TDP 1,100 W | 8 | 1,100 W | 8,800 W |
| Intel Xeon 6776P CPU | 56-core, 660 W TDP | 2 | 350 W | 700 W |
| DDR5-6400 128 GB RDIMM | Samsung M321RAJA0MB2-CCP | 32 | ~7 W | 224 W |
| Samsung PM9D3a NVMe U.2 | Gen5, 1.92/3.84 TB | 10 | ~10 W | 100 W |
| ConnectX-8 NIC (HGX on-board) | 800 Gb/s NDR | 8 | ~20 W | 160 W |
| BlueField-3 3220 DPU | 400 Gb/s NDR400 | 2 | ~40 W | 80 W |
| Intel X710-AT2 mgmt NIC | Dual 10 GbE | 1 | ~12 W | 12 W |
| System fans (80×80 mm) | High-RPM | 15 | ~18 W | 270 W |
| CPU fans (60×56 mm) | — | 6 | ~7 W | 42 W |
| PCIe switch PEX89144 | — | 1 | ~12 W | 12 W |
| VRMs / motherboard / misc | — | — | — | 175 W |
| Component sum | 10,575 W | |||
| PSU loss (80 PLUS Titanium ~95.5%) | +4.5% | +494 W | ||
| Server draw at wall (sustained) | ~14,500 W ≈ 14.5 kW | |||
| In-rack PDU cabling loss (~1%) | +111 W | |||
| Total server rack draw — sustained peak | ~14.5 kW | |||
| GPU instantaneous burst overshoot (~+6%) | ~15.4 kW absolute max |
Server Rack vs 20 kW Hard Limit
| Item | Value |
|---|---|
| 20 kW hard limit per rack | 20.0 kW |
| Server sustained peak (at wall, incl. PDU loss) | ~14.5 kW |
| Absolute instantaneous burst | ~15.4 kW |
| Safety margin (sustained) | ~5.5 kW ✅ |
| Safety margin (burst) | ~4.6 kW ✅ |
The server fits within the 20 kW hard limit. A sustained training peak of ~14.5 kW leaves 5.5 kW of margin, and even at absolute burst (~15.4 kW) the margin is 4.6 kW, so breaker trips are not a realistic concern.
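A minimal sketch of the margin arithmetic behind this section, using the report's ~14.5 kW sustained anchor and the ~+6% burst factor. All constants are taken from the tables above; the variable names are illustrative only.

```python
# Single-rack check against the 20 kW hard limit, mirroring the audit figures above.
RACK_LIMIT_KW = 20.0
SUSTAINED_WALL_KW = 14.5      # validated sustained draw at wall, incl. PDU loss
BURST_FACTOR = 1.06           # ~+6% GPU instantaneous overshoot

burst_kw = SUSTAINED_WALL_KW * BURST_FACTOR
print(f"sustained margin: {RACK_LIMIT_KW - SUSTAINED_WALL_KW:.1f} kW")  # ~5.5 kW
print(f"burst peak:       {burst_kw:.1f} kW")                           # ~15.4 kW
print(f"burst margin:     {RACK_LIMIT_KW - burst_kw:.1f} kW")           # ~4.6 kW
```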
Validated Power Anchors
| Layer | Basis | Value |
|---|---|---|
| 32 compute racks — sustained | 32 × ~14.5 kW | ~464 kW |
| 32 compute racks — burst peak | 32 × ~15.4 kW | ~492 kW |
| Compute-rack hard ceiling | 32 × 20 kW | 640 kW |
| Current 34-rack package hard ceiling | 34 × 20 kW | 680 kW |
| Facility allocation envelope | Project allocation | 1.5 MW |
Network-Side Power Refresh Status
The refreshed network baseline adds hardware families that were not part of the earlier all-in package roll-up. The current source set supports counts, roles, and topology placement intent for these elements, but it does not yet support a final consolidated watt total for every added model.
| Component family | Count now in scope | Power status in this source revision |
|---|---|---|
| Q3400-RA | 12 | Existing repo-backed estimate remains available |
| Spectrum-4 | 2 | Existing repo-backed estimate remains available |
| UFM | 2 | Count locked; refreshed dual-node roll-up pending final BOM watt sheet |
| SN5610 | 6 | Count locked; per-device wattage pending final model sheet |
| SN4700 | 4 | Count locked; per-device wattage pending final model sheet |
| SN2201 | 17 | Count locked; per-device wattage pending final model sheet |
Use 1.5 MW as the facility-allocation answer and 680 kW as the current rack-constrained ceiling under the unchanged 20 kW/rack rule. Do not promote a new all-network aggregate watt number until the refreshed procurement BOM carries the missing per-model power entries.
3. Physical Layout & Zone Structure
Zone Allocation
| Zone | Current rule |
|---|---|
| 32-rack compute zone | Unchanged — 32× ASUS XA NB3I-E12 B300 servers (C01–C32) |
| Network / services envelope | The old N1 occupied + N2 occupied + N3–N6 empty description is no longer authoritative. The refreshed topology now uses the broader network/services placement envelope for IB, Spectrum-4, SN5610, SN4700, SN2201, and dual-UFM layers. |
The project remains a 34-rack package. The updated draw.io sources must finalize the physical placement of the refreshed network layers instead of reusing the previous assumption that only N1 and N2 are populated.
Compute Zone — 32-Rack Grid (C01–C32)
- Racks numbered C01 through C32 — pure compute, 1 server per rack
- Alternating cold aisle / hot aisle layout — conditioned air from CRAC #1 and CRAC #2
- Dual-feed from PDU-A and PDU-B across all rows (N+1 power path redundancy)
- Each rack: server occupies bottom U1–U9 (9U); patch panel at U11; cable management U12–U13; upper 29U intentionally empty
- Each rack exits 12 cables: 8× IB AOC + 2× Ethernet AOC + 2× Cat6A copper to network tower
Network Zone — Standard-Aligned Placement Envelope
The compute-side physical rule still holds: long server-to-network runs cross the inter-zone gap and therefore use AOC or copper management cabling rather than passive DAC. What changes in this revision is the network-layer occupancy narrative:
- the Q3400-RA core remains the live compute-fabric anchor
- the Spectrum-4 pair remains the live BF3 side fabric
- the management, border, and storage-network layers are now part of the accepted standard-aligned design basis
- the prior statement that N3–N6 are simply empty spare racks is withdrawn
Until the diagram refresh is completed, treat N1–N6 as the logical network/services placement envelope rather than as a final U-by-U rack map.
4. Server Specification — ASUS XA NB3I-E12
Model: ASUS XA NB3I-E12
Form factor: 9U, air-cooled (front-to-back), direct airflow
Quantity deployed: 32 units (one per rack, C01–C32)
Per-Server Hardware Bill of Materials
| Component | Model / Spec | Qty per server |
|---|---|---|
| GPU | NVIDIA Blackwell Ultra B300 (HGX tray), TDP 1,100 W each | 8 |
| GPU memory | 288 GB HBM3e per GPU (12-high HBM3e stacks) = 2.304 TB per server | — |
| CPU | Intel Xeon Platinum 6776P (56 cores, 660 W TDP) | 2 |
| System RAM | Samsung M321RAJA0MB2-CCP 128 GB DDR5-6400 RDIMM | 32 (= 4 TB) |
| Boot SSD | Samsung PM9D3a U.2 Gen5 NVMe 1.92 TB | 2 |
| Data SSD | Samsung PM9D3a U.2 Gen5 NVMe 3.84 TB | 8 |
| IB NIC (GPU fabric) | ConnectX-8 (CX8) soldered on HGX tray, 800 Gb/s NDR IB | 8 |
| DPU | NVIDIA BlueField-3 3220, 400 Gb/s NDR400 Ethernet | 2 |
| Management NIC | Intel X710-AT2, dual-port 10 GbE RJ45 | 1 |
| UFM Agent | Software only — installed on OS | 1 (SW only) |
| Interconnect fabric | NVLink 5 (intra-GPU), 14.4 TB/s total | HGX tray |
| PSU | 80 PLUS Titanium, 5+5 array (N+5 redundancy) | 10 |
Per-Server Performance
| Metric | Value |
|---|---|
| GPU compute (FP8 dense, 8-GPU) | ~36 PFLOPS |
| GPU compute (NVFP4 sparse, 8-GPU) | ~240 PFLOPS |
| Intra-server NVLink BW | 14.4 TB/s (NVLink Gen 5, 1.8 TB/s × 8 GPUs) |
| GPU-to-GPU latency (intra-server) | < 100 ns |
| System memory bandwidth | ~600 GB/s |
| Total NVMe storage | 34.56 TB (2 × 1.92 TB boot + 8 × 3.84 TB data) |
| Operating power draw | ~14.5 kW |
| Peak draw at wall (sustained) | ~14.5 kW |
| Absolute instantaneous burst | ~15.4 kW |
32-Server Farm Aggregate
| Metric | Value |
|---|---|
| Total GPUs | 256 × NVIDIA B300 |
| Total GPU memory | 73.73 TB HBM3e |
| Total system RAM | 128 TB DDR5 |
| Total NVMe storage | ~1,106 TB |
| Peak compute (FP8 dense) | ~1,152 PFLOPS |
| Peak compute (NVFP4 sparse) | ~7,680 PFLOPS |
| Intra-node NVLink aggregate | 32 × 14.4 TB/s = 460.8 TB/s |
| Operating power | ~355 kW |
| Peak power (sustained wall) | ~464 kW |
| Absolute burst peak | ~492 kW |
| Thermal output per server | ~14.5 kW = ~49,500 BTU/hr |
| Farm thermal output | ~355 kW = ~1,211,000 BTU/hr |
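As a cross-check on the aggregate rows above, the roll-up below multiplies the per-server anchors by 32 and applies the standard 1 kW ≈ 3,412 BTU/hr conversion for the thermal lines. It is an illustrative sketch of the arithmetic, not an additional validated measurement.

```python
# Farm-level roll-up of the per-server power anchors plus the kW -> BTU/hr conversion.
SERVERS = 32
SUSTAINED_KW = 14.5           # per-server sustained wall draw
BURST_KW = 15.4               # per-server absolute burst
OPERATING_FARM_KW = 355       # farm operating figure from the table above
BTU_PER_KW_HR = 3412          # 1 kW is roughly 3,412 BTU/hr

print(f"farm sustained peak: {SERVERS * SUSTAINED_KW:.0f} kW")                   # ~464 kW
print(f"farm burst peak:     {SERVERS * BURST_KW:.0f} kW")                       # ~492-493 kW
print(f"per-server thermal:  {SUSTAINED_KW * BTU_PER_KW_HR:,.0f} BTU/hr")        # ~49,500 BTU/hr
print(f"farm thermal:        {OPERATING_FARM_KW * BTU_PER_KW_HR:,.0f} BTU/hr")   # ~1,211,000 BTU/hr
```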
5. Network Architecture — InfiniBand Compute Fabric
Network Ports Per Server
| Port type | Component | Count | Speed | Fabric |
|---|---|---|---|---|
| CX8 IB (GPU fabric) | HGX tray on-board | 8 | 800 Gb/s NDR | IB compute fabric → Q3400-RA leaf |
| BlueField-3 DPU | Add-in card | 2 | 400 Gb/s NDR400 | Storage/security Ethernet → Spectrum-4 |
| Intel X710-AT2 | Management NIC | 2 | 10 GbE | OOB management → OOB switch |
| Total per server | 12 |
32-Server Farm Port Totals
| Fabric | Per server | × 32 servers | Speed | Target layer |
|---|---|---|---|---|
| CX8 IB (compute) | 8 | 256 ports | 800 Gb/s NDR | Q3400-RA leaf (×8) |
| BF3 Ethernet (storage/DPU) | 2 | 64 ports | 400 Gb/s NDR400 | Spectrum-4 pair and standard-aligned Ethernet layer |
| Server management | 2 | 64 server-facing ports | 10 GbE | SN2201-based management layer |
The old "80 total ports on one generic OOB switch" figure is retired in this revision. Management, border, and control-plane connectivity now follow the standard-aligned SN2201 + SN4700 + dual-UFM model rather than a single generic 96-port switch story.
InfiniBand Rail Assignment
Each server's 8 CX8 ports are assigned to 8 independent GPU rails:
| CX8 port | Rail | Leaf switch | GPU |
|---|---|---|---|
| CX8[0] | Rail 0 | Leaf L0 | GPU-0 |
| CX8[1] | Rail 1 | Leaf L1 | GPU-1 |
| CX8[2] | Rail 2 | Leaf L2 | GPU-2 |
| CX8[3] | Rail 3 | Leaf L3 | GPU-3 |
| CX8[4] | Rail 4 | Leaf L4 | GPU-4 |
| CX8[5] | Rail 5 | Leaf L5 | GPU-5 |
| CX8[6] | Rail 6 | Leaf L6 | GPU-6 |
| CX8[7] | Rail 7 | Leaf L7 | GPU-7 |
Rail isolation principle: GPU-i on server X and GPU-i on server Y share the same leaf switch. This means AllReduce within a rail (the dominant training communication pattern) traverses zero spine hops — leaf-only latency. Only cross-rail operations (AllToAll) require the 2-hop leaf→spine→leaf path.
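A minimal sketch of the rail-to-leaf mapping and hop-count rule described above. The helpers leaf_for_gpu and ib_hops are illustrative names for this sketch, not part of any fabric-management API.

```python
# Rail-optimised assignment: CX8[i] / GPU-i on every server lands on leaf Li (rail i).
def leaf_for_gpu(gpu_index: int) -> str:
    """GPU-i always maps to leaf Li, regardless of which server hosts it."""
    return f"L{gpu_index}"

def ib_hops(gpu_a: int, gpu_b: int, same_server: bool) -> int:
    """IB fabric hop count between two GPUs (intra-server traffic stays on NVLink)."""
    if same_server:
        return 0                  # handled by NVLink, never touches the IB fabric
    if gpu_a == gpu_b:
        return 1                  # same rail -> same leaf: leaf-only path
    return 2                      # cross-rail -> leaf -> spine -> leaf

# AllReduce within rail 3 across two servers stays leaf-local (no spine hop).
assert leaf_for_gpu(3) == "L3" and ib_hops(3, 3, same_server=False) == 1
# AllToAll between rail 2 and rail 5 needs the 2-hop leaf-spine-leaf path.
assert ib_hops(2, 5, same_server=False) == 2
```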
6. Switch Topology — 2-Tier Rail-Optimized Fat-Tree
Topology Design
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
SPINES (×4) │ Spine │ │ Spine │ │ Spine │ │ Spine │
│ S0 │ │ S1 │ │ S2 │ │ S3 │
│Q3400-RA │ │Q3400-RA │ │Q3400-RA │ │Q3400-RA │
│144p NDR │ │144p NDR │ │144p NDR │ │144p NDR │
└──┬───────┘ └──┬───────┘ └──┬───────┘ └──┬───────┘
│ (64 links per spine, 8 per leaf) │
┌──────┬────────┼──────┬─────┬──────┬────┼────┬────────┼──────┐
│ │ │ │ │ │ │ │ │ │
┌─────┴──┐ ┌─┴──────┐ ... ┌──┴─────┴──┐ ... ┌─┴────┴──┐ ┌──┴─────┐
│ Leaf L0│ │ Leaf L1│ │ Leaf L6 │ │ Leaf L7│ │ │
│Rail 0 │ │Rail 1 │ │ Rail 6 │ │ Rail 7 │ │ │
│Q3400-RA│ │Q3400-RA│ │ Q3400-RA │ │Q3400-RA │ │ │
└────┬───┘ └────┬───┘ └─────┬─────┘ └────┬────┘ └────────┘
│ │ │ │
(32 downlinks per leaf — one CX8[n] port from each server)
│ │ │ │
C01..C32 C01..C32 C01..C32 C01..C32
CX8[0] CX8[1] CX8[6] CX8[7]
Q3400-RA Specifications
| Parameter | Value |
|---|---|
| Form factor | 4U chassis |
| Port count | 144 × 800 Gb/s NDR IB (72 OSFP connectors, each bifurcated) |
| Switching capacity | 115.2 Tb/s non-blocking |
| UFM management port | Dedicated in-band OSFP 400 Gb/s (isolated from data ports) |
| Adaptive features | SHARP Gen 4, adaptive routing, telemetry congestion control |
| SerDes | 200 Gb/s |
| RDMA support | Yes |
Leaf Switch Design (×8 — L0 through L7)
| Parameter | Value |
|---|---|
| Count | 8 leaf switches |
| Each serves | 1 GPU rail |
| Server downlinks | 32 (one CX8 port from each server on that rail) |
| Spine uplinks | 32 (8 parallel links to each of 4 spines) |
| Port utilization | 64 of 144 (44%), leaving headroom toward the 72-node standard baseline (72 downlinks + 72 uplinks fills the switch at 1:1) |
| Cable type (down) | AOC 5–10 m (server → leaf, cross-zone) |
| Cable type (up) | DAC ≤3 m (leaf → spine, intra-N1) |
Spine Switch Design (×4 — S0 through S3)
| Parameter | Value |
|---|---|
| Count | 4 spine switches |
| Leaf-facing ports | 64 (8 parallel uplinks × 8 leaf switches) |
| Server connections | None (pure leaf-to-leaf forwarding plane) |
| Port utilization | 64 of 144 (44%) |
| Cable type | DAC ≤3 m (all intra-N1) |
Fat-Tree Bandwidth & Latency
| Metric | Value |
|---|---|
| Server-side aggregate BW | 256 × 800 Gb/s = 204.8 Tb/s |
| Spine bisection BW | 8 leaf × 32 uplinks × 800 Gb/s = 204.8 Tb/s |
| Bisection ratio | 1:1 — fully non-blocking |
| Max hops server-to-server | 2 (leaf → spine → leaf) |
| IB latency server-to-server | < 200 ns (2-hop NDR) |
| Intra-rail latency (same leaf) | < 100 ns (single hop) |
| SHARP Gen 4 AllReduce gain | Up to ~50% BW reduction for typical batch sizes |
The 8-leaf / 4-spine Q3400 topology remains the live compute fabric for the current 32-node farm. The 72-node standard alignment changes the surrounding Ethernet, management, border, and storage-network layers rather than the rail mapping or the 32-node compute-population count.
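The sketch below re-derives the 1:1 non-blocking claim from the leaf and spine counts in this section. All values come from the tables above; it is a consistency check on the arithmetic, not a capacity model.

```python
# Consistency check: per-leaf downlinks vs uplinks, and server-side vs spine bisection BW.
SERVERS, RAILS = 32, 8
SPINES, LINKS_PER_LEAF_SPINE_PAIR = 4, 8
PORT_GBPS = 800

downlinks_per_leaf = SERVERS                                    # one CX8 per server on each rail
uplinks_per_leaf = SPINES * LINKS_PER_LEAF_SPINE_PAIR           # 4 spines x 8 parallel links = 32

server_side_tbps = SERVERS * RAILS * PORT_GBPS / 1000           # 256 x 800 Gb/s = 204.8 Tb/s
bisection_tbps = RAILS * uplinks_per_leaf * PORT_GBPS / 1000    # 8 x 32 x 800 Gb/s = 204.8 Tb/s

assert downlinks_per_leaf == uplinks_per_leaf                   # 1:1 at every leaf
assert server_side_tbps == bisection_tbps                       # fully non-blocking fabric
print(f"{server_side_tbps} Tb/s on each side of the bisection")
```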
7. Network Tower — Rack N1 (InfiniBand Fabric)
Physical Content — 12 × Q3400-RA (48U total)
All 12 Q3400-RA switches are co-located in the network tower (Rack N1, or split across N1 IB bay and N2 upper bay as needed for physical depth). Co-location enables all inter-switch links to be short passive DAC, eliminating active components from the spine-to-leaf path.
Leaf-to-Spine Internal Wiring
| Link | Cable type | Length | Count |
|---|---|---|---|
| Each leaf → each spine (8 parallel) | Passive copper DAC | ≤ 3 m | 8 × 4 × 8 = 256 DAC cables |
| Total unique leaf-spine pairs | 8 leaves × 4 spines | — | 32 pairs |
SHARP In-Network Computation
- All 12 Q3400-RA support SHARP Gen 4
- AllReduce operations partially executed inside switches — partial gradient sums computed cooperatively by leaf and spine
- Reduces fabric bandwidth consumed per AllReduce iteration by up to ~50% for training batch sizes common in LLM pretraining
- Orchestrated automatically by UFM Appliance; no application code changes required
- Particularly impactful for gradient synchronization in large transformer models (GPT, LLaMA scale)
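To make the bandwidth-reduction claim concrete, the sketch below estimates per-iteration gradient-sync traffic for an assumed 70B-parameter model with BF16 gradients. The model size, the 2× ring-AllReduce traffic factor, and the flat ~50% SHARP reduction are illustrative assumptions for this example only, not measured Kedios figures.

```python
# Illustrative (not validated) gradient-sync estimate under the assumptions stated above.
PARAMS = 70e9                          # assumed model size (not a Kedios workload figure)
BYTES_PER_GRAD = 2                     # BF16 gradients
RING_TRAFFIC_FACTOR = 2                # ring AllReduce moves roughly 2x the payload
SHARP_BW_REDUCTION = 0.5               # "up to ~50%" figure quoted in this section

gradient_gb = PARAMS * BYTES_PER_GRAD / 1e9                     # ~140 GB of gradients per sync
fabric_gb_no_sharp = RING_TRAFFIC_FACTOR * gradient_gb          # ~280 GB crossing the fabric
fabric_gb_with_sharp = fabric_gb_no_sharp * (1 - SHARP_BW_REDUCTION)

print(f"per-iteration fabric traffic: ~{fabric_gb_no_sharp:.0f} GB without SHARP, "
      f"~{fabric_gb_with_sharp:.0f} GB with in-network reduction")
```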
8. Standard-Aligned Ethernet, Management & UFM Layers
Layering Rule
This revision separates the non-compute network stack into four explicit layers:
- Retained Spectrum-4 side fabric for the current 32-node BF3 path
- SN5610 standard-aligned converged / storage-network layer
- SN4700 border and control layer
- SN2201 management layer with dual UFM nodes
These are now treated as accepted topology components for the Kedios package. The report therefore stops describing management and border networking as a single generic custom switch and instead presents them as the standard-aligned design basis serving the current 32-node farm.
Spectrum-4 AI Ethernet Fabric (Retained)
The Spectrum-4 pair remains in the topology by explicit design decision. It continues to carry the BF3-facing Ethernet path for the current 32-node farm.
| Parameter | Value |
|---|---|
| Count | 2 |
| Switching capacity | 51.2 Tb/s per switch |
| Port configuration | 128 × 400 GbE (or 64 × 800 GbE) |
| Protocol | Ethernet + RoCE (Spectrum-X AI Ethernet platform) |
| BF3 port allocation | 32 BF3 ports per switch |
| Redundancy model | Active-active — each BF3 DPU connects one port to each Spectrum-4 |
| Inter-switch link | Local ISL for east-west forwarding symmetry and resiliency |
Traffic carried by the retained Spectrum-4 fabric:
- BF3-based storage and service traffic for the 32 deployed servers
- secondary compute-adjacent Ethernet traffic
- orchestration and control-plane services that are intentionally kept off the IB data fabric
SN5610 Converged / Storage-Network Layer
The refreshed standard baseline introduces an SN5610 layer with a locked count of 6 total units, using the owner-confirmed 2 + (2 + 2) formula. In this report revision, the count is locked and the role is locked, while the final sub-role placement is deferred to the updated diagrams.
| Attribute | Working position |
|---|---|
| Count | 6 total |
| Role | Standard-aligned converged / storage-network switching layer |
| Scope rule | Included in the current package narrative even though the 32-node farm uses only a subset of the available endpoint capacity |
| Presentation rule | Do not describe these units as mere future reserve; they are part of the accepted topology basis |
UFM High-Availability Pair
The refreshed baseline upgrades UFM from a single-appliance default to a dual-node base architecture.
| Parameter | Value |
|---|---|
| Count | 2 |
| Management connection | Via the management plane, not on the IB data fabric |
| Managed scope today | 12× Q3400-RA + 256× CX8 endpoints |
| Design position | Production + standby / HA pair |
UFM responsibilities remain unchanged in type:
- topology discovery
- routing computation
- adaptive routing and congestion response
- SHARP orchestration
- fault detection and recovery
- telemetry aggregation
The reason for moving to 2 × UFM is operational resilience and standards alignment, not raw port-count exhaustion. One UFM node is still likely sufficient for raw capacity, but it is no longer the preferred base-design statement for the standards-aligned network package.
SN2201 Management Layer
The management plane now moves from a generic 96-port 10 GbE OOB switch narrative to an SN2201-based management layer.
| Attribute | Working position |
|---|---|
| Count | 17 total |
| Count formula | 9 + 8 |
| Role | Server management, BMC / IPMI, switch management, and control-plane fan-out |
| Replacement rule | Replaces the previous generic single-switch OOB base-design wording |
The current management-port mapping rule remains unchanged until final server integration says otherwise:
- X710 Port 0 = OS management
- X710 Port 1 = BMC / IPMI path
SN4700 Border / Control Layer
The refreshed baseline also adds a fixed SN4700 border and control layer:
| Layer role | Count |
|---|---|
| Border leaf | 2 |
| C-spine / OOB-related layer | 2 |
| Total SN4700 | 4 |
These units should now be treated as part of the IDC-facing and control-layer architecture rather than as optional future add-ons.
Placement Rule
This source revision locks the counts and roles above, but it does not claim a final U-by-U rack drawing for them yet. The next draw.io refresh must convert these locked counts into the final physical placement view.
9. Cabling Plan
Locked Cable Counts From The Current 32-Node Compute Population
| Link | Cable type | Length | Count | Notes |
|---|---|---|---|---|
| Server CX8 → Q3400-RA leaf (IB NDR800) | AOC | 5–10 m | 256 | 8 per server × 32 servers |
| Server BF3 → Spectrum-4 (400 GbE NDR400) | AOC | 5–10 m | 64 | 2 per server × 32 servers |
| Server X710 management links | Cat6A copper | 5–10 m | 64 | 2 per server × 32 servers |
| Q3400-RA leaf → Q3400-RA spine (NDR800) | Passive DAC | ≤ 3 m | 256 | All intra-network envelope |
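The locked counts above fall straight out of the 32-node population and the per-server port model. The sketch below reproduces the derivation; the dictionary keys are descriptive labels, not BOM part numbers.

```python
# Derivation of the locked compute-side cable counts from the 32-node population.
SERVERS = 32
CX8_PER_SERVER, BF3_PER_SERVER, MGMT_PORTS_PER_SERVER = 8, 2, 2
LEAVES, SPINES, LINKS_PER_LEAF_SPINE_PAIR = 8, 4, 8

cables = {
    "server CX8 -> Q3400-RA leaf (IB AOC)":   SERVERS * CX8_PER_SERVER,                     # 256
    "server BF3 -> Spectrum-4 (AOC)":         SERVERS * BF3_PER_SERVER,                     # 64
    "server X710 management (Cat6A)":         SERVERS * MGMT_PORTS_PER_SERVER,              # 64
    "leaf -> spine (passive DAC)":            LEAVES * SPINES * LINKS_PER_LEAF_SPINE_PAIR,  # 256
}
for link, count in cables.items():
    print(f"{link}: {count}")
print("locked compute-side subtotal:", sum(cables.values()))  # 640, before the added standard-aligned layers
```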
Standard-Aligned Network-Layer Additions
The refreshed baseline adds cable families that were not present in the old ~654 total cable assemblies roll-up:
- SN5610 interconnects
- SN4700 border and control-layer patching
- SN2201 management fan-out and uplinks
- second-UFM management links
Those counts should be finalized in the diagram redraw and the refreshed procurement BOM, because the exact values depend on the final physical placement and link map. For that reason:
- keep the compute-side cable counts above as locked
- do not reuse the old ~654 total as a final all-in cable total for the refreshed topology
10. Complete Hardware Bill of Materials
Compute Hardware — Current Deployment
| Component | Qty per server | Current farm total |
|---|---|---|
| ASUS XA NB3I-E12 server | 1 | 32 |
| NVIDIA B300 GPU (on HGX tray) | 8 | 256 |
| Intel Xeon Platinum 6776P CPU | 2 | 64 |
| Samsung DDR5-6400 128 GB RDIMM | 32 | 1,024 |
| Samsung PM9D3a 1.92 TB NVMe U.2 | 2 | 64 |
| Samsung PM9D3a 3.84 TB NVMe U.2 | 8 | 256 |
| TPM security module | 1 | 32 |
Split Baseline vs Current Deployment BOM
| Component family | 72-node standard baseline | Current 32-node populated deployment | Note |
|---|---|---|---|
| Compute nodes | 72 | 32 | Standard reference capacity versus live deployment |
| GPUs | 576 | 256 | Same rule as compute nodes |
| Q3400-RA leaf | 8 | 8 | Live compute-fabric count remains unchanged |
| Q3400-RA spine | 4 | 4 | Live compute-fabric count remains unchanged |
| Spectrum-4 | 2 | 2 | Retained by explicit owner direction |
| UFM nodes | 2 | 2 | Production + standby / HA pair |
| SN5610 total | 6 | 6 | Locked formula: 2 + (2 + 2) |
| SN4700 border leaf | 2 | 2 | Border layer |
| SN4700 C-spine / OOB | 2 | 2 | Control / OOB layer |
| SN2201 management layer | 17 | 17 | Locked formula: 9 + 8 |
| UFM Agent (software) | 72 | 32 | One software agent per populated compute server |
Rack Infrastructure
| Item | Count | Rule |
|---|---|---|
| Compute racks (C01–C32) | 32 | Fixed current deployment |
| Contracted package total | 34 occupied racks | Fixed project package |
| Network / services placement envelope | N1–N6 logical envelope | Physical placement refreshed in diagrams rather than assumed from the old 2 occupied + 4 spare model |
11. Aggregate Bandwidth Summary
| Fabric layer | Calculation | Total bandwidth |
|---|---|---|
| IB compute — server side | 256 CX8 × 800 Gb/s | 204.8 Tb/s |
| IB compute — spine bisection | 8 leaf × 32 uplinks × 800 Gb/s | 204.8 Tb/s (1:1 non-blocking) |
| NVLink (intra-server, each) | NVLink Gen 5, 1.8 TB/s × 8 GPUs | 14.4 TB/s per server |
| NVLink (farm aggregate) | 32 × 14.4 TB/s | 460.8 TB/s |
| BF3 side-fabric Ethernet | 64 BF3 ports × 400 Gb/s | 25.6 Tb/s |
| Spectrum-4 switching capacity | 2 × 51.2 Tb/s | 102.4 Tb/s |
| Management / border / storage-network layers | Topology-specific in this revision | Presented by count and role, not collapsed into one aggregate number |
12. Farm Compute Performance
| Metric | Per server | Farm total (×32) |
|---|---|---|
| FP8 dense (GPU peak) | ~36 PFLOPS | ~1,152 PFLOPS |
| NVFP4 sparse (GPU peak) | ~240 PFLOPS | ~7,680 PFLOPS |
| GPU memory | 2.304 TB HBM3e | 73.73 TB HBM3e |
| HBM3e bandwidth | ~64 TB/s per server (8 × ~8 TB/s per GPU) | ~2,048 TB/s farm |
| System RAM | 4 TB DDR5 | 128 TB |
| NVMe storage | 34.56 TB | ~1,106 TB |
| IB network bandwidth | 6.4 Tb/s (8 × 800 Gb/s) | 204.8 Tb/s bisection |
| Ethernet BW (BF3) | 800 Gb/s (2 × 400 Gb/s) | 25.6 Tb/s |
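A short scaling sketch for the headline rows above. The per-GPU HBM3e bandwidth (~8 TB/s) follows the glossary entry, so the derived farm-level bandwidth should be read as an estimate rather than a measured value.

```python
# Per-server -> farm scaling for the headline compute and memory figures.
SERVERS, GPUS_PER_SERVER = 32, 8

fp8_dense_pflops = 36 * SERVERS                          # ~1,152 PFLOPS farm peak (FP8 dense)
nvfp4_sparse_pflops = 240 * SERVERS                      # ~7,680 PFLOPS farm peak (NVFP4 sparse)
hbm_capacity_tb = 0.288 * GPUS_PER_SERVER * SERVERS      # ~73.7 TB HBM3e
hbm_bw_tbps = 8 * GPUS_PER_SERVER * SERVERS              # ~2,048 TB/s, assuming ~8 TB/s per GPU

print(fp8_dense_pflops, nvfp4_sparse_pflops, round(hbm_capacity_tb, 2), hbm_bw_tbps)
```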
13. Topology Diagrams
Three rendered SVG topology views are published through the shared diagram viewer. Each one opens in a centered review window with fit, 100%, zoom, and scroll controls for detailed inspection.
Diagram 1 — IDC Farm Planning & Connectivity
Block-flow diagram showing the full end-to-end connectivity path:
Internet → Router → Firewall → Border Leaf → Spectrum-4 (active-active) → Q3400-RA IB Fabric → 32× B300 Compute Servers
Also shows: OOB Management Switch, UFM Appliance, Management networking path, zone separation, fabric management connections.
Diagram 2 — Farm InfiniBand Topology (NVIDIA-Style Rail View)
Full IB topology in the NVIDIA reference style:
- Row 1: 4× Q3400-RA Spine switches (S0–S3)
- Row 2: 8× Q3400-RA Leaf switches (L0–L7), each rail color-coded
- Row 3: 32× server nodes (C01–C32), grouped in 4 sets of 8
- 256 rail-colored CX8→Leaf connections (8 colors, one per rail)
- 32 orange spine-to-leaf connections
- Ethernet section: Spectrum-4 #1/#2, UFM, OOB, Storage (future)
- 64 BF3→Spectrum-4 connections
Diagram 3 — Single Rack Leaf/Spine Connection Detail
Per-port detail for a single representative server rack (C01):
- All 8 CX8 ports to 8 leaf switches (color-coded by rail)
- All 8 leaf switches to 4 spine switches (orange, 8×4 = 32 links)
- BF3-0 → Spectrum-4 #1 (purple AOC)
- BF3-1 → Spectrum-4 #2 (purple AOC)
- X710 Port 0 + Port 1 → OOB Management Switch (yellow dashed)
- UFM Agent (SW) ← UFM Appliance (blue dashed)
- UFM Appliance → Leaf switches for IB fabric management (blue dashed)
- Connection legend (color key for all edge types)
Racks C02–C32 are identical — same 12-port topology, same cable types, same switch targets.
14. Key Design Decisions & Rationale
| Decision | Choice | Rationale |
|---|---|---|
| Server per rack | 1 server per rack (9U, ~14.5 kW sustained) | 20 kW/rack hard limit still governs the package |
| Compute population | 32 deployed nodes | Customer scope remains 32 nodes / 256 GPUs |
| Network baseline | 72-node-standard-aligned | Required for BOM, management, border, and OOB framing |
| IB switch model | Q3400-RA (8 leaf + 4 spine) | Live compute fabric remains rail-optimized and 1:1 non-blocking |
| Spectrum-4 role | Retained active-active pair | Explicit owner direction to keep the side fabric in place |
| UFM architecture | 2 nodes + per-server software agents | Standards alignment and HA posture outweigh the old single-node default |
| Management layer | SN2201-based | Replaces the old single generic OOB-switch story |
| Border / control layer | SN4700-based | Required to reflect the new standard-aligned network baseline |
| Storage-network layer | SN5610-based | Required to align the package with the accepted standard reference pattern |
| BOM presentation | Split baseline vs current deployment | Prevents accidental implication that 72 compute nodes are already live |
| Power narrative | 1.5 MW facility envelope + 680 kW rack ceiling | Separates facility allocation from rack-constrained usable envelope |
| Physical placement rule | Refresh via draw.io | Counts and roles are locked first; final placement follows in the updated diagrams |
15. Risk & Mitigation Notes
| Risk | Severity | Mitigation |
|---|---|---|
| GPU burst overshoot trips breaker | Medium | 4.6 kW burst margin remains at the 20 kW/rack ceiling; retain monitoring and BMC power controls |
| Single AOC cable failure (server→leaf) | Medium | Per-rail isolation limits blast radius; UFM rerouting and job checkpointing remain the primary control |
| Leaf switch failure | High | Rail isolation plus UFM rerouting still apply |
| Spectrum-4 failure | Low | Active-active pair retained; surviving switch preserves service at reduced aggregate Ethernet bandwidth |
| Single UFM-node failure | Low | Base design is now a dual-node UFM architecture |
| Documentation ambiguity between 32 deployed and 72 standard baseline | High | Every externally facing table must label standard baseline and current deployment explicitly |
| Refreshed network-side power or cable roll-up published too early | Medium | Do not claim a new all-in network watt or all-cable total until the refreshed BOM and draw.io topology lock the missing entries |
| Future scale-out language reuses the old ~64-node story | Medium | Reframe future growth against the 72-node-standard network baseline while keeping the live compute population at 32 |
Report generated: February 28, 2026
Architecture status: Locked
Diagrams: ✅ farm-idc-topology · ✅ farm-ib-topology · ✅ single-rack-topology
Next actions: Commit to repository · Stakeholder review · Procurement kick-off
Glossary
- NDR
- Next Data Rate — InfiniBand generation at 400 Gb/s (NDR400) or 800 Gb/s (NDR800) per physical port.
- NDR400
- InfiniBand NDR at 400 Gb/s per port, used by the BlueField-3 DPU for side-fabric connections.
- NDR800
- InfiniBand NDR at 800 Gb/s per port, used by ConnectX-8 HCAs on the HGX B300 GPU-to-fabric links.
- ConnectX-8
- NVIDIA ConnectX-8 NDR800 InfiniBand HCA integrated on the HGX B300 tray — 8 per server, one per GPU rail.
- BlueField-3
- NVIDIA BF-3220 DPU — 400 Gb/s NDR400-rate ports; provides the Ethernet side-fabric connectivity and in-network compute offload.
- Q3400-RA
- NVIDIA Quantum-X800 Q3400 Rail-Accelerated InfiniBand switch — 144 NDR ports; deployed as 8 leaf + 4 spine.
- Spectrum-4
- NVIDIA Spectrum-4 400GbE Ethernet switch — 51.2 Tb/s; retained as active-active side-fabric pair.
- SN5610
- NVIDIA Spectrum-SN5610 converged 400G Ethernet switch — 6 units in the storage/converged service plane.
- SN4700
- NVIDIA Spectrum-SN4700 400G Ethernet switch — 4 units for border/WAN handoff and control-plane.
- SN2201
- NVIDIA Spectrum-SN2201 1G/10G management switch — 17 units covering the full OOB management layer.
- UFM
- Unified Fabric Manager — NVIDIA IB fabric management; deployed as 2-node HA pair (production + standby).
- SHARP
- Scalable Hierarchical Aggregation and Reduction Protocol — in-network collective offload on Q3400-RA.
- HGX B300
- NVIDIA HGX Blackwell Ultra B300 — 8-GPU tray with NVLink Gen 5 at 1.8 TB/s per GPU, 14.4 TB/s aggregate across the tray.
- B300 GPU
- NVIDIA Blackwell Ultra B300 — 288 GB HBM3e, 1.1 kW TDP; current report basis uses ~4.5 PFLOPS FP8 dense / ~9 PFLOPS FP8 sparse and ~15 PFLOPS NVFP4 dense / ~30 PFLOPS NVFP4 sparse per GPU.
- NVLink
- NVIDIA direct GPU interconnect — Gen 5 on Blackwell at 1.8 TB/s per GPU, yielding 14.4 TB/s across an 8-GPU HGX B300 tray.
- HBM3e
- High Bandwidth Memory 3e — stacked DRAM in B300 GPUs at 288 GB per GPU, 8 TB/s peak bandwidth.
- Fat-Tree
- Network topology providing non-blocking bisection bandwidth; IB compute fabric is a 2-tier rail-optimised fat-tree.
- Rail-Optimised
- IB fabric layout: each GPU rail maps to a dedicated leaf switch, keeping AllReduce traffic rail-local.
- AOC
- Active Optical Cable — fibre-based cable with integrated E/O conversion, used for all NDR800 IB inter-rack links.
- IPMI / BMC
- Intelligent Platform Management Interface / Baseboard Management Controller — out-of-band server management.
- PDU-A / PDU-B
- Dual-feed power distribution: each PSU bank pairs with one PDU, giving N+5 PSU + dual-feed facility redundancy.
- CRAC / CRAH
- Computer Room Air Conditioner / Air Handler — precision cooling units, N+1 target coverage in the Kedios facility.
- DPU
- Data Processing Unit — BlueField-3 Smart NIC providing network/storage offload and security isolation.
- XA NB3I-E12
- ASUS server SKU: 9U air-cooled, dual Xeon 6776P, 32 × 128 GB DDR5 (4 TB total), 10× NVMe, HGX B300 ×8, CX-8 ×8, BF-3 ×2.
- Xeon 6776P
- Intel Xeon 6 Granite Rapids-SP — 56-core, PCIe 5.0 host CPU in the XA NB3I-E12 server; current server power tables in this repo model ~350 W per socket.
- NVFP4
- NVIDIA FP4 format — current report basis uses ~15 PFLOPS dense / ~30 PFLOPS sparse per B300 GPU, reported in this repo as ~240 PFLOPS sparse per 8-GPU server.
- FP8
- 8-bit float — current report basis uses ~4.5 PFLOPS dense / ~9 PFLOPS sparse per B300 GPU, with the report itself citing ~36 PFLOPS dense per 8-GPU server.
- AllReduce
- Distributed-training collective operation across all GPUs; accelerated by IB fat-tree fabric and SHARP.
- Fat-Tree Bisection BW
- 204.8 Tb/s across the full 32-server farm — 1:1 non-blocking, no fabric oversubscription.
- 20 kW Rack Limit
- Hard power cap per rack in the Kedios facility; servers draw ~14.5 kW sustained, leaving 5.5 kW margin.