| Component | Per-Unit Power | Qty | Subtotal |
|---|---|---|---|
| NVIDIA B300 GPU | 1,100 W | ×8 | 8,800 W |
| Intel Xeon 6776P CPU | 350 W | ×2 | 700 W |
| DDR5-6400 128 GB RDIMM | ~7 W | ×32 | 224 W |
| Samsung PM9D3a NVMe U.2 | ~10 W | ×10 | 100 W |
| ConnectX-8 NIC (HGX on-board) | ~20 W | ×8 | 160 W |
| BlueField-3 3220 DPU | ~40 W | ×2 | 80 W |
| Intel X710-AT2 Mgmt NIC | ~12 W | ×1 | 12 W |
| System fans (80 mm) | ~18 W | ×15 | 270 W |
| CPU fans (60 mm) | ~7 W | ×6 | 42 W |
| PCIe switch PEX89144 | ~12 W | ×1 | 12 W |
| VRMs / motherboard / misc | — | — | 175 W |
| Component sum | | | 10,575 W |
| PSU conversion loss (80 PLUS Titanium, ~4.5% of load) | | | +476 W |
| At-wall sustained draw (updated baseline) | | | ~14,500 W |
| In-rack PDU cabling loss (~1%) | | | Included in the updated wall-draw baseline |
| Total rack draw, sustained peak | | | ~14,500 W ≈ 14.5 kW |
| GPU burst overshoot (+6% over sustained) | | | ~15,370 W ≈ 15.4 kW |

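
The roll-up above can be re-derived with a short Python sketch. It reuses the per-unit estimates from the power table, treats the ~4.5% PSU loss as a fraction of the DC component load (an assumption, since the table does not state which base it uses), and applies the +6% burst to the 14,500 W planning baseline rather than to the bottom-up figure. The dictionary layout and function names are illustrative only.

```python
"""Bottom-up power roll-up for one ASUS XA NB3I-E12 B300 server (sketch)."""

# (watts per unit, quantity), copied from the power table
COMPONENTS = {
    "NVIDIA B300 GPU": (1100, 8),
    "Intel Xeon 6776P CPU": (350, 2),
    "DDR5-6400 128 GB RDIMM": (7, 32),
    "Samsung PM9D3a NVMe U.2": (10, 10),
    "ConnectX-8 NIC": (20, 8),
    "BlueField-3 3220 DPU": (40, 2),
    "Intel X710-AT2 mgmt NIC": (12, 1),
    "System fans (80 mm)": (18, 15),
    "CPU fans (60 mm)": (7, 6),
    "PCIe switch PEX89144": (12, 1),
    "VRMs / motherboard / misc": (175, 1),
}

PSU_LOSS_FRACTION = 0.045  # assumption: loss measured against the DC load
WALL_BASELINE_W = 14_500   # updated planning baseline from the table
BURST_FRACTION = 0.06      # GPU burst overshoot


def component_sum_w(components: dict[str, tuple[int, int]]) -> int:
    """Sum of (watts per unit x quantity) across all components."""
    return sum(watts * qty for watts, qty in components.values())


def psu_loss_w(dc_load_w: float, loss_fraction: float = PSU_LOSS_FRACTION) -> float:
    """PSU conversion loss for a given DC load."""
    return dc_load_w * loss_fraction


def burst_w(baseline_w: float, burst_fraction: float = BURST_FRACTION) -> float:
    """Short-term peak draw above a sustained baseline."""
    return baseline_w * (1 + burst_fraction)


if __name__ == "__main__":
    dc = component_sum_w(COMPONENTS)
    loss = psu_loss_w(dc)
    print(f"component sum     : {dc:,} W")
    print(f"PSU loss          : {loss:,.0f} W")
    print(f"bottom-up at wall : {dc + loss:,.0f} W")
    print(f"planning baseline : {WALL_BASELINE_W:,} W")
    print(f"burst on baseline : {burst_w(WALL_BASELINE_W):,.0f} W")
```

With these inputs the sketch reports a component sum of 10,575 W, a PSU loss of about 476 W and a bottom-up at-wall figure of about 11,051 W, so the 14,500 W baseline sits roughly 3.4 kW above what the listed components account for.
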
| Zone | Capacity | Allocation | Spare |
|---|---|---|---|
| 32-rack compute zone | 32 racks | 32× ASUS XA NB3I-E12 B300 servers | 0 |
| 6-rack network zone | 6 racks | 2× dedicated switch/fabric racks | 4 (future scale-out) |

| Link Type | Cable | Rationale |
|---|---|---|
| ConnectX-8 → Q3400-RA (XDR 800 Gb/s) | AOC 5–10 m | Cross-zone runs of ~5–15 m; DAC is only viable up to ~3 m |
| BlueField-3 → Spectrum-4 (400 GbE) | AOC 5–10 m | Same cross-zone distance as the IB links |
| X710 mgmt NIC → OOB switch (10 GbE) | Cat6A 5–10 m | 10GBASE-T over Cat6A is rated to 100 m |
| Q3400-RA ↔ Q3400-RA (leaf to spine, within network zone) | DAC ≤3 m | All switches are co-located in N1, so runs stay short |

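
The cabling rules reduce to a small decision function. The sketch below is one way to encode them, assuming DAC for any high-speed run up to 3 m, AOC for anything longer, and Cat6A for 10 GbE up to 100 m. `Link` and `choose_media` are hypothetical names, the example lengths are illustrative, and no AOC length limit is enforced.

```python
from dataclasses import dataclass

DAC_MAX_M = 3.0      # passive copper (DAC) reach assumed in the cabling table
CAT6A_MAX_M = 100.0  # 10GBASE-T reach over Cat6A


@dataclass
class Link:
    """A single cable run (hypothetical helper type)."""
    name: str
    rate_gbps: int
    length_m: float


def choose_media(link: Link) -> str:
    """Pick a cable type using the distance rules from the cabling table."""
    if link.rate_gbps <= 10:
        # 10 GbE management traffic runs over twisted-pair copper.
        if link.length_m > CAT6A_MAX_M:
            raise ValueError(f"{link.name}: {link.length_m} m exceeds Cat6A reach")
        return "Cat6A"
    # High-speed links: DAC for short in-zone runs, AOC for everything longer.
    return "DAC" if link.length_m <= DAC_MAX_M else "AOC"


if __name__ == "__main__":
    for link in (
        Link("ConnectX-8 -> Q3400-RA leaf", 800, 12.0),
        Link("BlueField-3 -> Spectrum-4", 400, 8.0),
        Link("X710 -> OOB switch", 10, 7.0),
        Link("Q3400-RA leaf -> spine", 800, 2.0),
    ):
        print(f"{link.name:30s} {link.length_m:5.1f} m -> {choose_media(link)}")
```
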
ASUS XA NB3I-E12 (B300): 9U air-cooled rackmount · 32 units · 1 per rack · front-to-back airflow

| Component | Model / Spec | Qty per Server |
|---|---|---|
| GPU | NVIDIA Blackwell Ultra B300 (HGX tray) — TDP 1,100 W | 8 |
| GPU Memory | 288 GB HBM3e per GPU (12-high stacks) = 2.304 TB per server | — |
| CPU | Intel Xeon 6776P (56 cores, 350 W) | 2 |
| System RAM | Samsung M321RAJA0MB2-CCP 128 GB DDR5-6400 RDIMM | 32 (= 4 TB) |
| Boot SSD | Samsung PM9D3a U.2 Gen5 NVMe 1.92 TB | 2 |
| Data SSD | Samsung PM9D3a U.2 Gen5 NVMe 3.84 TB | 8 |
| IB NIC | ConnectX-8 (on-board, 1 per GPU), 800 Gb/s XDR InfiniBand | 8 |
| DPU | NVIDIA BlueField-3 3220, 400 Gb/s Ethernet to the Spectrum-4 fabric | 2 |
| Mgmt NIC | Intel X710-AT2 dual-port 10 GbE RJ45 | 1 |
| UFM Agent | Software (no hardware), installed on the host OS | 1 (SW) |

| Metric | Value |
|---|---|
| Total GPUs | 256× NVIDIA B300 |
| Total GPU memory | 73.73 TB HBM3e (256 × 288 GB) |
| Total system RAM | 128 TB DDR5 |
| Total NVMe storage | ~1,106 TB |
| Peak compute (FP8 dense) | ~1,152 PFLOPS |
| Peak compute (NVFP4 sparse) | ~7,680 PFLOPS |
| Operating power (32 compute racks; network zone excluded) | ~464 kW (32 × 14.5 kW) |
| Peak sustained wall draw | ~464 kW |
| Absolute burst peak | ~492 kW (32 × 15.37 kW) |

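
The cluster totals are the per-server specification multiplied across 32 servers. A Python sketch of that roll-up, assuming the tables' mixed unit conventions (decimal TB for HBM and NVMe, binary TB for DDR5, since 32 × 128 GB is listed as 4 TB) and counting only the 32 compute racks toward power:

```python
"""Cluster roll-up from the per-server specification (sketch)."""

SERVERS = 32                               # one server per compute rack
GPUS_PER_SERVER = 8
HBM_GB_PER_GPU = 288
DIMMS_PER_SERVER, DIMM_GB = 32, 128
NVME_TB_PER_SERVER = 2 * 1.92 + 8 * 3.84   # boot + data drives = 34.56 TB
RACK_SUSTAINED_KW = 14.5
RACK_BURST_KW = 15.37


def cluster_totals() -> dict[str, float]:
    gpus = SERVERS * GPUS_PER_SERVER
    return {
        "gpus": gpus,
        # HBM uses the decimal convention: 1 TB = 1,000 GB
        "hbm_tb": gpus * HBM_GB_PER_GPU / 1000,
        # DDR5 uses the binary convention: 4,096 GB per server = 4 TB
        "ram_tb": SERVERS * DIMMS_PER_SERVER * DIMM_GB / 1024,
        "nvme_tb": SERVERS * NVME_TB_PER_SERVER,
        # Compute racks only; the network-zone racks are not included
        "sustained_kw": SERVERS * RACK_SUSTAINED_KW,
        "burst_kw": SERVERS * RACK_BURST_KW,
    }


if __name__ == "__main__":
    for key, value in cluster_totals().items():
        print(f"{key:13s} {value:,.2f}")
```

The sketch reproduces 256 GPUs, ~73.73 TB of HBM3e, 128 TB of DDR5, ~1,106 TB of NVMe, ~464 kW sustained and ~492 kW at burst.
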
| Component | Count | Role |
|---|---|---|
| Q3400-RA — Leaf | 8 | 32 server downlinks + 32 spine uplinks; 1 per GPU rail |
| Q3400-RA — Spine | 4 | 64 leaf-facing ports; full bisection mesh |
| Total Q3400-RA | 12 | 4U each · 115.2 Tb/s per unit |
| Spectrum-4 Ethernet | 2 | BF3 DPU 400 GbE — active-active redundancy |
| OOB 10 GbE mgmt switch | 1 | BMC/IPMI + OS management |
| UFM Appliance (hardware) | 1 | Centralized IB fabric manager |
| UFM Agent (software) | 32 | Installed on each server — no hardware |
| Total hardware units | 16 | 12 Q3400-RA + 2 Spectrum-4 + 1 OOB + 1 UFM |

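
The leaf and spine port counts can be checked by generating the wiring plan explicitly. This Python sketch applies the rail-optimized rule (NIC i on every server lands on leaf i) and assumes leaf uplinks are striped round-robin across the four spines, which gives 8 links per leaf-spine pair; the table does not state the exact striping. The 144-port budget is derived from 115.2 Tb/s ÷ 800 Gb/s, and the switch and port labels are hypothetical.

```python
"""Rail-optimized leaf/spine wiring plan (sketch)."""
from collections import Counter

SERVERS = 32
RAILS = 8               # one ConnectX-8 per GPU, so one rail per GPU index
SPINES = 4
UPLINKS_PER_LEAF = 32
PORTS_PER_SWITCH = 144  # 115.2 Tb/s / 800 Gb/s per Q3400-RA


def wiring_plan() -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return (server-to-leaf links, leaf-to-spine links)."""
    # Rail-optimized rule: NIC i on every server connects to leaf i.
    server_links = [
        (f"server{s:02d}/nic{i}", f"leaf{i}")
        for s in range(SERVERS)
        for i in range(RAILS)
    ]
    # Assumption: uplinks striped round-robin, 8 per leaf-spine pair.
    spine_links = [
        (f"leaf{leaf}", f"spine{u % SPINES}")
        for leaf in range(RAILS)
        for u in range(UPLINKS_PER_LEAF)
    ]
    return server_links, spine_links


def port_usage(server_links, spine_links) -> Counter:
    """Count occupied ports on every switch."""
    used = Counter()
    for _, leaf in server_links:
        used[leaf] += 1   # leaf downlink
    for leaf, spine in spine_links:
        used[leaf] += 1   # leaf uplink
        used[spine] += 1  # spine downlink
    return used


if __name__ == "__main__":
    servers, uplinks = wiring_plan()
    for switch, ports in sorted(port_usage(servers, uplinks).items()):
        if ports > PORTS_PER_SWITCH:
            raise RuntimeError(f"{switch} exceeds its port budget")
        print(f"{switch:7s} {ports:3d} / {PORTS_PER_SWITCH} ports")
```

Every leaf and spine ends up using 64 of its 144 ports, which leaves room on each switch for later scale-out.
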
| Fabric Layer | Calculation | Total Bandwidth |
|---|---|---|
| IB compute — server side | 256 ports × 800 Gb/s | 204.8 Tb/s |
| IB compute — spine bisection | 8 leaf × 32 uplinks × 800 Gb/s | 204.8 Tb/s — 1:1 non-blocking |
| BF3 storage / DPU fabric | 64 ports × 400 Gb/s | 25.6 Tb/s |
| OOB management | 80 connections × 10 GbE | 800 Gb/s |

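
The bandwidth column is plain port-count arithmetic. A minimal Python sketch, assuming decimal units (1 Tb/s = 1,000 Gb/s) and taking the oversubscription ratio as server-facing bandwidth over leaf uplink bandwidth; `tbps` is a hypothetical helper name:

```python
"""Fabric bandwidth arithmetic from the table (sketch)."""


def tbps(ports: int, gbps_per_port: float) -> float:
    """Aggregate bandwidth in Tb/s, using 1 Tb/s = 1,000 Gb/s."""
    return ports * gbps_per_port / 1000


server_side = tbps(256, 800)   # 32 servers x 8 ConnectX-8 ports
bisection = tbps(8 * 32, 800)  # 8 leaves x 32 uplinks
dpu_fabric = tbps(64, 400)     # 64 BF3 links as counted in the table
oob = tbps(80, 10)             # 80 OOB connections at 10 GbE

# Server-facing bandwidth over uplink bandwidth; 1.0 means 1:1 non-blocking.
oversubscription = server_side / bisection

if __name__ == "__main__":
    print(f"IB server side : {server_side:.1f} Tb/s")
    print(f"IB bisection   : {bisection:.1f} Tb/s")
    print(f"BF3 fabric     : {dpu_fabric:.1f} Tb/s")
    print(f"OOB            : {oob * 1000:.0f} Gb/s")
    print(f"oversubscription {oversubscription:.2f}:1")
```
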
| Decision | Rationale |
|---|---|
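| 1 server per rack | 9U chassis under a 20 kW per-rack power ceiling. Burst wall draw of 15.4 kW leaves a 4.6 kW safety margin; a second server (2 × 14.5 kW sustained) would exceed the ceiling. |
| 8 leaf + 4 spine Q3400-RA | Rail-optimized fat-tree: the NIC for GPU i on every server connects to leaf i, so each of the 8 rails has its own leaf. 204.8 Tb/s bisection, 1:1 non-blocking. |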
| Switches in separate racks | Keeps compute racks under 20 kW. Switch racks run ~8.8 kW each with 11.2 kW margin. |
| Network zone (6-rack) for switches | Adjacent to compute zone — short AOC runs. 4 spare positions for future scale-out. |
| Dual BF3 per server | Active-active storage fabric bonding. Offloads RDMA / encryption from host CPU. |
| 1 central UFM Appliance | Manages the entire 32-node IB domain (256 endpoints + 12 switches). Capacity limit: 648 ports. |
| Air cooling (front-to-back) | ASUS XA NB3I-E12 is direct air-cooled. CRAC #1 & #2 support hot/cold aisle containment. |
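
The rack-placement decisions rest on a headroom check against the 20 kW per-rack ceiling. The Python sketch below repeats that check using the compute-rack figures from the power table and the ~8.8 kW switch-rack figure; `headroom_kw` and `max_servers_per_rack` are hypothetical helper names.

```python
"""Per-rack power headroom against the 20 kW ceiling (sketch)."""

RACK_CEILING_KW = 20.0


def headroom_kw(load_kw: float, ceiling_kw: float = RACK_CEILING_KW) -> float:
    """Remaining capacity in kW; a negative value means the rack is over budget."""
    return ceiling_kw - load_kw


def max_servers_per_rack(server_kw: float, ceiling_kw: float = RACK_CEILING_KW) -> int:
    """Largest whole number of servers whose combined draw fits under the ceiling."""
    return int(ceiling_kw // server_kw)


RACK_LOADS_KW = {
    "compute rack, burst": 15.37,
    "compute rack, sustained": 14.5,
    "switch rack": 8.8,
}

if __name__ == "__main__":
    for name, load in RACK_LOADS_KW.items():
        print(f"{name:24s} {load:5.2f} kW, headroom {headroom_kw(load):5.2f} kW")
    print("servers per rack at burst:", max_servers_per_rack(15.37))
```

At either the sustained or the burst figure, only one B300 server fits under the ceiling, which matches the one-server-per-rack layout.
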