● Compute Zone — 32 Racks
Each rack: 1× ASUS XA NB3I-E12 B300 · 8× B300 GPU · ~14.5 kW sustained (~15.4 kW burst) · AOC (5–10 m runs) exits to network zone
◆ Network Zone — 6 Positions
N1 (IB Fabric): 12× Q3400-RA
N2 (Eth/Mgmt): 2× Spectrum-4 + UFM + OOB
N3–N6 (Spare, future): 4 positions reserved for network scale-out
✅ 32-rack compute zone is 100% occupied. 6-rack network zone is adjacent — 2 racks used for all switching hardware, 4 reserved for future expansion. AOC cables cross the zone boundary.
Physical Rack Inventory
| Rack ID | Zone | Contents | Power Draw |
| --- | --- | --- | --- |
| C01–C32 | Compute | 1× ASUS XA NB3I-E12, 8× B300 GPU, 10× PSU, 10× NVMe, dual PDU | ~14.5 kW sustained / 15.4 kW burst |
| N1 | Network | 8× Q3400-RA (leaf) + 4× Q3400-RA (spine) + 2× PDU (spines S2–S3 overflow into adjacent N1-B; see layout below) | ~8.8 kW |
| N2 | Network | 2× Spectrum-4 + 1× UFM Appliance 1U + 1× OOB 10 GbE switch + PDU | ~6.5 kW |
| N3–N6 | Network | Empty — reserved for scale-out | 0 |
| Total | — | 34 active racks (32 compute + 2 network) + 4 spare positions | ~492 kW burst farm-wide (compute only) |
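As a cross-check on the totals above, a minimal sketch of the power arithmetic using only the figures from this table (cooling overhead, which the capacity table later folds into its ~530 kW figure, is not modeled here):

```python
# Sanity-check the farm-wide power figures from the rack inventory above.
COMPUTE_RACKS = 32
SUSTAINED_KW = 14.5   # per compute rack, sustained
BURST_KW = 15.4       # per compute rack, burst
N1_KW = 8.8           # IB fabric rack
N2_KW = 6.5           # Eth/mgmt rack

compute_burst = COMPUTE_RACKS * BURST_KW          # 492.8 kW (compute only)
compute_sustained = COMPUTE_RACKS * SUSTAINED_KW  # 464.0 kW
network = N1_KW + N2_KW                           # 15.3 kW

print(f"Compute burst:     {compute_burst:6.1f} kW")
print(f"Compute sustained: {compute_sustained:6.1f} kW")
print(f"Network:           {network:6.1f} kW")
print(f"Burst + network:   {compute_burst + network:6.1f} kW")
```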
Hardware per Rack
- 1× ASUS XA NB3I-E12 (9U at U1–U9)
- 8× NVIDIA B300 GPU (on HGX tray)
- 2× Intel Xeon 6776P CPU
- 32× Samsung 128 GB DDR5-6400 RDIMM
- 10× Samsung PM9D3a NVMe (U.2 Gen5)
- 8× CX8 on-board IB NICs (800 Gb/s XDR)
- 2× BlueField-3 3220 DPU (400 Gb/s)
- 1× Intel X710-AT2 dual 10 GbE
- 10× 3,200 W 80+ Titanium PSUs
- 2× PDU (A+B, rear-mount)
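A short sketch of the PSU budget implied by this bill of materials; the redundancy interpretation is an assumption (the list only states 10 PSUs at 3,200 W each), and AC-to-DC conversion efficiency is ignored:

```python
# Per-rack PSU capacity vs. burst draw, using the hardware list above.
PSUS = 10
PSU_W = 3200
BURST_W = 15_400                         # ~15.4 kW burst at the wall

installed_w = PSUS * PSU_W               # 32,000 W installed
psus_at_burst = -(-BURST_W // PSU_W)     # ceil(15400 / 3200) = 5 PSUs carry burst
spare_psus = PSUS - psus_at_burst        # PSUs that could fail at full burst load

print(f"Installed PSU capacity: {installed_w} W")
print(f"PSUs needed at burst:   {psus_at_burst}")
print(f"Redundant PSUs:         {spare_psus}")   # roughly an N+N configuration
```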
Per-Rack Outputs
- Compute: ~36 PFLOPS FP8 / ~240 PFLOPS NVFP4
- GPU memory: 2.304 TB HBM3e
- NVLink BW: 14.4 TB/s (within tray)
- IB uplinks: 8× 800 Gb/s XDR (one per leaf)
- Eth uplinks: 2× 400 Gb/s to Spectrum-4
- Management: 2× 10 GbE to OOB switch
- Sustained wall draw: ~14.5 kW
- Margin to 20 kW wall-output ceiling: +5.5 kW
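The aggregate figures above follow directly from the component counts; a minimal sketch of that arithmetic (per-GPU HBM of 288 GB and per-GPU NVLink of 1.8 TB/s are assumptions chosen to match the listed rack totals):

```python
# Derive the per-rack aggregate outputs from the component counts above.
GPUS = 8
HBM_PER_GPU_GB = 288        # assumption consistent with 2.304 TB per rack
NVLINK_PER_GPU_TBS = 1.8    # assumption consistent with 14.4 TB/s per tray
CX8_PORTS, CX8_GBPS = 8, 800
BF3_PORTS, BF3_GBPS = 2, 400
SUSTAINED_KW, CEILING_KW = 14.5, 20.0

print(f"GPU memory:   {GPUS * HBM_PER_GPU_GB / 1000:.3f} TB HBM3e")       # 2.304 TB
print(f"NVLink BW:    {GPUS * NVLINK_PER_GPU_TBS:.1f} TB/s within tray")  # 14.4 TB/s
print(f"IB uplink:    {CX8_PORTS * CX8_GBPS / 1000:.1f} Tb/s per rack")   # 6.4 Tb/s
print(f"Eth uplink:   {BF3_PORTS * BF3_GBPS / 1000:.1f} Tb/s per rack")   # 0.8 Tb/s
print(f"Power margin: {CEILING_KW - SUSTAINED_KW:.1f} kW below the 20 kW ceiling")
```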
ℹ️ All 32 compute racks are identical builds. This enables fast replacement, symmetric topology, and uniform UFM policy application. Each rack is an isolated failure domain — single-rack power or cooling failure does not affect neighbor racks.
✅ All 12 Q3400-RA switches (8 leaf + 4 spine) total 12 × 4U = 48U of switch height; with patch and cable-management space this comes to roughly 54U, so the IB switching spans two standard 42U racks (84U available). Leaves L0–L7, spines S0–S1, and a 1U patch panel occupy N1 (41U used); spines S2–S3 sit in the adjacent rack position N1-B (8U used), as shown in the N1 / N1-B layout table below.
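A minimal sketch of the rack-unit budget behind that split, using the unit heights from the N1 / N1-B layout table further down (the 1U patch allowance follows that table):

```python
# Rack-unit budget for the IB switching tier across racks N1 and N1-B.
RACK_U = 42
SWITCH_U = 4
SWITCHES = 12                          # 8 leaves + 4 spines
PATCH_U = 1                            # patch panel / cable tray at U33 in N1

n1_used = 10 * SWITCH_U + PATCH_U      # L0-L7 + S0-S1 + patch = 41U
n1b_used = 2 * SWITCH_U                # S2-S3 = 8U

assert SWITCHES * SWITCH_U == 48       # more than one 42U rack can hold
assert n1_used <= RACK_U and n1b_used <= RACK_U

print(f"Total switch height: {SWITCHES * SWITCH_U}U")
print(f"N1 used:   {n1_used}U of {RACK_U}U")
print(f"N1-B used: {n1b_used}U of {RACK_U}U")
```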
Q3400-RA Switch Specification
- Form factor: 4U rackmount
- Ports: 144× 800 Gb/s XDR InfiniBand
- Switch BW: 115.2 Tb/s full duplex
- Latency: <130 ns
- SHARP in-network compute: Yes
- Power: ~500–700 W per unit
- Management: UFM Appliance protocol
- Cable type (intra-N1): DAC ≤3 m
- Cable type (to compute): AOC 5–10 m
- Fan: Front-to-rear hot-plug
- PSU: Dual hot-plug
- Total in farm: 12 units
Fat-Tree Two-Tier Topology
- Servers: 32 compute racks (C01–C32), 8× CX8 per server, one per leaf.
- Server-to-leaf links: 256× 800 Gb/s AOC (32 servers × 8 CX8).
- Leaf tier: 8 leaf switches (L0–L7), rail-optimized: CX8[i] on every server connects to Leaf[i] (Rails 0–7).
- Leaf-to-spine links: 256× DAC intra-N1 (8 parallel links per leaf-spine pair × 4 spines × 8 leaves).
- Spine tier: 4 spine switches (S0–S3), providing a full any-to-any inter-leaf path.
Leaf/Spine Port Allocation
| Switch | Role | Server-facing ports | Leaf↔spine ports | Used / Total | Utilization |
| --- | --- | --- | --- | --- | --- |
| L0–L7 | Leaf (×8) | 32 (1 per server, CX8[rail i]) | 32 (8 per spine) | 64 / 144 | 44% |
| S0–S3 | Spine (×4) | — | 64 (8 from each of the 8 leaves) | 64 / 144 | 44% |
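A small sketch that enumerates the rail-optimized cabling implied by this allocation; the server and switch names (srvNN, L*, S*) are illustrative labels, not hostnames from the build. It reproduces the 256 AOC / 256 DAC cable counts and the 44% port utilization:

```python
# Enumerate the rail-optimized fat-tree links: CX8[i] on every server goes
# to leaf L[i]; each leaf runs 8 parallel links to each of the 4 spines.
SERVERS, RAILS, SPINES, SWITCH_PORTS = 32, 8, 4, 144
LINKS_PER_LEAF_SPINE_PAIR = 8

aoc = [(f"srv{s:02d}-cx8-{r}", f"L{r}")
       for s in range(SERVERS) for r in range(RAILS)]
dac = [(f"L{leaf}", f"S{spine}", k)
       for leaf in range(RAILS) for spine in range(SPINES)
       for k in range(LINKS_PER_LEAF_SPINE_PAIR)]

leaf_used = SERVERS + SPINES * LINKS_PER_LEAF_SPINE_PAIR   # 32 down + 32 up
spine_used = RAILS * LINKS_PER_LEAF_SPINE_PAIR             # 8 per leaf x 8 leaves

print(f"Server-to-leaf AOC cables: {len(aoc)}")            # 256
print(f"Leaf-to-spine DAC cables:  {len(dac)}")            # 256
print(f"Leaf ports used:  {leaf_used}/{SWITCH_PORTS} ({leaf_used / SWITCH_PORTS:.0%})")
print(f"Spine ports used: {spine_used}/{SWITCH_PORTS} ({spine_used / SWITCH_PORTS:.0%})")
```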
Racks N1 / N1-B Physical Layout (42U each)
| Position | Component | Unit Size |
| --- | --- | --- |
| U1–U4 | Q3400-RA Leaf L0 | 4U |
| U5–U8 | Q3400-RA Leaf L1 | 4U |
| U9–U12 | Q3400-RA Leaf L2 | 4U |
| U13–U16 | Q3400-RA Leaf L3 | 4U |
| U17–U20 | Q3400-RA Leaf L4 | 4U |
| U21–U24 | Q3400-RA Leaf L5 | 4U |
| U25–U28 | Q3400-RA Leaf L6 | 4U |
| U29–U32 | Q3400-RA Leaf L7 | 4U |
| U33 | Patch panel / cable tray | 1U |
| U34–U37 | Q3400-RA Spine S0 | 4U |
| U38–U41 | Q3400-RA Spine S1 | 4U |
| Rack N1-B, U1–U4 | Q3400-RA Spine S2 | 4U |
| Rack N1-B, U5–U8 | Q3400-RA Spine S3 | 4U |
| Total | 12 switches × 4U across N1 / N1-B | 48U |
Spectrum-4 Ethernet Switch ×2 (Active-Active)
| Attribute | Value |
| --- | --- |
| Model | NVIDIA Spectrum-4 |
| Form factor | 2U rackmount |
| Ports | 128× 400 GbE (or 64× 800 GbE) |
| Switch BW | 51.2 Tb/s full duplex |
| Deployment | Active-active (both switches active simultaneously) |
| Uplinks from BF3 | 64 BF3 DPU ports × 400 Gb/s = 25.6 Tb/s aggregate |
| Role | Ethernet storage fabric, RDMA over Ethernet, east-west traffic |
| Port utilization | 32 BF3 uplinks per switch / 128 ports = 25% |
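A quick check of the Ethernet-fabric arithmetic in the table, assuming the 64 BF3 ports are split evenly across the two switches as stated:

```python
# Spectrum-4 Ethernet fabric: aggregate BF3 uplink bandwidth and port use.
SWITCHES = 2
PORTS_PER_SWITCH = 128        # 400 GbE port mode
BF3_UPLINKS = 32 * 2          # 32 servers x 2 BlueField-3 ports
BF3_GBPS = 400

aggregate_tbps = BF3_UPLINKS * BF3_GBPS / 1000    # 25.6 Tb/s
per_switch = BF3_UPLINKS // SWITCHES              # 32 uplinks per switch

print(f"Aggregate BF3 uplink BW: {aggregate_tbps:.1f} Tb/s")
print(f"Port utilization: {per_switch}/{PORTS_PER_SWITCH} = {per_switch / PORTS_PER_SWITCH:.0%} per switch")
```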
UFM Appliance (1U)
- Hardware: 1U dedicated UFM Appliance
- Capacity: 648 fabric ports per managed domain, comfortably above the 256 CX8 host ports plus 12 switches this fabric presents (the raw physical port count of 256 + 12 × 144 = 1,984 is not the budgeting metric; port-count budgeting applies per managed domain)
- Functions: SM (Subnet Manager), routing engine, SHARP orchestration, telemetry
- UFM Agents: 32 software agents (one per server OS) report to this appliance
- Connection: 10 GbE management port → OOB switch
- Redundancy: single appliance (all IB fabric config state is in hardware switches)
OOB Management Switch (1U)
- Protocol: 10 GbE
- Port count: ≥96 ports (32 OS mgmt + 32 BMC/IPMI + 12 Q3400-RA mgmt + 2 Spectrum-4 mgmt + 1 UFM + uplinks = 80 connections required)
- Connected devices: all 32 servers (X710-AT2 OS mgmt port + BMC/IPMI), 12× Q3400-RA mgmt, 2× Spectrum-4 mgmt, and the UFM Appliance
- VLAN config: dedicated out-of-band management VLAN isolated from data plane
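A minimal sketch of the OOB port budget from the list above; the single uplink is an assumption chosen so the total matches the stated 80 connections:

```python
# Out-of-band management switch port budget.
connections = {
    "server OS mgmt (X710 port 0)": 32,
    "server BMC/IPMI (X710 port 1)": 32,
    "Q3400-RA switch mgmt": 12,
    "Spectrum-4 mgmt": 2,
    "UFM Appliance": 1,
    "uplink": 1,            # assumption: one uplink reaches the stated 80 total
}
SWITCH_PORTS = 96

required = sum(connections.values())
print(f"Required ports: {required}")                  # 80
print(f"Spare ports:    {SWITCH_PORTS - required}")   # 16
```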
Rack N2 Physical Layout (42U)
| Position | Component | Size |
| --- | --- | --- |
| U1–U2 | Spectrum-4 #1 (Active) | 2U |
| U3–U4 | Spectrum-4 #2 (Active) | 2U |
| U5 | UFM Appliance | 1U |
| U6 | OOB 10 GbE Management Switch | 1U |
| U7–U8 | Patch panel (Eth to compute) | 2U |
| U9–U42 | Empty / cable management / future | 34U |
| U1–U8 (total) | Active equipment | 8U of 42U |
All compute-to-network cables cross the zone boundary between compute racks (C01–C32) and network racks (N1, N2); runs are ~5–15 m. This mandates AOC (active optical) for the IB and Ethernet data links and copper Cat6A for 10 GbE management. Leaf-to-spine links stay inside the network zone on short DAC.
| Cable Category | Count | Type | Speed | Route |
| --- | --- | --- | --- | --- |
| CX8 → Q3400-RA Leaf (IB) | 256 | AOC (800 Gb/s XDR) | 800 Gb/s each | 32 servers × 8 CX8 → 8 leaf switches |
| BF3 DPU → Spectrum-4 (Eth) | 64 | AOC (400 GbE) | 400 Gb/s each | 32 servers × 2 BF3 → 2 Spectrum-4 |
| X710 OS Mgmt → OOB switch | 32 | Cat6A | 10 GbE | 32 servers × port 0 → OOB |
| X710 BMC/IPMI → OOB switch | 32 | Cat6A | 10 GbE | 32 servers × port 1 → OOB |
| Q3400-RA leaf ↔ spine (intra-N1) | 256 | DAC ≤3 m | 800 Gb/s each | 8 parallel links per leaf-spine pair × 4 spines × 8 leaves = 256 cables |
| Cross-zone total | 384 | 256 IB AOC + 64 Eth AOC + 64 Cat6A mgmt | — | compute zone → network zone |
| Network-zone internal (DAC) | 256 | DAC ≤3 m | 800 Gb/s each | leaf-to-spine within racks N1 / N1-B |
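A short sketch that aggregates the cable bill of materials from the table above; the category strings are descriptive labels, not part numbers:

```python
# Aggregate the cabling bill of materials by media and by zone crossing.
cables = [
    # (category, count, media, crosses the compute/network zone boundary)
    ("CX8 -> leaf, 800 Gb/s IB",     256, "AOC",   True),
    ("BF3 -> Spectrum-4, 400 GbE",    64, "AOC",   True),
    ("OS mgmt -> OOB, 10 GbE",        32, "Cat6A", True),
    ("BMC/IPMI -> OOB, 10 GbE",       32, "Cat6A", True),
    ("leaf <-> spine, 800 Gb/s IB",  256, "DAC",   False),
]

cross_zone = sum(n for _, n, _, crosses in cables if crosses)
by_media: dict[str, int] = {}
for _, n, media, _ in cables:
    by_media[media] = by_media.get(media, 0) + n

print(f"Cross-zone cables: {cross_zone}")   # 384
print(f"By media: {by_media}")              # {'AOC': 320, 'Cat6A': 64, 'DAC': 256}
```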
| Resource | Current Use | Capacity | Headroom |
| --- | --- | --- | --- |
| 1 MW power budget | ~530 kW (compute burst + network + cooling overhead) | 1,000 kW | 470 kW (47%) |
| Network zone rack slots | 2 of 6 used | 6 positions | 4 spare racks |
| Leaf switch ports | 64 used per leaf (32 downlinks + 32 uplinks) | 144 ports per Q3400-RA | 80 ports spare per leaf |
| Spine switch ports | 64 used per spine (8 from each leaf) | 144 ports per Q3400-RA | 80 ports spare per spine |
| Spectrum-4 ports | 32 per switch (BF3 uplinks) | 128 ports per switch | 96 ports spare per switch |
| OOB switch ports | 80 required (32 OS + 32 BMC + 12 switch mgmt + 2 Sp4 + 1 UFM + uplinks) | 96-port switch | 16 ports spare |
| UFM Appliance capacity | 256 server ports + 12 switches | 648 ports (domain capacity) | Available for scale |
✅ The current build can be expanded significantly without topology changes. Each leaf switch has 80 unused ports, enough to attach additional servers (and matching spine uplinks) without adding leaf switches. The 4 spare network zone racks allow new IB tiers if needed, and ~470 kW of power headroom still funds substantial additional compute.
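A rough sketch of how far the existing leaf/spine port headroom and the ~470 kW power headroom could stretch, assuming the rail-optimized pattern is kept (one CX8 per leaf per server) and a 1:1 uplink:downlink ratio is preserved; this illustrates the arithmetic, not a validated growth plan:

```python
# Estimate server headroom in the existing 8-leaf / 4-spine fabric and
# compare against the remaining power budget.
SWITCH_PORTS = 144
LEAVES, SPINES = 8, 4
CURRENT_SERVERS = 32
POWER_HEADROOM_KW, BURST_KW = 470, 15.4

# Non-blocking: each leaf splits its ports evenly between servers and spines.
max_by_leaf = SWITCH_PORTS // 2                     # 72 downlinks per leaf
# Spine side: 4 x 144 spine ports must absorb 8 uplinks per server (1 per leaf).
max_by_spine = (SPINES * SWITCH_PORTS) // LEAVES    # 72
max_servers = min(max_by_leaf, max_by_spine)        # 72 servers total

power_limited_racks = int(POWER_HEADROOM_KW // BURST_KW)   # ~30 more racks

print(f"Port-limited server count (1:1 fabric): {max_servers}")
print(f"Additional servers over today's {CURRENT_SERVERS}: {max_servers - CURRENT_SERVERS}")
print(f"Power-limited additional racks at burst: {power_limited_racks}")
```

In practice the tighter of the two limits governs: the fabric could cable up roughly 40 more servers, but the stated power headroom funds only about 30 more racks at burst draw.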