42U Rack · ASUS XA NB3I-E12
U1–U9: ASUS XA NB3I-E12 (8× B300 GPU, 2× Xeon 6776P, 32× 128 GB DDR5)
U11: 1U patch panel (IB/Eth)
Chassis Height & Placement Rules
- Chassis height: 9U (U1–U9, front-of-rack, bottom)
- Bottom placement: optimized for GPU airflow — hot exhaust exits at top rear
- U10: reserved for future I/O shelf or temporary access
- U11: 1U patch panel — IB and Ethernet port aggregation
- U12–U13: horizontal cable management (Panduit or equivalent)
- U14–U37: empty (growth, cooling, airflow buffer)
- U38–U39: dual PDU rear-mount, PDU-A (circuit A) + PDU-B (circuit B)
- U40–U42: emergency spares — do not occupy
✅
Empty space (U14–U37) is intentional: it provides an airflow buffer and enables future component expansion without rack re-work. Do NOT fill this space without a power re-audit.
Rack Form Factor
| Attribute | Value |
|---|---|
| Chassis model | ASUS XA NB3I-E12 |
| Form factor | 9U rackmount, 19" EIA-310-D |
| Depth | ~1,000 mm (≥1,050 mm with cable management) |
| Width | 440 mm (19" standard) |
| Cooling | Direct air-cooled, front-to-back, dual hot-plug fan modules |
| Operating temp | 10–35 °C inlet (ASHRAE A2) |
| PSU config | 10× 3,200 W in 5+5 dual-bus layout → N+5 redundancy |
ℹ️
The 8 B300 GPUs are mounted as a single HGX tray, not as individual cards. The tray communicates with the host CPUs via PCIe 6.0. NVLink 5 runs within the tray at 14.4 TB/s aggregate. ConnectX-8 NICs are soldered directly onto the HGX tray: on-board, not PCIe add-in cards.
GPU Core Specs
- Architecture: NVIDIA Blackwell Ultra
- Die: Reticle-limited + logic die (multi-die)
- FP8 tensor-core dense: ~4.5 PFLOPS/GPU
- NVFP4 sparse: ~30 PFLOPS/GPU
- TDP: 1,100 W
- Cooling: Direct air (HGX chassis fans)
HBM3e Memory — Per GPU
- Capacity: 288 GB HBM3e
- Stack config: 8 stacks × 12-high per GPU (36 GB/stack)
- Bandwidth: ~8 TB/s per GPU
- Total per tray: 2.304 TB (8 × 288 GB)
- ECC: On-die SECDED
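
As a quick arithmetic check, the per-tray totals implied by the per-GPU figures above; the ~64 TB/s aggregate HBM bandwidth is derived here rather than quoted from a spec sheet:

```python
# Per-tray totals from the per-GPU figures listed above.
GPUS_PER_TRAY = 8

fp8_dense_pflops = 4.5      # ~PFLOPS/GPU, FP8 tensor-core dense
nvfp4_sparse_pflops = 30.0  # ~PFLOPS/GPU, NVFP4 sparse
hbm_capacity_gb = 288       # GB HBM3e per GPU
hbm_bw_tbs = 8.0            # ~TB/s per GPU

print(f"Tray FP8 dense:     {GPUS_PER_TRAY * fp8_dense_pflops:.0f} PFLOPS")    # 36
print(f"Tray NVFP4 sparse:  {GPUS_PER_TRAY * nvfp4_sparse_pflops:.0f} PFLOPS") # 240
print(f"Tray HBM capacity:  {GPUS_PER_TRAY * hbm_capacity_gb / 1000:.3f} TB")  # 2.304
print(f"Tray HBM bandwidth: ~{GPUS_PER_TRAY * hbm_bw_tbs:.0f} TB/s")           # ~64
```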
NVLink 5 Intra-Tray Fabric
- Topology: All-to-all NVSwitch in tray
- Per-GPU BW: 1.8 TB/s bidirectional
- Tray aggregate: 14.4 TB/s
- Latency: <100 ns GPU-to-GPU within the tray
- Unified memory: any-to-any GPU access
Host ↔ HGX Interface
- Protocol: PCIe Gen 6.0 x16 per GPU
- PCIe switch: PEX89144 aggregation
- CX8 NIC attachment: on-board (not host PCIe)
- BF3 DPU: 2× PCIe from host CPU complex
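
A small sketch putting the host link in context; the ~128 GB/s per-direction figure for PCIe 6.0 x16 is the commonly quoted approximation, not a value from this document:

```python
# Bandwidth hierarchy for one GPU: intra-tray NVLink vs. the host PCIe link.
nvlink_bidir_tbs = 1.8       # TB/s per GPU, bidirectional (spec above)
pcie6_x16_per_dir_gbs = 128  # ~GB/s per direction (assumed approximation)

nvlink_per_dir_gbs = nvlink_bidir_tbs / 2 * 1000  # 900 GB/s
tray_aggregate_tbs = 8 * nvlink_bidir_tbs         # 14.4 TB/s, matches spec

print(f"NVLink 5 per GPU, per direction: {nvlink_per_dir_gbs:.0f} GB/s")
print(f"PCIe 6.0 x16 per direction:      ~{pcie6_x16_per_dir_gbs} GB/s")
print(f"Intra-tray fabric is roughly {nvlink_per_dir_gbs / pcie6_x16_per_dir_gbs:.0f}x the host link")
print(f"Tray aggregate NVLink: {tray_aggregate_tbs:.1f} TB/s")
```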
Intel Xeon 6776P ×2
- Codename: Granite Rapids SP
- Cores: 56 P-cores (no E-cores)
- Base / Boost: 2.2 / 3.8 GHz
- TDP: 350 W
- L3 cache: 300 MB
- PCIe: Gen 5.0 x80 per CPU
- Memory channels: 8-ch DDR5
128 GB DDR5 RDIMM ×32
- Model: Samsung M321RAJA0MB2-CCP
- Speed: DDR5-6400 MT/s
- Width: 72-bit (ECC)
- Slots: 16 per CPU × 2 CPUs = 32 total
- Total system RAM: 4 TB (32 × 128 GB)
- Aggregate BW: ~600 GB/s (dual socket)
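
The ~600 GB/s figure can be read against the theoretical channel peak; the ~73% efficiency below is an inference from the listed numbers, not a measured value:

```python
# Theoretical peak DRAM bandwidth for the dual-socket configuration above.
channels_per_cpu = 8
sockets = 2
transfers_per_s = 6400e6   # DDR5-6400
bytes_per_transfer = 8     # 64-bit data path per channel (ECC bits not counted)

peak_gbs = channels_per_cpu * sockets * transfers_per_s * bytes_per_transfer / 1e9
print(f"Theoretical peak:  {peak_gbs:.0f} GB/s")                 # 819 GB/s
print(f"~600 GB/s figure = {600 / peak_gbs:.0%} of theoretical") # ~73%
```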
| Role | Model | Interface | Capacity | Qty | Total |
|---|---|---|---|---|---|
| Boot | Samsung PM9D3a U.2 | PCIe Gen5 NVMe | 1.92 TB | ×2 | 3.84 TB |
| Data | Samsung PM9D3a U.2 | PCIe Gen5 NVMe | 3.84 TB | ×8 | 30.72 TB |
| Total NVMe per server | PM9D3a Gen5 family | PCIe Gen5 NVMe | | ×10 | 34.56 TB |
ℹ️
PM9D3a is Samsung's enterprise PCIe Gen5 NVMe. Sequential read ~14 GB/s / write ~8 GB/s per drive. 10 drives provide total sequential read headroom of ~140 GB/s per node — adequate for checkpoint and dataset streaming at model training speed.
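
A rough streaming-time sketch from the drive figures above; the 4 TB checkpoint size is a placeholder assumption, not a number from this document:

```python
# Aggregate NVMe throughput and a hypothetical checkpoint-write estimate.
total_drives = 10          # 2 boot + 8 data
data_drives = 8
read_gbs_per_drive = 14    # ~GB/s sequential read (per note above)
write_gbs_per_drive = 8    # ~GB/s sequential write

agg_read_gbs = total_drives * read_gbs_per_drive    # ~140 GB/s
agg_write_gbs = data_drives * write_gbs_per_drive   # ~64 GB/s (data drives only)

checkpoint_tb = 4.0  # hypothetical checkpoint size
print(f"Aggregate read:  ~{agg_read_gbs} GB/s")
print(f"Aggregate write: ~{agg_write_gbs} GB/s")
print(f"{checkpoint_tb:.0f} TB checkpoint: ~{checkpoint_tb * 1000 / agg_write_gbs:.0f} s to write locally")
```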
PSU Configuration
- Unit: 3,200 W 80+ Titanium hot-plug PSU
- Count: 10 (two groups of 5)
- Layout: 5× PSU bus-A + 5× PSU bus-B
- Redundancy: N+5 overall; either bus's five PSUs (16 kW) can carry the full sustained load alone
- Total capacity: 32,000 W
- Utilization at sustained peak: ~45% (14.5 kW of 32 kW)
In-Rack PDU
- PDU-A: Circuit A → powers PSU bus-A (5 PSUs)
- PDU-B: Circuit B → powers PSU bus-B (5 PSUs)
- Both mounted rear-vertical at U38–U39
- Branch circuit breakers: sized for 16A or 20A outlets
- Feed redundancy: independent A/B UPS strings
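
The A/B arithmetic behind this layout, as a sketch; the 230 V line voltage is an assumption for the amperage estimate:

```python
# Per-bus and per-PSU loading under the sustained budget, including the
# single-bus-failure case.
psu_kw = 3.2
psus_per_bus = 5
sustained_kw = 14.5   # total at wall, from the budget below

per_bus_kw = sustained_kw / 2            # normal A/B load sharing
per_psu_kw = per_bus_kw / psus_per_bus
bus_capacity_kw = psu_kw * psus_per_bus  # 16 kW per bus

print(f"Per bus (normal): {per_bus_kw:.2f} kW of {bus_capacity_kw:.0f} kW")
print(f"Per PSU (normal): {per_psu_kw:.2f} kW (~{per_psu_kw * 1000 / 230:.1f} A at 230 V)")
ok = sustained_kw < bus_capacity_kw
print(f"Full bus failure: surviving bus carries {sustained_kw} kW of "
      f"{bus_capacity_kw:.0f} kW -> {'OK' if ok else 'OVERLOAD'}")
```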
Sustained Power Budget Breakdown
| Component | Basis | Power |
|---|---|---|
| 8× B300 GPU | 1,100 W × 8 | 8,800 W |
| 2× Xeon 6776P CPU | 350 W × 2 | 700 W |
| 32× DDR5 RDIMM | ~7 W × 32 | 224 W |
| Networking (CX8 + BF3 + X710) | NICs + DPUs | 252 W |
| Fans, NVMe, misc | Storage + cooling + VRM | 599 W |
| TOTAL AT WALL (sustained) | incl. PSU 4.5% + PDU 1% loss | 14.5 kW |
| ABSOLUTE BURST (GPU overshoot +6%) | vs 20 kW hard ceiling | 15.0 kW |
✅
~5.5 kW margin at sustained peak and ~5 kW at absolute burst (GPU overshoot +6%). PSU capacity (32 kW) far exceeds demand, so thermal de-rating is not a concern. 80+ Titanium efficiency at >50% load: ≥96%.
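
The margin arithmetic behind this note, assuming the +6% overshoot applies only to the GPU share of the wall draw:

```python
# Sustained and burst margins against the 20 kW hard ceiling.
ceiling_kw = 20.0
sustained_kw = 14.5
gpu_wall_kw = 8.8 * 1.045 * 1.01   # GPU draw incl. PSU (4.5%) + PDU (1%) losses

burst_kw = sustained_kw + gpu_wall_kw * 0.06
print(f"Burst estimate:   ~{burst_kw:.1f} kW")                  # ~15.1 kW
print(f"Sustained margin: {ceiling_kw - sustained_kw:.1f} kW")  # 5.5 kW
print(f"Burst margin:     ~{ceiling_kw - burst_kw:.1f} kW")     # ~4.9 kW
```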
ConnectX-8 IB NICs — 8× per Server (on HGX tray)
ℹ️
ConnectX-8 NICs are soldered onto the HGX B300 tray — one per GPU. They are not PCIe add-in cards. Each operates at 800 Gb/s NDR InfiniBand. Each CX8 is assigned to a specific IB leaf switch rail, enabling 1:1 non-blocking parallel communication.
CX8 → Leaf Switch Rail Assignments
| Rail | Leaf switch | Endpoint |
|---|---|---|
| Rail 0 | Leaf L0 | CX8[0] from GPU 0 |
| Rail 1 | Leaf L1 | CX8[1] from GPU 1 |
| Rail 2 | Leaf L2 | CX8[2] from GPU 2 |
| Rail 3 | Leaf L3 | CX8[3] from GPU 3 |
| Rail 4 | Leaf L4 | CX8[4] from GPU 4 |
| Rail 5 | Leaf L5 | CX8[5] from GPU 5 |
| Rail 6 | Leaf L6 | CX8[6] from GPU 6 |
| Rail 7 | Leaf L7 | CX8[7] from GPU 7 |
Each of the 32 servers has one CX8 on each rail. Result: 32 servers × 8 CX8s = 256 NDR-800 IB endpoints total.
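
A hypothetical helper that enumerates this wiring and checks the endpoint count; the names and structure are illustrative only:

```python
# Rail-optimized fabric map: CX8[r] on every server lands on leaf Lr.
SERVERS = 32
RAILS = 8

fabric = [
    {"server": s, "rail": r, "nic": f"CX8[{r}]", "leaf": f"L{r}"}
    for s in range(SERVERS)
    for r in range(RAILS)
]

assert len(fabric) == 256  # 32 servers x 8 rails = 256 NDR-800 endpoints
# Each leaf terminates exactly one NIC per server (1:1, non-blocking rails):
for r in range(RAILS):
    assert sum(1 for e in fabric if e["leaf"] == f"L{r}") == SERVERS

print(f"{len(fabric)} IB endpoints across {RAILS} rails of {SERVERS} servers")
```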
BlueField-3 DPU — 2× per Server
| Attribute | BF3-0 (Primary) | BF3-1 (Secondary) |
|---|---|---|
| Model | BlueField-3 3220 | BlueField-3 3220 |
| Port speed | 400 Gb/s NDR400 | 400 Gb/s NDR400 |
| Target switch | Spectrum-4 switch #1 | Spectrum-4 switch #2 |

Shared by both DPUs:
- Mode: active-active ECMP bonding
- Role: storage access, RDMA, IP networking, encryption offload
- Host interface: PCIe Gen5 x16 to CPU
Management NIC
- Model: Intel X710-AT2 dual-port 10 GbE RJ45
- Port 0: OS management (connected to OOB management switch)
- Port 1: IPMI / BMC out-of-band (connected to OOB management switch)
- Cable: Cat6A shielded 10 GbE (copper — no AOC required)
- VLAN segregation: mgmt traffic isolated from data fabric
Every cable from a compute rack exits toward the network zone (Rack N1 or N2). All IB and Ethernet data runs use AOC (active optical cable) because inter-zone distances span 5–15 m; the 10 GbE management runs stay on Cat6A copper.
| # | Source | Destination | Cable | Speed | Count |
|---|---|---|---|---|---|
| 1–8 | CX8[0]–CX8[7] | Leaf L0–L7 (N1) | AOC NDR 800 | 800 Gb/s each | 8 |
| 9 | BF3-0 port 0 | Spectrum-4 #1 (N2) | AOC NDR400 | 400 Gb/s | 1 |
| 10 | BF3-1 port 0 | Spectrum-4 #2 (N2) | AOC NDR400 | 400 Gb/s | 1 |
| 11 | X710 port 0 (OS mgmt) | OOB switch (N2) | Cat6A | 10 GbE | 1 |
| 12 | X710 port 1 (BMC) | OOB switch (N2) | Cat6A | 10 GbE | 1 |
| 13 | PDU-A | Circuit A breaker panel | Power | 3-phase | 1 |
| 14 | PDU-B | Circuit B breaker panel | Power | 3-phase | 1 |
| | Total cables exiting compute rack | | | | 14 |
✅
14 cables total per rack is manageable with the standard 1U patch panel (U11) and dual 1U horizontal cable managers (U12–U13). Route IB and Ethernet AOC together in overhead cable trays. Power cables route in a floor trench or dedicated cable duct.
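
A trivial tally of the table above, useful as a per-rack commissioning check:

```python
# Per-rack cable counts, grouped by type; must total 14.
cables = {
    "AOC NDR 800 (CX8 -> leaf L0-L7)": 8,
    "AOC NDR400 (BF3 -> Spectrum-4)": 2,
    "Cat6A 10 GbE (X710 -> OOB switch)": 2,
    "3-phase power (PDU-A/B feeds)": 2,
}
assert sum(cables.values()) == 14
for kind, count in cables.items():
    print(f"{count:>2}  {kind}")
```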
BMC / IPMI Access
- BMC: Integrated AST2600 (on motherboard)
- Interface: IPMI 2.0 + Redfish REST API
- Network: X710-AT2 port 1 → OOB switch
- VLAN: dedicated out-of-band management VLAN
- Allows: power cycle, KVM console, sensors, firmware
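
A minimal Redfish sketch over the OOB path, assuming standard Redfish endpoints; the BMC address, credentials, and system ID below are placeholders (enumerate /redfish/v1/Systems on the actual BMC first):

```python
# Query power state and issue a graceful restart via the BMC's Redfish API.
import requests

BMC = "https://10.0.0.1"      # placeholder BMC address on the OOB mgmt VLAN
AUTH = ("admin", "password")  # placeholder credentials
VERIFY = False                # BMCs commonly ship self-signed certs

# Enumerate systems, then read the power state of the first one.
systems = requests.get(f"{BMC}/redfish/v1/Systems", auth=AUTH, verify=VERIFY).json()
sys_uri = systems["Members"][0]["@odata.id"]
state = requests.get(f"{BMC}{sys_uri}", auth=AUTH, verify=VERIFY).json()["PowerState"]
print(f"{sys_uri}: {state}")

# Standard Redfish reset action.
requests.post(f"{BMC}{sys_uri}/Actions/ComputerSystem.Reset",
              json={"ResetType": "GracefulRestart"}, auth=AUTH, verify=VERIFY)
```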
UFM Fabric Management
- Agent: NVIDIA UFM Agent (SW, installed on OS)
- Reports to: UFM Appliance (Rack N2)
- Provides: IB port telemetry, routing updates, SHARP coordination
- No additional hardware on compute rack