Single Rack Server Unit Architecture

ASUS XA NB3I-E12
B300 GPU Server

9U air-cooled · 1 per rack · NVIDIA HGX Blackwell Ultra B300

  • 8× NVIDIA B300 GPUs
  • 2.304 TB HBM3e per unit
  • ~36 PFLOPS FP8 dense
  • ~14.5 kW sustained wall draw
  • 9U chassis height
  • 42U rack height
  • 20 kW wall-power ceiling

1 · Chassis Layout & Rack Unit Map

42U Rack · ASUS XA NB3I-E12

  • U1–U9: ASUS XA NB3I-E12 (8× B300 GPU, 2× Xeon 6776P, 32× 128 GB DDR5)
  • U10: reserved
  • U11: 1U patch panel (IB/Eth)
  • U12: cable management, 1U
  • U13: cable management, 1U
  • U14: empty
  • U15–U37: empty
  • U38: PDU-A (rear)
  • U39: PDU-B (rear)
  • U40: emergency spare
  • U41: emergency spare
  • U42: top spare

Chassis Height & Placement Rules

  • Chassis height: 9U (U1–U9, front-of-rack, bottom)
  • Bottom placement: optimized for GPU airflow — hot exhaust exits at top rear
  • U10: reserved for future I/O shelf or temporary access
  • U11: 1U patch panel — IB and Ethernet port aggregation
  • U12–U13: horizontal cable management (Panduit or equivalent)
  • U14–U37: empty (growth, cooling, airflow buffer)
  • U38–U39: dual PDU rear-mount, PDU-A (circuit A) + PDU-B (circuit B)
  • U40–U42: emergency spares — do not occupy
Empty space (U14–U37) is intentional — provides airflow buffer and enables future component expansion without rack re-work. Do NOT fill this space without power re-audit.
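
The U assignments above are easy to sanity-check as data. The sketch below is purely illustrative (the entry names are informal labels, not inventory records); it verifies that the elevation has no overlaps and exactly fills the 42U budget.

    # Hypothetical elevation check for the 42U map above (illustrative only).
    RACK_HEIGHT_U = 42

    elevation = {
        "ASUS XA NB3I-E12":     (1, 9),
        "reserved (I/O shelf)": (10, 10),
        "patch panel":          (11, 11),
        "cable management A":   (12, 12),
        "cable management B":   (13, 13),
        "airflow buffer":       (14, 37),
        "PDU-A (rear)":         (38, 38),
        "PDU-B (rear)":         (39, 39),
        "emergency spares":     (40, 42),
    }

    occupied = set()
    for name, (lo, hi) in elevation.items():
        units = set(range(lo, hi + 1))
        assert 1 <= lo and hi <= RACK_HEIGHT_U, f"{name} outside rack"
        assert not units & occupied, f"{name} overlaps another entry"
        occupied |= units

    print(f"{len(occupied)}/{RACK_HEIGHT_U} U accounted for")   # 42/42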

Rack Form Factor

  • Chassis model: ASUS XA NB3I-E12
  • Form factor: 9U rackmount, 19" EIA-310-D
  • Depth: ~1,000 mm (≥1,050 mm with cable management)
  • Width: 440 mm (19" standard)
  • Cooling: direct air-cooled, front-to-back, dual hot-plug fan modules
  • Operating temp: 10–35 °C inlet (ASHRAE A2)
  • PSU config: 10× 3,200 W in 5+5 dual-bus layout, N+5 redundancy

2 · GPU Subsystem — NVIDIA HGX B300 Tray

ℹ️ 8 B300 GPUs are mounted as a single HGX tray — not individual cards. The tray communicates with the host CPU via PCIe 6.0. NVLink 5 runs within the tray at 14.4 TB/s collectively. ConnectX-8 NICs are soldered directly onto the HGX tray — on-board, not PCIe add-in cards.

GPU Core Specs

  • Architecture: NVIDIA Blackwell Ultra
  • Die: dual reticle-limited compute dies (multi-die package)
  • FP8 tensor-core dense: ~4.5 PFLOPS/GPU
  • NVFP4 sparse: ~30 PFLOPS/GPU
  • TDP: 1,100 W
  • Cooling: Direct air (HGX chassis fans)

HBM3e Memory — Per GPU

  • Capacity: 288 GB HBM3e
  • Stack config: 12-high per GPU
  • Bandwidth: ~8 TB/s per GPU
  • Total per tray: 2.304 TB (8 × 288 GB)
  • ECC: On-die SECDED

NVLink 5 Intra-Tray Fabric

  • Topology: All-to-all NVSwitch in tray
  • Per-GPU BW: 1.8 TB/s bidirectional
  • Tray aggregate: 14.4 TB/s
  • Latency: <100 ns GPU-to-GPU (intra-tray)
  • Unified memory: any-to-any GPU access
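
Rolling the per-GPU figures up to tray level, with a rough collective-communication estimate attached, puts the fabric numbers in context. The sketch below is a back-of-envelope only: the 100 GB payload and the assumption that roughly half of the bidirectional NVLink figure is usable per direction are illustrative, not measured.

    # Tray-level roll-up of the per-GPU figures above, plus a rough
    # ring all-reduce estimate (back-of-envelope, not a benchmark).
    GPUS_PER_TRAY    = 8
    HBM_PER_GPU_GB   = 288      # HBM3e capacity per GPU
    FP8_PER_GPU_PF   = 4.5      # dense FP8 tensor-core PFLOPS per GPU
    NVLINK_BIDIR_TBS = 1.8      # NVLink 5 bandwidth per GPU, bidirectional

    print(f"HBM per tray   : {GPUS_PER_TRAY * HBM_PER_GPU_GB / 1000:.3f} TB")  # 2.304 TB
    print(f"FP8 dense/tray : {GPUS_PER_TRAY * FP8_PER_GPU_PF:.0f} PFLOPS")     # 36 PFLOPS
    print(f"NVLink total   : {GPUS_PER_TRAY * NVLINK_BIDIR_TBS:.1f} TB/s")     # 14.4 TB/s

    # A ring all-reduce moves ~2*(N-1)/N of the payload over each GPU's links.
    payload_gb  = 100                            # hypothetical gradient payload
    per_dir_gbs = NVLINK_BIDIR_TBS * 1000 / 2    # assume ~half of bidir BW per direction
    t_ms = 2 * (GPUS_PER_TRAY - 1) / GPUS_PER_TRAY * payload_gb / per_dir_gbs * 1000
    print(f"~{t_ms:.0f} ms for a {payload_gb} GB all-reduce (ideal, zero overhead)")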

Host ↔ HGX Interface

  • Protocol: PCIe Gen 6.0 x16 per GPU
  • PCIe switch: PEX89144 aggregation
  • CX8 NIC attachment: on-board (not host PCIe)
  • BF3 DPU: 2× PCIe from host CPU complex
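
The raw bandwidth gap between the host link and the intra-tray fabric is what keeps GPU-to-GPU traffic on NVLink while PCIe handles host I/O. The comparison below uses nominal link rates before protocol overhead, so treat the ratio as a rough figure rather than a measured one.

    # Per-GPU host link vs intra-tray fabric, using nominal link rates.
    PCIE_GEN6_X16_GBS = 128     # ~64 GT/s x 16 lanes, per direction, before overhead
    NVLINK5_BIDIR_GBS = 1800    # NVLink 5 per GPU, bidirectional

    ratio = NVLINK5_BIDIR_GBS / (2 * PCIE_GEN6_X16_GBS)   # bidirectional vs bidirectional
    print(f"NVLink offers ~{ratio:.0f}x the per-GPU bandwidth of the host PCIe link")  # ~7x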

3 · CPU & Host Memory Subsystem

Intel Xeon 6776P ×2

  • Codename: Granite Rapids SP
  • Cores: 56 P-cores (no E-cores)
  • Base / Boost: 2.2 / 3.8 GHz
  • TDP: 350 W
  • L3 cache: 300 MB
  • PCIe: Gen 5.0 x80 per CPU
  • Memory channels: 8-ch DDR5

128 GB DDR5 RDIMM ×32

  • Model: Samsung M321RAJA0MB2-CCP
  • Speed: DDR5-6400 MT/s
  • Width: 72-bit (ECC)
  • Slots: 16 per CPU × 2 CPUs = 32 total
  • Total system RAM: 4 TB (32 × 128 GB)
  • Aggregate BW: ~600 GB/s (dual socket)
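
The ~600 GB/s aggregate figure reads as a sustained estimate rather than a theoretical peak. The back-of-envelope below shows the rated peak for this channel count; the gap is plausibly explained by 2-DIMM-per-channel derating and controller efficiency, both of which are assumptions here rather than stated facts.

    # Back-of-envelope DDR5 bandwidth for 2 sockets x 8 channels at DDR5-6400.
    sockets, channels, bytes_per_transfer = 2, 8, 8
    transfers_per_s = 6400e6                     # rated DDR5-6400
    peak_gbs = sockets * channels * transfers_per_s * bytes_per_transfer / 1e9
    print(f"theoretical peak : {peak_gbs:.0f} GB/s")            # ~819 GB/s
    print(f"quoted ~600 GB/s : {600 / peak_gbs:.0%} of peak")   # ~73%; plausible after
                                                                # 2 DPC derating + efficiency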

4 · Storage Subsystem

  • Boot: 2× Samsung PM9D3a U.2, PCIe Gen5 NVMe, 1.92 TB each → 3.84 TB
  • Data: 8× Samsung PM9D3a U.2, PCIe Gen5 NVMe, 3.84 TB each → 30.72 TB
  • Total NVMe per server (PM9D3a Gen5 family): 34.56 TB
ℹ️ PM9D3a is Samsung's enterprise PCIe Gen5 NVMe family. Sequential read is ~14 GB/s and write ~8 GB/s per drive, so the 10 drives give roughly 140 GB/s of aggregate sequential-read headroom per node, comfortable for checkpointing and dataset streaming during training.
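
To give that headroom a concrete feel, the sketch below estimates checkpoint save and restore times across the eight data drives. The 4 TB checkpoint size is a made-up example; the per-drive rates are the approximate figures quoted above.

    # Rough checkpoint timing from the per-drive figures above (illustrative only).
    data_drives      = 8
    read_gbs_per_dr  = 14.0     # approx. sequential read per PM9D3a
    write_gbs_per_dr = 8.0      # approx. sequential write per PM9D3a
    ckpt_tb          = 4.0      # hypothetical checkpoint (weights + optimizer state)

    read_bw  = data_drives * read_gbs_per_dr      # 112 GB/s across the data drives
    write_bw = data_drives * write_gbs_per_dr     #  64 GB/s across the data drives
    print(f"save checkpoint    : ~{ckpt_tb * 1000 / write_bw:.0f} s")   # ~62 s
    print(f"restore checkpoint : ~{ckpt_tb * 1000 / read_bw:.0f} s")    # ~36 s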

5 · Power Distribution System

PSU Configuration

  • Unit: 3,200 W 80+ Titanium hot-plug PSU
  • Count: 10 (two groups of 5)
  • Layout: 5× PSU bus-A + 5× PSU bus-B
  • Redundancy: N+5 (five PSUs cover the sustained load, the other five are spare, so either bus alone can power the server)
  • Total capacity: 32,000 W
  • Utilization at sustained peak: ~45% (14.5 kW against 32 kW of PSU capacity)

In-Rack PDU

  • PDU-A: Circuit A → powers PSU bus-A (5 PSUs)
  • PDU-B: Circuit B → powers PSU bus-B (5 PSUs)
  • Both mounted rear-vertical at U38–U39
  • Branch circuit breakers: sized for 16A or 20A outlets
  • Feed redundancy: independent A/B UPS strings
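
Feed voltage is not specified in this section, so the per-feed current estimate below assumes 400 V line-to-line three-phase at a 0.98 power factor; adjust both for the actual supply. It covers normal A/B sharing and the loss of one feed.

    import math

    # Per-feed current estimate for the dual A/B PDUs (assumed 400 V 3-phase, PF 0.98;
    # neither value is stated in this section, so treat the results as illustrative).
    sustained_kw, v_line_line, power_factor = 14.5, 400.0, 0.98

    def feed_amps(kw: float) -> float:
        return kw * 1000 / (math.sqrt(3) * v_line_line * power_factor)

    print(f"both feeds active : ~{feed_amps(sustained_kw / 2):.1f} A per feed")        # ~10.7 A
    print(f"one feed lost     : ~{feed_amps(sustained_kw):.1f} A on surviving feed")   # ~21.4 A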

Sustained Power Budget Breakdown

  • 8× B300 GPU (1,100 W × 8): 8,800 W
  • 2× Xeon 6776P CPU (350 W × 2): 700 W
  • 32× DDR5 RDIMM (~7 W × 32): 224 W
  • Networking, CX8 + BF3 + X710 (NICs + DPUs): 252 W
  • Fans, NVMe, misc (storage + cooling + VRM): 599 W
  • TOTAL AT WALL, sustained (incl. PSU 4.5% + PDU 1% loss): ~14,500 W (14.5 kW)
  • HARD CEILING (rack wall-power limit): 20.0 kW
~5.5 kW margin at sustained peak. ~4.6 kW margin at absolute burst (GPU overshoot +6%). PSU capacity (32 kW) far exceeds demand — thermal de-rating not a concern. 80+ Titanium efficiency at >50% load: ≥96%.
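
The margins above can be reproduced from the stated figures alone. The sketch below treats the 14.5 kW sustained wall draw, the 20 kW ceiling and the PSU capacities as inputs, and applies the +6% overshoot to the wall figure, which matches the ~4.6 kW burst margin quoted; it is a consistency check, not a new measurement.

    # Reproduce the power margins from the stated figures (no new measurements).
    sustained_wall_kw = 14.5    # sustained draw at the wall, as stated above
    ceiling_kw        = 20.0    # per-rack wall-power hard ceiling
    bus_capacity_kw   = 16.0    # 5 x 3,200 W PSUs per bus
    total_psu_kw      = 32.0    # 10 x 3,200 W

    burst_wall_kw = sustained_wall_kw * 1.06    # +6% overshoot applied at the wall

    print(f"sustained margin : {ceiling_kw - sustained_wall_kw:.1f} kW")    # 5.5 kW
    print(f"burst margin     : {ceiling_kw - burst_wall_kw:.1f} kW")        # ~4.6 kW
    print(f"single bus covers sustained load: {bus_capacity_kw >= sustained_wall_kw}")   # True
    print(f"PSU headroom     : {total_psu_kw / sustained_wall_kw:.1f}x sustained draw")  # ~2.2x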

6 · Networking — InfiniBand & Ethernet

ConnectX-8 IB NICs — 8× per Server (on HGX tray)

ℹ️ ConnectX-8 NICs are soldered onto the HGX B300 tray — one per GPU. They are not PCIe add-in cards. Each operates at 800 Gb/s NDR InfiniBand. Each CX8 is assigned to a specific IB leaf switch rail, enabling 1:1 non-blocking parallel communication.

CX8 → Leaf Switch Rail Assignments

  • Rail 0 → Leaf L0: CX8[0] from GPU 0
  • Rail 1 → Leaf L1: CX8[1] from GPU 1
  • Rail 2 → Leaf L2: CX8[2] from GPU 2
  • Rail 3 → Leaf L3: CX8[3] from GPU 3
  • Rail 4 → Leaf L4: CX8[4] from GPU 4
  • Rail 5 → Leaf L5: CX8[5] from GPU 5
  • Rail 6 → Leaf L6: CX8[6] from GPU 6
  • Rail 7 → Leaf L7: CX8[7] from GPU 7
Each of the 32 servers has one CX8 on each rail. Result: 32 servers × 8 CX8s = 256 NDR-800 IB endpoints total.
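
The rail discipline is simple enough to capture as a lookup. The sketch below builds the GPU-to-rail-to-leaf map for one server and counts the resulting fabric endpoints across the 32-server cluster; the helper name leaf_of is just for illustration.

    # GPU -> CX8 rail -> IB leaf mapping for one server, plus the cluster endpoint count.
    SERVERS, RAILS = 32, 8

    def leaf_of(rail: int) -> str:
        """Rail i of every server lands on leaf Li (rail-aligned, non-blocking)."""
        return f"L{rail}"

    rail_map = {f"GPU{g}": {"nic": f"CX8[{g}]", "leaf": leaf_of(g)} for g in range(RAILS)}
    for gpu, hop in rail_map.items():
        print(f"{gpu} -> {hop['nic']} -> leaf {hop['leaf']}")

    print(f"cluster IB endpoints: {SERVERS * RAILS}")    # 256 NDR-800 ports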

BlueField-3 DPU — 2× per Server

  • Model: BlueField-3 3220 (both BF3-0 primary and BF3-1 secondary)
  • Port speed: 400 Gb/s NDR400 per DPU
  • Target switch: BF3-0 → Spectrum-4 switch #1, BF3-1 → Spectrum-4 switch #2
  • Mode: active-active ECMP bonding
  • Role: storage access, RDMA, IP networking, encryption offload
  • Host interface: PCIe Gen5 x16 to CPU (per DPU)

Management NIC

  • Model: Intel X710-AT2 dual-port 10 GbE RJ45
  • Port 0: OS management (connected to OOB management switch)
  • Port 1: IPMI / BMC out-of-band (connected to OOB management switch)
  • Cable: Cat6A shielded 10 GbE (copper — no AOC required)
  • VLAN segregation: mgmt traffic isolated from data fabric

7 · Cable Exit Summary — Per Rack

Every cable from a compute rack exits toward the network zone (Rack N1 or N2). All IB and Ethernet runs use AOC (active optical cable) due to inter-zone cross distance of 5–15 m.

  • 1–8: CX8[0]–CX8[7] → Leaf L0–L7 (N1), AOC NDR 800, 800 Gb/s each (8 cables)
  • 9: BF3-0 port 0 → Spectrum-4 #1 (N2), AOC NDR400, 400 Gb/s (1 cable)
  • 10: BF3-1 port 0 → Spectrum-4 #2 (N2), AOC NDR400, 400 Gb/s (1 cable)
  • 11: X710 port 0 (OS mgmt) → OOB switch (N2), Cat6A, 10 GbE (1 cable)
  • 12: X710 port 1 (BMC) → OOB switch (N2), Cat6A, 10 GbE (1 cable)
  • 13: PDU-A → Circuit A breaker panel, 3-phase power (1 cable)
  • 14: PDU-B → Circuit B breaker panel, 3-phase power (1 cable)
  • Total cables exiting the compute rack: 14
14 cables per rack is manageable with a standard 1U patch panel (U11) and the two 1U horizontal cable managers (U12–U13). Route the IB and Ethernet AOC runs together in the overhead cable trays; power cables route in the floor trench or a dedicated cable duct.
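
As a quick check, the tally below mirrors the list above, grouped the way the runs are bundled in the trays.

    # Tally of cables leaving one compute rack (mirrors the list above).
    cable_runs = {
        "CX8 -> IB leaf, AOC NDR 800":   8,
        "BF3 -> Spectrum-4, AOC NDR400": 2,
        "X710 -> OOB switch, Cat6A":     2,
        "PDU feeds, 3-phase power":      2,
    }
    assert sum(cable_runs.values()) == 14
    print(f"{sum(cable_runs.values())} cables exit the rack")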

8 · Out-of-Band Management

BMC / IPMI Access

  • BMC: Integrated AST2600 (on motherboard)
  • Interface: IPMI 2.0 + Redfish REST API
  • Network: X710-AT2 port 1 → OOB switch
  • VLAN: dedicated out-of-band management VLAN
  • Allows: power cycling, KVM console, sensor readout, firmware updates
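
A minimal Redfish poll through that OOB path might look like the sketch below. The BMC address and credentials are placeholders, and Chassis/{id}/Power is the older Redfish power schema; exact resource names vary by AST2600 firmware, so enumerate the collections rather than hard-coding IDs.

    import requests

    # Minimal Redfish poll via the BMC on the OOB VLAN (sketch only).
    # Address, credentials and resource paths are placeholders; adapt to the firmware.
    BMC  = "https://10.0.0.10"          # hypothetical BMC address
    auth = ("admin", "password")        # replace with real credentials

    session = requests.Session()
    session.auth = auth
    session.verify = False              # lab sketch; use proper TLS verification in production

    chassis_collection = session.get(f"{BMC}/redfish/v1/Chassis").json()
    for member in chassis_collection["Members"]:
        chassis_path = member["@odata.id"]
        power = session.get(f"{BMC}{chassis_path}/Power").json()   # older power schema
        for supply in power.get("PowerSupplies", []):
            print(chassis_path, supply.get("Name"), supply.get("LastPowerOutputWatts"))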

UFM Fabric Management

  • Agent: NVIDIA UFM Agent (SW, installed on OS)
  • Reports to: UFM Appliance (Rack N2)
  • Provides: IB port telemetry, routing updates, SHARP coordination
  • No additional hardware required in the compute rack