Infrastructure & Supporting Systems
B300 Farm — Cooling · Power · Storage
Kedios B300 · 32-Node Blackwell Ultra GPU Farm · March 1, 2026
Companion to: B300 Farm and Network Tower Architecture
650+ kW
Cooling Capacity Required
~590 TB
Shared Storage Usable
~436–466 kW
Power Headroom vs 1 MW
The 32-node B300 farm generates approximately 494 kW of sustained heat load across servers, networking, and storage within its allocated 1 MW facility budget.
Three supporting infrastructure pillars — cooling, power delivery, and storage — must be designed and racked in parallel with the compute and network zones already defined.
These systems live on a separate physical layer, each with dedicated rack positions, independent power feeds, and independent management.
❄️
Cooling
N+1
4× CRAC units · CAC/HAC containment · 650+ kWth
⚡
Power
Dual-Feed
80 rack PDUs · Row ATS · 5+5 PSU per server · Row UPS optional
🗄️
Storage
~590 TB
2× NVMe-dense nodes · IB-attached · NVMe-oF / Lustre
1.1 Cluster Heat Load
| Source | Count | Per-unit | Total Heat |
| B300 servers (sustained draw at wall) | 32 | ~14.5 kW | ~464 kW |
| Network racks (IB switches + Ethernet) | — | — | ~17.6 kW |
| Storage nodes (estimated) | ~4 | ~3 kW | ~12 kW |
| Total cluster heat load | | | ~494 kW |
| Burst ceiling (+6% GPU transient on compute) | | | ~522 kW |
⚠️
Cooling capacity must be rated for the burst figure (~522 kW) with N+1 headroom — i.e., minimum 650+ kWth installed.
N+1 means: if the largest CRAC unit fails, the remaining units must still cover the full burst load.
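As a sanity check, the sketch below reproduces the sizing arithmetic from the figures in this section; the 225 kWth per-unit capacity is an assumed midpoint of the 200–250 kWth range in §1.3.

```python
# Sketch: reproduce the cooling-sizing arithmetic above. The 225 kWth
# per-unit capacity is an assumed midpoint of the 200-250 kWth range.
import math

sustained_kw = 464 + 17.6 + 12        # servers + network + storage (~494 kW)
burst_kw = 464 * 1.06 + 17.6 + 12     # +6% GPU transient on compute (~522 kW)

unit_kwth = 225                       # assumed per-unit CRAH capacity
active = math.ceil(burst_kw / unit_kwth)   # units that must carry burst alone
installed = active + 1                     # N+1: one standby unit

print(f"burst ~{burst_kw:.0f} kW -> {installed}x {unit_kwth} kWth units "
      f"({installed * unit_kwth} kWth installed, "
      f"{active * unit_kwth} kWth with the largest unit failed)")
```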
1.2 Air Cooling Architecture (Primary)
The ASUS XA NB3I-E12 is air-cooled with 15× 80 mm high-RPM rear fans + 6× 60 mm CPU fans. Standard cold-aisle / hot-aisle containment is the primary method at Phase 1 scale.
Cold-Aisle / Hot-Aisle Containment
- Cold aisles face rack fronts — conditioned supply air enters inlets
- Hot aisles face rack rears — exhaust ~55–65°C at full GPU load
- Hot-aisle containment (HAC) ceiling panels / chimney preferred
- Prevents hot-air recirculation into adjacent cold aisles
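For intuition on what containment must move, the sketch below applies the standard sensible-heat relation Q = ρ · cp · V̇ · ΔT per rack; the 30 K air temperature rise is an assumption consistent with the 18–27°C inlets and ~55–65°C exhaust figures in this section.

```python
# Sketch: approximate airflow one compute rack must move, from the
# sensible-heat relation Q = rho * cp * V * dT. The 30 K temperature
# rise is an assumption consistent with the inlet/exhaust figures above.
RHO_AIR = 1.2      # kg/m^3, near sea level at ~20 C
CP_AIR = 1005.0    # J/(kg*K)

def rack_airflow_cfm(heat_w: float, delta_t_k: float) -> float:
    """Volumetric airflow (CFM) needed to remove heat_w at delta_t_k rise."""
    m3_per_s = heat_w / (RHO_AIR * CP_AIR * delta_t_k)
    return m3_per_s * 2118.88          # m^3/s -> ft^3/min

print(f"~{rack_airflow_cfm(14_500, 30):.0f} CFM per ~14.5 kW rack")  # ~850 CFM
```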
Operating Temperature Thresholds
- Inlet air (normal operating): 18–27°C
- Inlet air (maximum ASHRAE A2): 35°C
- Relative humidity: 20–80% non-condensing
- Facility chilled water supply: ≤12°C recommended
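A minimal sketch of the per-rack inlet check these thresholds imply; rack names and temperatures here are illustrative, and a real check would pull readings from the BMS/DCIM sensor feed.

```python
# Sketch: classify per-rack inlet readings against the thresholds above.
# Rack names and temperatures are illustrative placeholders.
NORMAL_MAX_C = 27.0      # top of the 18-27 C normal operating band
ASHRAE_A2_MAX_C = 35.0   # maximum allowable inlet (ASHRAE A2)

def classify_inlet(temp_c: float) -> str:
    if temp_c > ASHRAE_A2_MAX_C:
        return "CRITICAL: above ASHRAE A2 allowable"
    if temp_c > NORMAL_MAX_C:
        return "WARNING: above recommended band"
    return "OK"

for rack, temp_c in {"C01": 24.5, "C17": 29.1}.items():
    print(rack, temp_c, classify_inlet(temp_c))
```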
1.3 CRAC / CRAH Unit Specification
| Parameter | Value |
| Total cooling required (burst ceiling) | ~522 kW |
| Recommended installed capacity | 650+ kWth (+20% headroom for N+1) |
| Minimum CRAC units | 4× (3 active + 1 standby = N+1) |
| Unit sizing (each) | 200–250 kWth chilled-water CRAH |
| Facility supply water temp | ≤12°C (≤15°C acceptable) |
| Airflow configuration | Under-floor (raised floor plenum) or overhead ducting |
| Temperature monitoring | Per-rack sensors → UFM / facility BMS integration |
ℹ️
The existing architecture document references CRAC #1 and CRAC #2. Under the revised load profile, two additional units (CRAC #3 and CRAC #4) are required to maintain N+1 coverage at full burst load. Confirm capacity with the datacenter facility operator.
1.4 Direct Liquid Cooling (Future Option — Phase 2)
At 8,800 W of pure GPU heat per server (8× B300 at 1,100 W TDP), DLC becomes attractive at scale-out beyond 32 nodes.
| Option | Description | Applicability |
| Rear-Door Heat Exchanger (RDHx) | Liquid-cooled door captures hot exhaust from existing fans | Drop-in retrofit, no server modification |
| Direct Liquid Cooling (DLC) | Coolant plates on GPU cold plates — eliminates hot-air exhaust | Requires server support — check ASUS roadmap |
| In-Row Cooling (IRC) | Dedicated cooling unit between rack rows | Good for zone isolation |
✅
At Phase 1 (32 nodes), standard CRAC + containment is sufficient. Evaluate DLC/RDHx at Phase 2 when cluster approaches 64 nodes and cooling load nears 750 kW.
2.1 Farm Power Budget
- Compute servers (~464 kW) · 46.4% of 1 MW
- Network racks (~17.6 kW) · 1.8% of 1 MW
- Storage nodes (~12 kW) · 1.2% of 1 MW
- Cooling infrastructure (~40 kW) · 4.0% of 1 MW
| Zone | Racks | Sustained Draw | Facility Allocation |
| Compute zone | 32 | ~464 kW | 640 kW (32 × 20 kW) |
| Network zone | 2 occupied | ~17.6 kW | ~40 kW |
| Storage zone | 1–2 | ~12 kW | ~20 kW |
| Cooling (CRAC fans, pumps) | — | ~40–70 kW | ~80 kW |
| Total Phase 1 | | ~534–564 kW | ~780 kW |
| Facility allocation (1 MW) | | | 1,000 kW |
| Headroom remaining | | | ~436–466 kW |
✅
The 1 MW allocation remains sufficient for Phase 1. Approximately 436–466 kW of headroom remains available for Phase 2 scale-out without requesting additional facility capacity from the datacenter.
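The budget roll-up reduces to simple arithmetic; the sketch below reproduces the totals, carrying the cooling draw as the 40–70 kW range from the table.

```python
# Sketch: Phase 1 power budget roll-up from the table above. Cooling is
# carried as a (low, high) range; other zones use their sustained values.
sustained_kw = {
    "compute": (464.0, 464.0),
    "network": (17.6, 17.6),
    "storage": (12.0, 12.0),
    "cooling": (40.0, 70.0),
}
low = sum(lo for lo, _ in sustained_kw.values())
high = sum(hi for _, hi in sustained_kw.values())
facility_kw = 1000.0
print(f"sustained: ~{low:.0f}-{high:.0f} kW")                             # ~534-564 kW
print(f"headroom: ~{facility_kw - high:.0f}-{facility_kw - low:.0f} kW")  # ~436-466 kW
```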
2.2 Per-Rack PDU Specification
Every rack — compute, network, and storage — is fitted with two independent vertical PDUs (PDU-A fed from busbar A, PDU-B from busbar B). A single feed failure never takes down an entire rack.
| Parameter | Value |
| PDU form factor | 0U vertical, rear-post mounted |
| Feed configuration | Dual-feed: PDU-A (busbar A) + PDU-B (busbar B) |
| Phase | 3-phase, 230/400 V |
| Outlets | IEC C19 (server PSUs) + IEC C13 (management/1U devices) |
| Capacity per PDU | 32A 3-phase ≈ 22 kVA (~17.7 kW at 0.8 PF) |
| Metering | Per-outlet metered recommended (enables per-server power trending) |
| PDUs per rack | 2 (one per feed) |
| Zone | Racks | PDUs/rack | Total PDUs |
| Compute | 32 | 2 | 64 |
| Network (occupied) | 2 | 2 | 4 |
| Network (reserved N3–N6) | 4 | 2 | 8 (pre-wire) |
| Storage | 2 | 2 | 4 |
| Total | 40 | | 80 PDUs |
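The per-PDU capacity line follows from the three-phase power relation S = √3 · V_line · I; a minimal sketch:

```python
# Sketch: per-PDU capacity from the 32 A / 400 V three-phase rating above,
# using S = sqrt(3) * V_line * I.
import math

def pdu_capacity_kw(amps: float, line_v: float = 400.0, pf: float = 0.8):
    kva = math.sqrt(3) * line_v * amps / 1000.0   # apparent power
    return kva, kva * pf                          # (kVA, kW at given PF)

kva, kw = pdu_capacity_kw(32)
print(f"~{kva:.1f} kVA, ~{kw:.1f} kW at 0.8 PF")  # ~22.2 kVA, ~17.7 kW
# Either PDU alone (~17.7 kW) still covers a ~14.5 kW server if the
# other feed is lost.
```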
2.3 Server PSU Redundancy (5+5 Array)
Each ASUS XA NB3I-E12 carries a 5+5 PSU array — Bank A feeds GPU 0–3 and CPUs; Bank B feeds GPU 4–7 and CPUs:
| PSU Bank | Fed from | Serves | Redundancy |
| Bank A (5× PSUs) | PDU-A (busbar A) | GPU 0–3 + CPUs | N+5 within bank |
| Bank B (5× PSUs) | PDU-B (busbar B) | GPU 4–7 + CPUs | N+5 within bank |
ℹ️
On complete PDU-A or PDU-B loss, half the server PSUs go dark — but the surviving bank keeps its GPUs operational via NVLink ring topology, allowing a reduced-scale training run or graceful checkpoint until power is restored.
2.4 UPS and Power Protection
A 10–30 second gap between mains failure and diesel generator takeover is typical. For 256 NVLink-connected B300 GPUs mid-training, an uncontrolled power loss forces a full restart from the last checkpoint. A layered protection approach is recommended:
| Level | Scope | Purpose | Capacity target |
| Facility UPS (provided) | Entire building | Covers generator transfer gap | Facility-managed |
| Row ATS (required) | Per 8–16 rack row | Instant A↔B feed switchover <4 ms — no power gap | Stateless transfer |
| Row UPS (recommended) | Per 8–16 rack row | 60–120 s bridging for graceful checkpoint on generator start | ~150–200 kW × 2 min ≈ 5–7 kWh/row |
⚠️
Row UPS is optional if the facility guarantees generator transfer in <8 seconds. Confirm SLA with the datacenter operator before deciding.
If generator transfer is 15–30 s, row UPS is required to protect training checkpoints.
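The row-UPS capacity target is simply row load × bridge time; a sketch using the 150–200 kW row load and 120 s runtime from the table, with battery derating and end-of-life margin left out:

```python
# Sketch: row UPS energy target = row load x bridge time. Derating and
# end-of-life battery margin are not included here.
def ups_kwh(row_load_kw: float, bridge_s: float) -> float:
    return row_load_kw * bridge_s / 3600.0

for load_kw in (150, 200):
    print(f"{load_kw} kW row, 120 s -> {ups_kwh(load_kw, 120):.1f} kWh")
# 150 kW -> 5.0 kWh; 200 kW -> 6.7 kWh per row
```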
Graceful Shutdown Sequence (on power event)
- ATS detects mains failure → switches to B-feed (or facility UPS) in <4 ms
- BMC IPMI broadcasts power event to all 32 servers via OOB network
- DCIM / cluster manager triggers controlled checkpoint-to-NVMe on all nodes (~30 s)
- NVSwitch and NVLink drain cleanly before power removal
- Row UPS provides the 60–90 s window for this sequence
- Generator reaches stable voltage → ATS switches back to mains feed
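A sketch of the fan-out step in this sequence, broadcasting a checkpoint trigger to all 32 nodes over the management network. Host names and the on-node command ("training-checkpoint" unit) are hypothetical placeholders; a real deployment would drive this from the site's DCIM or cluster manager.

```python
# Sketch: fan a checkpoint trigger out to all 32 nodes on a power event.
# Node names and the trigger command are hypothetical placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = [f"c{i:02d}-mgmt" for i in range(1, 33)]        # C01..C32 mgmt hosts
CHECKPOINT_CMD = "systemctl start training-checkpoint"  # hypothetical unit

def trigger_checkpoint(node: str) -> bool:
    """Ask one node to flush a checkpoint to local NVMe."""
    try:
        # 25 s timeout keeps the fan-out inside the ~30 s window
        # budgeted for checkpoint-to-NVMe in the sequence above.
        r = subprocess.run(["ssh", node, CHECKPOINT_CMD],
                           capture_output=True, timeout=25)
        return r.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def on_power_event() -> None:
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        acked = sum(pool.map(trigger_checkpoint, NODES))
    print(f"checkpoint acknowledged on {acked}/{len(NODES)} nodes")
```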
3.1 Local NVMe (Per Server — Already Installed)
| Item | Value |
| NVMe drives per server | 10× Samsung PM9D3a U.2 (8 data + 2 OS) |
| Raw capacity per server | ~32 TB (assuming 3.2 TB per drive) |
| Usable per server (RAID 6 equiv.) | ~22–25 TB |
| Total raw across 32 servers | ~1 PB |
| Total usable across 32 servers | ~700–800 TB |
✅ Good for (local NVMe)
- OS and container images
- Checkpoint writes (fast local flush during training)
- Ephemeral scratch / temp data
- Single-node inference scratch pad
❌ Not sufficient for
- Shared dataset access across all 32 nodes simultaneously
- Pre-processed token corpus (LLM datasets routinely 10–50 TB+)
- Checkpoint aggregation from all 32 nodes to a single location
- Model weights staging at multi-node scale
3.2 Shared Storage Cluster (IB-Attached)
A shared parallel storage cluster connected to the existing IB fabric is required for multi-node distributed training workloads.
| Option | Technology | Throughput | Notes |
| NVMe-oF over IB (recommended) | RDMA + NVMe-oF | 400+ GB/s aggregate | Lowest latency, native IB RDMA integration |
| Lustre over IB | Lustre + LNET | 200–400 GB/s | Standard HPC/AI, mature tooling |
| BeeGFS over IB | BeeGFS RDMA | 100–300 GB/s | Simpler management than Lustre |
| VAST Data | NFS/S3 over RDMA | 100–500 GB/s | All-flash appliance, scale-out |
3.3 Phase 1 Storage Configuration (2 Nodes)
| Component | Specification | Qty |
| Storage node chassis | 2U NVMe-dense (e.g. Supermicro SSG-610P-ACR12H or equiv.) | 2 |
| NVMe drives per node | 24× 15.36 TB U.2 enterprise read-intensive | 48 total |
| Raw capacity per node | ~369 TB | — |
| Total raw capacity | | ~738 TB |
| Usable (erasure coding 8+2) | | ~590 TB |
| IB connectivity per node | 2× NDR800 ports → existing leaf switches | 4 IB cables total |
| Lustre MDS/MGS node | 1U, 2× NDR IB ports (metadata server) | 1 |
| Storage protocol | NVMe-oF / Lustre OSS | — |
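The usable figure follows directly from the 8+2 erasure-coding ratio; a minimal sketch:

```python
# Sketch: raw and usable capacity under the 8+2 erasure coding above.
DRIVE_TB = 15.36
DRIVES_PER_NODE = 24
STORAGE_NODES = 2

raw_tb = DRIVE_TB * DRIVES_PER_NODE * STORAGE_NODES  # 737.28 TB (~738 above)
usable_tb = raw_tb * 8 / (8 + 2)                     # 8 data : 2 parity shares
print(f"raw ~{raw_tb:.0f} TB, usable ~{usable_tb:.0f} TB")  # ~737 TB / ~590 TB
```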
| Metric | Per Node | 2-Node Cluster |
| Sequential read throughput | ~100 GB/s | ~200 GB/s |
| Sequential write throughput | ~60 GB/s | ~120 GB/s |
| IB bandwidth consumed (peak read) | 800 Gb/s = 100 GB/s | ~200 GB/s → 2 leaf ports |
✅
200 GB/s aggregate read is sufficient for most LLM pre-training workloads up to 70B parameters at 32-node scale.
For 175B+ models, scale to 4 storage nodes in Phase 2.
ℹ️
Storage nodes connect via 2× NDR IB ports each into the existing leaf layer — no additional switches required.
Leaf switches have 44% port headroom at Phase 1, comfortably absorbing 4 storage IB ports.
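For scale, a sketch of what the ~200 GB/s aggregate read rate means for streaming a shared corpus; the 30 TB corpus size is an illustrative value inside the 10–50 TB+ range quoted in §3.1.

```python
# Sketch: time to stream a shared corpus once at the aggregate read rate.
# The 30 TB corpus size is an illustrative assumption.
def full_pass_seconds(corpus_tb: float, agg_gb_per_s: float = 200.0) -> float:
    return corpus_tb * 1000.0 / agg_gb_per_s

print(f"~{full_pass_seconds(30):.0f} s per full pass over 30 TB")  # ~150 s
```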
| Zone | Rack IDs | Count | Status | Notes |
| Compute | C01 – C32 | 32 | Occupied | 32× ASUS XA NB3I-E12 B300 servers |
| Network | N1 – N2 | 2 | Occupied | IB switches + Ethernet + UFM + OOB |
| Network (reserved) | N3 – N6 | 4 | Reserved | Phase 2 expansion capacity |
| Storage | S01 | 1 | Occupied (Phase 1) | 2× storage nodes + MDS + management |
| Storage (expansion) | S02 | 1 | Reserved | Phase 2 storage scale-out |
| Total Phase 1 | | 35 | | Occupied positions |
| Total incl. reserves | | 40 | | All allocated positions |
Physical Zone Adjacency
Compute Zone · C01–C32 · 32 racks
~464 kW sustained · Cold-aisle / hot-aisle containment · CRAC #1–#4
Network Zone · N1–N6 · 6 racks
N1–N2 occupied · N3–N6 reserved · IB fabric · Ethernet · UFM · OOB
Storage Zone · S01–S02 · 2 racks
S01 occupied · S02 reserved · NVMe-oF / Lustre · IB-attached
IB AOC interconnect: compute ↔ network ↔ storage · place zones adjacent to minimise cable runs
Cooling
| Item | Spec | Qty | Notes |
| CRAC / CRAH unit | 200–250 kWth chilled-water | 4 | 3 active + 1 standby (N+1) |
| Hot-aisle containment panels | Floor-to-ceiling, per row | Per layout | Facility-integrated |
| Inlet temperature sensors | Rack-mount, 1U per rack | 40 | One per rack, feeds BMS |
| BMS integration layer | DCIM or facility BMS tie-in | — | Aggregates temp / power / humidity |
Power
| Item | Spec | Qty | Notes |
| Rack PDU (metered, dual-feed) | 0U vertical, 3-phase 32A, C19+C13 | 80 | 2 per rack × 40 racks |
| Row ATS | 3-phase auto-transfer <4 ms | 4 | 1 per row of 8–16 racks |
| Row UPS module | 150–200 kW, 120 s runtime | 4 | Optional — confirm generator SLA first |
| Main distribution panel | A+B independent feeds, per facility | Facility | Independent circuits for each zone |
Storage
| Item | Spec | Qty | Notes |
| Storage node chassis | 2U NVMe-dense, 2× NDR IB | 2 | Phase 1 |
| NVMe drives | 15.36 TB U.2 enterprise read-intensive | 48 | 24 per storage node |
| Lustre MDS/MGS node | 1U, 2× NDR IB ports | 1 | Metadata server |
| IB NDR AOC cables | 5–10 m NDR800 | 4 | Storage nodes → existing leaf switches |
| Storage rack PDUs | Same spec as compute racks | 4 | 2 per rack × 2 racks (S01, S02) |
Storage — IB & OOB Connections
| Connection | From | To | Cable | BW |
| Storage IB data | S01 nodes (4× NDR ports) | Leaf switches L0–L3 | NDR800 AOC 5–10 m | 4× 800 Gb/s = 3.2 Tb/s |
| Storage OS management | Storage node management NICs | OOB 96-port switch (N2) | Cat6A | 3× 1 GbE |
| Storage BMC | Storage node BMC ports | OOB 96-port switch (N2) | Cat6A | 3× 1 GbE |
Updated OOB Switch Port Count
| Connection type | Count |
| OS management — 32 compute servers | 32 |
| BMC / iDRAC — 32 compute servers | 32 |
| Q3400-RA switch management (12 units) | 12 |
| Spectrum-4 switch management (2 units) | 2 |
| UFM management node | 1 |
| Storage node OS management (3 nodes) | 3 |
| Storage node BMC (3 nodes) | 3 |
| Uplink(s) | 1+ |
| Total used | 86 |
| 96-port OOB switch capacity | 96 |
| Remaining spare | 10 ports |
✅
Adding storage nodes consumes 6 more OOB switch ports (86 total vs 80 without storage). The 96-port OOB switch still has 10 spare ports — no switch upgrade required.
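The tally above is a straight sum; a minimal sketch that reproduces it:

```python
# Sketch: reproduce the OOB port tally from the table above.
oob_ports = {
    "compute OS mgmt": 32, "compute BMC": 32,
    "Q3400-RA mgmt": 12, "Spectrum-4 mgmt": 2,
    "UFM node": 1, "storage OS mgmt": 3,
    "storage BMC": 3, "uplink": 1,
}
used = sum(oob_ports.values())
print(f"{used} of 96 ports used -> {96 - used} spare")  # 86 used, 10 spare
```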
| Priority | Action |
| High | Confirm facility provides N+1 CRAC (650+ kWth) and validate inlet temperature SLA with the datacenter operator |
| High | Confirm MDP has independent A+B feeds for all three zones (compute, network, storage) |
| High | Specify and procure storage nodes + NVMe drives; order 4× NDR IB AOC cables for leaf integration |
| Medium | Confirm datacenter generator transfer time — determines whether row UPS modules are required |
| Medium | Install and configure Lustre/NVMe-oF stack during server commissioning |
| Low | Evaluate DLC / RDHx readiness for Phase 2 (64-node expansion, ~750 kW cooling load) |