This repository contains a physics-first, vendor-agnostic reference architecture for GPU-accelerated AI infrastructure.
It defines infrastructure as an engineering contract derived from workload physics:
Workload → Power → Heat → Fluid → Rejection → Operations
The architecture is expressed through:
- Engineering Basis of Design (BoD) — requirements and envelopes
- Appendices — derivations and commissioning evidence
- Reference Architecture — topology instantiation
- Witness closure — alignment with a representative AI workload
-
Workload → Infrastructure projection
appendices/20_Appendix_A_Workload_to_Boundary.md -
Thermal closure and digital twin basis
appendices/21_Appendix_B_Thermal_and_Twin.md -
Commissioning and FMEA mapping
appendices/24_Appendix_E_FMEA_and_Commissioning.md -
Witness workload closure (CausalCompute)
appendices/25_Appendix_F_CausalCompute_Alignment.md
-
Engineering Basis of Design
→ 10_BOD_Engineering_Basis_of_Design.md -
Reference Architecture
→ 30_Reference_Architecture_10MW_Pod.md -
Appendices
→ appendices/ -
Full document (PDF)
→ AI_Native_Pod_Reference_Architecture_PoC.pdf
This document defines:
- Power envelopes and transient limits
- Thermal closure and flow requirements
- Hydraulic sizing and fault domains
- Sustainability and KPI measurement boundaries
- Commissioning and verification methodology
This document does not define vendor selection or construction drawings.
This work demonstrates how GPU infrastructure can be derived from workload physics rather than vendor templates, enabling predictable scaling and operational safety.
Antreas Christofi
HPC Infrastructure Engineer
antreas@christofi.me