Abstract
High-performance computing (HPC) data centers will increasingly need to rely on automation to keep pace with exascale growth in compute capability and to manage and optimize the data center environment and facility resources. Artificial intelligence (AI) and machine learning (ML) approaches provide the means to improve HPC data center operational efficiency by learning historical trends and training models to operate on real-time data collected from both IT and facilities sources. NREL has developed methods for real-time collection, aggregation, and streaming of these data in the Energy Systems Integration Facility (ESIF) HPC Data Center and has collected a significant dataset of relevant metrics across computer systems, racks, environmental, building, and utility sources for research into various predictive analytics problems. HPE's Advanced Technology Group (ATG) is conducting comprehensive research into exascale monitoring and management for HPC systems (hereinafter HPE's Data Monitoring/Management Technology). NREL and HPE will collaborate to add AI to NREL's real-time data collection/aggregation/streaming system and HPE's Data Monitoring/Management Technology, with the goal of improving the operational efficiency of NREL's ESIF HPC Data Center through data analytics on both historical and real-time data from IT systems and facilities operations. This collaboration will consist of efforts in Data Management, Data Analytics, and AI/ML Optimization for both manual and autonomous intervention in data center operations. This will be a multi-year, multi-staged effort with the goal of building capabilities for an Advanced Smart Facility and demonstrating these techniques in the NREL ESIF HPC Data Center.
| Original language | American English |
|---|---|
| Number of pages | 19 |
| State | Published - 2025 |
NREL Publication Number
- NREL/TP-2C00-95486
Keywords
- CRADA
- data center
- smart facility