HPC Digital Twins for Evaluating Scheduling Policies, Incentive Structures and Their Impact on Power and Cooling

  • Matthias Maiterth
  • Wesley Brewer
  • Jaya Kuruvella
  • Arunavo Dey
  • Tanzima Islam
  • Rashadul Kabir
  • Kevin Menear
  • Dmitry Duplyakin
  • Tapasya Patki
  • Terry Jones
  • Feiyi Wang

Research output: Contribution to conference (paper)

Abstract

Schedulers are critical for optimal resource utilization in high-performance computing. Traditional methods for evaluating schedulers are limited to post-deployment analysis or to simulators that do not model the associated infrastructure. In this work, we present the first-of-its-kind integration of scheduling and digital twins in HPC. This enables what-if studies to understand the impact of parameter configurations and scheduling decisions on the physical assets, even before deployment, or regarding changes not easily realizable in production. We (1) provide the first digital twin framework extended with scheduling capabilities, (2) integrate various top-tier HPC systems given their publicly available datasets, and (3) implement extensions to integrate external scheduling simulators. Finally, we show how to (4) implement and evaluate incentive structures, as well as (5) evaluate machine-learning-based scheduling, in this novel digital-twin-based meta-framework for prototyping scheduling. Our work enables what-if scenarios for HPC systems to evaluate sustainability and the impact on the simulated system.
Original language: American English
Pages: 1959-1969
Number of pages: 11
State: Published - 2025
Event: SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis - St. Louis, Missouri
Duration: 16 Nov 2025 to 21 Nov 2025

Conference

Conference: SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
City: St. Louis, Missouri
Period: 16/11/25 to 21/11/25

NLR Publication Number

  • NLR/CP-2C00-98975

Keywords

  • batch scheduling
  • data center digital twin
  • digital twin
  • distributed systems simulation
  • scheduling simulators
  • system simulator
