Mastering HPC Runtime Prediction: From Observing Patterns to a Methodological Approach

Research output: Contribution to conference (Paper)


The continual expansion of high-performance computing (HPC) brings with it an increasing need for efficiency. Heavy investment in energy, hardware, and software infrastructure to support peta- and exascale computing requires the optimization of existing systems and, wherever possible, the discernment and adoption of best practices toward these goals. Such is the case for runtime prediction. When a job is submitted to an HPC system, an estimate of its runtime is provided by the user in the form of "requested wallclock". Error in this user-provided estimate can lead to jobs being prematurely killed by the scheduler, increased wait time in the queue, and decreased system utilization. More than fifteen years of research has been directed at mitigating these effects by using data-driven runtime predictions. Codified here is a set of commonalities and insights emerging from this body of work, which we present as recommendations and best practices. These practices are combined into a methodological approach described and evaluated on an 11-million-job dataset from the National Renewable Energy Laboratory's petascale HPC system, Eagle. This dataset and the accompanying codebase have been released to the public domain for the benefit of the wider HPC research community.
Original language: American English
Number of pages: 11
State: Published - 2023
Event: PEARC '23: Practice and Experience in Advanced Research Computing - Portland, Oregon
Duration: 23 Jul 2023 to 27 Jul 2023


Conference: PEARC '23: Practice and Experience in Advanced Research Computing
City: Portland, Oregon

Bibliographical note

See NREL/CP-2C00-86526 for preprint

NREL Publication Number

  • NREL/CP-2C00-88325


Keywords

  • high performance computing
  • runtime prediction
  • state of practice


