Is Knowledge About Running Applications Helping Improve Runtime Prediction of HPC Jobs?

Research output: NREL Poster

Abstract

High-performance computing (HPC) systems rely on scheduling algorithms to achieve high utilization. These schedulers, in turn, depend on user estimates of job resource requirements, such as runtime, to decide how to schedule incoming jobs. User estimates, however, are prone to error. To mitigate this error, significant research has been directed at producing better estimates of job runtime, usually with machine learning techniques. The performance of these techniques depends on the input features selected. One possible feature is the primary application used by the job. In a survey of more than 20 papers on improving runtime prediction, only four included primary application as an input feature. We focus this investigation specifically on the value of adding primary application as an input feature. We find that it does improve model performance, especially for jobs with longer runtimes, though the size of the improvement varies with the application used. We recommend further research to determine the cause of this variability and to identify an optimal strategy for combining models that include primary application as a feature with models that do not.
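The comparison described in the abstract, the same predictor trained with and without primary application as a feature, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the job records are synthetic, the application names `app_a` and `app_b` are hypothetical, and a simple group-mean predictor stands in for the machine learning models, which the abstract does not specify.

```python
from collections import defaultdict
from statistics import mean

# Synthetic, hypothetical job records (not data from the poster):
# (user_estimate_hours, primary_application, actual_runtime_hours)
JOBS = [
    (4.0, "app_a", 1.0),
    (4.0, "app_a", 1.2),
    (4.0, "app_b", 3.5),
    (4.0, "app_b", 3.9),
    (24.0, "app_a", 6.0),
    (24.0, "app_a", 7.0),
    (24.0, "app_b", 20.0),
    (24.0, "app_b", 22.0),
]

# Alternate records between a training set and a held-out test set.
train, test = JOBS[::2], JOBS[1::2]


def fit_group_means(jobs, key):
    """Stand-in 'model': mean actual runtime for each feature key."""
    groups = defaultdict(list)
    for job in jobs:
        groups[key(job)].append(job[2])
    return {k: mean(v) for k, v in groups.items()}


def mean_abs_error(jobs, model, key):
    """Mean absolute error of the group-mean predictions, in hours."""
    return mean(abs(model[key(job)] - job[2]) for job in jobs)


def without_app(job):
    # Feature set that ignores the application: user estimate only.
    return job[0]


def with_app(job):
    # Feature set that adds the primary application.
    return (job[0], job[1])


mae_baseline = mean_abs_error(test, fit_group_means(train, without_app), without_app)
mae_with_app = mean_abs_error(test, fit_group_means(train, with_app), with_app)

print(f"MAE without application: {mae_baseline:.2f} h")
print(f"MAE with application:    {mae_with_app:.2f} h")
```

On this toy data the application-aware predictor has the lower held-out error, because the synthetic runtimes were constructed to differ by application. It says nothing about how large the effect is on real workloads.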
Original language: American English
State: Published - 2023

Publication series

Name: Presented at PEARC23, 23-27 July 2023, Portland, Oregon

NREL Publication Number

  • NREL/PO-2C00-86578

Keywords

  • feature selection
  • operations
  • runtime predictions
  • SHAP values
