Improving the Quality of Geothermal Data Through Data Standards and Pipelines Within the Geothermal Data Repository: Preprint

Nicole Taverna, Jon Weers, Jay Huggins, Sean Porse, Arlene Anderson, Zach Frone, RJ Scavo

Research output: Paper (contribution to conference)


For machine learning outputs to be applicable to real-world problems, high-quality data are needed to produce high-quality results. With the recent emphasis on machine learning in geothermal applications, there is a growing need to focus on the quality of the data available for these projects. For example, the Geothermal Operational Optimization Using Machine Learning (GOOML) project used large quantities of geothermal power plant operational data to inform plant operating configurations that maximize power generation. High-quality datasets result from dependable sensors or devices, high-frequency measurements, a sufficient number of data points, adequate metadata, reliable data storage, and sufficient data curation. Reusability also contributes to data quality, and it can be enhanced through data standardization. Data standardization creates consistency in the formatting and contents of similar datasets, reducing preprocessing requirements and ensuring that each dataset provides adequate information. The Geothermal Data Repository (GDR) aims to improve data quality by implementing data pipelines that automatically standardize high-value datasets, alongside reliable and accessible long-term storage. As such, the GDR has decided to shift away from recommending Excel-based content models and toward implementing automated data pipelines. This removes the burden of data standardization from users and project teams and will increase the amount of standardized geothermal data available through the GDR. Each data pipeline will be accompanied by a set of recommendations, or a data standard, for its data type to guide data collection and maximize usability for future research.
This paper describes the GDR's proposed transition toward data standardization through automated data pipelines, discusses the need for and value of this shift, and invites suggestions from the community regarding the most useful data standards and pipelines.
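The abstract describes pipelines that standardize the formatting and contents of similar datasets and that relate to quality factors such as measurement frequency, number of data points, and metadata. The Python sketch below illustrates that idea under stated assumptions; it is not the GDR's implementation. The column names, aliases, units, required metadata fields, the `ALIASES` and `REQUIRED_METADATA` tables, the `QualityReport` class, and the `normalize_header` and `standardize` functions are all hypothetical, and the sample rows are illustrative values rather than real plant data.

```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import median
from typing import Callable, Dict, List, Optional, Set, Tuple

# HYPOTHETICAL data standard for plant operational time series. Column names,
# units, aliases, and metadata fields are illustrative assumptions, not a GDR standard.
# Maps a normalized source header -> (canonical column, conversion to canonical unit).
ALIASES: Dict[str, Tuple[str, Callable[[float], float]]] = {
    "brine_temp_c": ("brine_temp_c", lambda v: v),
    "brine_temp_f": ("brine_temp_c", lambda v: (v - 32.0) * 5.0 / 9.0),
    "gross_power_mw": ("gross_power_mw", lambda v: v),
    "gross_power_kw": ("gross_power_mw", lambda v: v / 1000.0),
}
REQUIRED_METADATA = {"site", "sensor_id", "timezone"}


@dataclass
class QualityReport:
    """Simple quality indicators gathered while standardizing a dataset."""
    n_points: int
    median_interval_s: Optional[float]  # typical spacing between measurements
    missing_metadata: Set[str] = field(default_factory=set)
    unmapped_columns: Set[str] = field(default_factory=set)


def normalize_header(name: str) -> str:
    """Lowercase a header and join its words with underscores, e.g. 'Brine Temp (F)' -> 'brine_temp_f'."""
    cleaned = name.strip().lower().replace("(", " ").replace(")", " ")
    return "_".join(cleaned.split())


def standardize(rows: List[Dict[str, str]], metadata: Dict[str, str]):
    """Rename columns, convert units, sort by time, and report on quality.

    Each input row is assumed to contain an ISO 8601 'Timestamp' column.
    """
    standardized = []
    unmapped: Set[str] = set()
    for row in rows:
        out: Dict[str, object] = {}
        for raw_name, raw_value in row.items():
            key = normalize_header(raw_name)
            if key == "timestamp":
                out["timestamp"] = datetime.fromisoformat(raw_value)
            elif key in ALIASES:
                canonical, convert = ALIASES[key]
                # Empty cells become None instead of being silently dropped.
                out[canonical] = None if raw_value == "" else convert(float(raw_value))
            else:
                unmapped.add(raw_name)  # flag columns the standard does not cover
        standardized.append(out)

    standardized.sort(key=lambda r: r["timestamp"])
    times = [r["timestamp"] for r in standardized]
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    report = QualityReport(
        n_points=len(standardized),
        median_interval_s=median(gaps) if gaps else None,
        missing_metadata=REQUIRED_METADATA - set(metadata),
        unmapped_columns=unmapped,
    )
    return standardized, report


if __name__ == "__main__":
    # Illustrative input, not real plant data.
    raw_rows = [
        {"Timestamp": "2023-02-06T00:01:00", "Brine Temp (F)": "392", "Gross Power (kW)": "21500"},
        {"Timestamp": "2023-02-06T00:00:00", "Brine Temp (F)": "390.2",
         "Gross Power (kW)": "21400", "Operator Note": "ok"},
    ]
    data, report = standardize(raw_rows, {"site": "example_plant", "timezone": "UTC"})
    for record in data:
        print(record)
    print(report)
```

Keeping the standard in declarative tables, as assumed here, means that supporting a new source format would mostly involve adding aliases and unit conversions rather than new code; a production pipeline would more likely build on a tabular data library and a published per-data-type standard.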
Original language: American English
Number of pages: 10
State: Published - 2023
Event: 48th Stanford Geothermal Workshop - Stanford, California
Duration: 6 Feb 2023 to 8 Feb 2023

NREL Publication Number: NREL/CP-6A20-84994


Keywords
  • data
  • data curation
  • data quality
  • data science
  • GDR
  • machine learning
  • pipelines
  • standardization

