Fostering Geothermal Machine Learning Success: Elevating Big Data Accessibility and Automated Data Standardization in the Geothermal Data Repository: Preprint

Research output: Contribution to conference › Paper

Abstract

The Department of Energy's (DOE) Geothermal Data Repository (GDR) has implemented improvements to both its data lakes and its data standards and automated data pipelines. The GDR data lakes have reduced storage and compute-related barriers to using large geothermal datasets, enabling these large datasets to be accessed by anyone with a modern computer and internet access. More recently, the GDR has been working to further reduce barriers through streamlining the data intake process, educating users on the process and requirements, and aiding users in accessing data from the data lakes. These improvements have augmented the quantity of datasets the GDR is able to accept into its data lakes and have enabled users who are new to cloud tools to access these datasets more easily, overall increasing the accessibility of big geothermal data for use in machine learning and other projects. In addition, the GDR now has built-in data standards and pipelines for drilling data, geospatial data, and distributed acoustic sensing (DAS) data. These standardization efforts aim to enhance the real-world applicability of geothermal machine learning outcomes by improving the quality of training data. Specifically, through standardizing high-value datasets, the GDR is reducing project-specific data curation requirements, thus allowing more time for actual research. By automating this process, the burden of standardization is lifted from the user, ultimately increasing the availability of standardized data.
Original language: American English
Number of pages: 16
State: Published - 2024
Event: Geothermal Rising Conference 2024 - Waikoloa, HI
Duration: 27 Oct 2024 to 30 Oct 2024

Conference

Conference: Geothermal Rising Conference 2024
City: Waikoloa, HI
Period: 27/10/24 to 30/10/24

NREL Publication Number

  • NREL/CP-6A20-90400

Keywords

  • accessibility
  • DAS
  • data
  • data lake
  • data pipeline
  • data science
  • data standard
  • distributed acoustic sensing
  • gdr
  • geospatial
  • GIS
  • user experience
