Abstract
The Department of Energy's (DOE) Geothermal Data Repository (GDR) team has implemented data standards and automated data pipelines for three data types: 1) drilling data, 2) geospatial datasets, and 3) distributed acoustic sensing (DAS) data. An additional data pipeline is proposed for stimulation data. These data standards and pipelines are intended to improve the real-world applicability of geothermal machine learning outputs by improving data quality. More specifically, by standardizing high-value datasets, the GDR reduces project-specific data curation requirements, leaving more time for research. Automating this process removes the burden of standardization from the user and increases the overall availability of standardized data. This paper provides an update on the GDR's transition toward data standardization through automated data pipelines and calls for feedback from the community on how this process can be improved.
Original language | American English |
---|---|
Number of pages | 18 |
State | Published - 2023 |
Event | Geothermal Rising Conference 2023 - Reno, NV. Duration: 30 Sep 2023 → 4 Oct 2023 |
Conference
Conference | Geothermal Rising Conference 2023 |
---|---|
City | Reno, NV |
Period | 30/09/23 → 4/10/23 |
Bibliographical note
See NREL/CP-6A20-88752 for paper as published in proceedings
NREL Publication Number
- NREL/CP-6A20-86935
Keywords
- cloud-optimized
- DAS data
- data curation
- data lake
- data pipeline
- data science
- data standard
- geospatial data
- geothermal data
- Geothermal Data Repository
- machine learning