Abstract
The Department of Energy's (DOE's) Geothermal Data Repository (GDR) has improved its data lakes, data standards, and automated data pipelines. The GDR data lakes have reduced storage- and compute-related barriers to using large geothermal datasets, making these datasets accessible to anyone with a modern computer and internet access. More recently, the GDR has worked to further reduce barriers by streamlining the data intake process, educating users on the process and its requirements, and helping users access data from the data lakes. These improvements have increased the number of datasets the GDR is able to accept into its data lakes and have made it easier for users who are new to cloud tools to access these datasets, increasing the overall accessibility of big geothermal data for use in machine learning and other projects. In addition, the GDR now has built-in data standards and pipelines for drilling data, geospatial data, and distributed acoustic sensing (DAS) data. These standardization efforts aim to enhance the real-world applicability of geothermal machine learning outcomes by improving the quality of training data. Specifically, by standardizing high-value datasets, the GDR is reducing project-specific data curation requirements, allowing more time for actual research. Automating this process lifts the burden of standardization from the user, ultimately increasing the availability of standardized data.
This paper provides an update on recent improvements made to the GDR's data lakes and automated data pipelines, including: (1) streamlining the data lake intake process, (2) better educating users on the process and requirements through a new data lakes page, (3) adding data lake direct access links to GDR data lake submission pages, (4) implementing a DAS data pipeline to convert DAS data uploaded in SEG-Y format to a standardized Hierarchical Data Format version 5 (HDF5), (5) extending this pipeline to encompass data in the GDR data lake, (6) adding metadata requirements for geospatial data, (7) making user interface/user experience (UI/UX) enhancements to the data pipelines' documentation pages, and (8) improving the GDR's data standards and pipelines pages to better guide users in ensuring that their data is standardized by the GDR's automated data pipelines.
© 2024 Geothermal Resources Council. All rights reserved.
Original language | American English |
---|---|
Pages | 2279-2291 |
Number of pages | 13 |
State | Published - 2025 |
Event | 2024 Geothermal Rising Conference - Waikoloa, Hawaii; Duration: 27 Oct 2024 → 30 Oct 2024 |
Conference
Conference | 2024 Geothermal Rising Conference |
---|---|
City | Waikoloa, Hawaii |
Period | 27/10/24 → 30/10/24 |
Bibliographical note
See NREL/CP-6A20-90400 for preprint
NREL Publication Number
- NREL/CP-6A20-93583
Keywords
- accessibility
- das
- data
- data lake
- data pipeline
- data science
- data standard
- distributed acoustic sensing
- gdr
- geospatial
- gis
- user experience