Abstract

Spatiotemporal data has grown in scale as its use in cross-domain applications has increased. At the same time, the availability of Geographic Information Systems (GIS) data has grown substantially, provided by the United States Geological Survey (USGS) and by other federal, state, county, and local agencies through open-data portals and public-access APIs. However, data availability does not equate to accessibility. Large-scale analyses and applications require robust, performant data management that co-locates data storage and computing. Without adequate data management infrastructure, researchers are compelled to adopt ad hoc, project-specific GIS data storage solutions (e.g., copying data to high-performance computing file systems). Because ad hoc storage does not scale, it hampers cross-domain analyses and makes it difficult to reuse data and existing code bases. Furthermore, GIS data is complex: its intricate data structures and data-specific projection transformations require expertise to analyze and manipulate. Despite these challenges, we recognize that derived GIS data products, e.g., satellite- or LiDAR-based images, can be used in downstream applications such as AI by domain experts who are not GIS experts. To address these data needs and overcome the challenges, we are working toward a GIS Data Platform focused on efficient data storage, data discovery and access, and an API that enables common workflows. We propose a knowledge-graph (KG) approach to data discovery, in which datasets are semantically linked to higher-level constructs such as projects and research areas. These semantic links let researchers explore datasets top-down by specifying relevant, meaningful terms, which helps them find otherwise hidden data. An added advantage is that the nodes and edges of a knowledge graph serve as built-in semantic documentation.
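The top-down discovery idea can be illustrated with a minimal sketch, assuming that research areas, projects, and datasets are stored as subject-predicate-object triples. The node identifiers and the predicate names `hasProject` and `usesDataset` are hypothetical and do not reflect the platform's actual schema; a production KG would more likely live in an RDF store or graph database queried with SPARQL or a similar language. The sketch starts from a high-level term and follows semantic links breadth-first until it reaches dataset nodes.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, predicate, object); identifiers are illustrative only.
TRIPLES = [
    ("area:solar_energy", "hasProject", "project:siting_study"),
    ("area:wind_energy", "hasProject", "project:siting_study"),
    ("area:wind_energy", "hasProject", "project:turbine_wake"),
    ("project:siting_study", "usesDataset", "dataset:dem_conus"),
    ("project:siting_study", "usesDataset", "dataset:landcover_2021"),
    ("project:turbine_wake", "usesDataset", "dataset:lidar_dsm_tiles"),
]


def build_graph(triples):
    """Index triples as outgoing edges: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
    return graph


def discover_datasets(graph, start, follow=("hasProject", "usesDataset")):
    """Breadth-first, top-down traversal from a high-level node to dataset nodes.

    Only edges whose predicate is in `follow` are traversed; nodes whose
    identifier starts with the hypothetical "dataset:" prefix are collected.
    """
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        # .get avoids inserting empty entries into the defaultdict
        for pred, obj in graph.get(node, []):
            if pred in follow and obj not in seen:
                seen.add(obj)
                queue.append(obj)
                if obj.startswith("dataset:"):
                    found.append(obj)
    return sorted(found)


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(discover_datasets(graph, "area:wind_energy"))
    # -> ['dataset:dem_conus', 'dataset:landcover_2021', 'dataset:lidar_dsm_tiles']
```

Starting from "wind energy" surfaces datasets the user never named directly, because they are reachable through linked projects. The same triples also double as lightweight documentation of why each dataset is relevant.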
Deeper spatiotemporal connections between data sources can be encoded via Graph Neural Networks (GNNs) (Zhang et al., 2021). The KG approach can also be extended to integrate the data itself in a Virtual KG (VKG). Our work will draw inspiration from large-scale VKG efforts that have been undertaken, or are currently underway, as part of the OpenStreetMap project (Ding et al., 2021). For DOE Data Days, we share the proposed hybrid (cloud/on-premises) architecture of the geospatial data platform; our work to date on storing, retrieving, and transforming LiDAR and raster data relevant to two important NREL use cases, including the Renewable Energy Potential (reV) Model; and our proposal for a KG-based data discovery engine.
Original language: American English
Publisher: National Renewable Energy Laboratory (NREL)
State: Published, 2023

Publication series

Name: Presented at DOE Data Days, 24-26 October 2023, Livermore, California

NREL Publication Number

  • NREL/PO-2C00-87748

Keywords

  • data platform
  • database
  • DEM
  • DSM
  • geospatial data
  • knowledge graph
  • LIDAR
  • Python
  • USGS Data

Title

Geospatial Data Platform for All
