TY - GEN
T1 - Geospatial Data Platform for All
AU - Ramavajjala, Siddharth
AU - Zisman, Sagi
AU - Nag, Ambarish
AU - Williams, Travis
AU - Gu, Jianyu
AU - Duplyakin, Dmitry
PY - 2023
Y1 - 2023
N2 - Spatiotemporal data has grown in scale due to increased use in cross-domain applications. Simultaneously, there is substantial growth in the availability of Geographic Information Systems (GIS) data provided by the United States Geological Survey (USGS) along with other federal, state, county, and local agencies through open-data portals and public-access APIs. However, data availability does not equate to accessibility. Large-scale analyses and applications require robust, performant data management with co-location of data storage and computing. The insufficiency of data management infrastructure compels researchers to adopt ad hoc, project-specific GIS data storage solutions (e.g., copying data to high-performance computing file systems). Because an ad hoc storage strategy does not scale, it hampers cross-domain analyses and makes it difficult to reuse data and existing code bases. Furthermore, GIS data is complex and requires expertise to analyze and manipulate due to its intricate data structures and data-specific projection transformations. Despite these challenges, we recognize that derived GIS data products (e.g., satellite or LiDAR-based images) can be used in downstream applications such as AI by domain experts who are not GIS experts. To address these data needs and overcome these challenges, we are working towards a GIS Data Platform focused on efficient data storage, data discovery and access, and an API to enable common workflows. We propose a knowledge-graph (KG) approach for data discovery, whereby datasets are semantically linked to higher-level constructs such as projects and research areas. The semantic links enable researchers to explore datasets top-down by specifying relevant and meaningful terms, which helps surface otherwise hidden data. A further advantage is that the nodes and edges of a knowledge graph provide built-in semantic documentation. Deeper spatiotemporal connections between data sources can be encoded via Graph Neural Networks (GNNs) (Zhang et al., 2021). The KG approach can be extended to integrate the data itself in a Virtual KG (VKG). Our work will draw inspiration from large-scale VKG efforts that have been undertaken or are currently underway as part of the OpenStreetMap project (Ding et al., 2021). For DOE Data Days, we share the proposed hybrid (cloud/on-prem) architecture of the geospatial data platform and our work to date on storing, retrieving, and transforming LiDAR and raster data relevant to two important NREL use cases, including the Renewable Energy Potential (reV) Model, and we present our proposal for a KG-based data discovery engine.
AB - Spatiotemporal data has grown in scale due to increased use in cross-domain applications. Simultaneously, there is substantial growth in the availability of Geographic Information Systems (GIS) data provided by the United States Geological Survey (USGS) along with other federal, state, county, and local agencies through open-data portals and public-access APIs. However, data availability does not equate to accessibility. Large-scale analyses and applications require robust, performant data management with co-location of data storage and computing. The insufficiency of data management infrastructure compels researchers to adopt ad hoc, project-specific GIS data storage solutions (e.g., copying data to high-performance computing file systems). Because an ad hoc storage strategy does not scale, it hampers cross-domain analyses and makes it difficult to reuse data and existing code bases. Furthermore, GIS data is complex and requires expertise to analyze and manipulate due to its intricate data structures and data-specific projection transformations. Despite these challenges, we recognize that derived GIS data products (e.g., satellite or LiDAR-based images) can be used in downstream applications such as AI by domain experts who are not GIS experts. To address these data needs and overcome these challenges, we are working towards a GIS Data Platform focused on efficient data storage, data discovery and access, and an API to enable common workflows. We propose a knowledge-graph (KG) approach for data discovery, whereby datasets are semantically linked to higher-level constructs such as projects and research areas. The semantic links enable researchers to explore datasets top-down by specifying relevant and meaningful terms, which helps surface otherwise hidden data. A further advantage is that the nodes and edges of a knowledge graph provide built-in semantic documentation. Deeper spatiotemporal connections between data sources can be encoded via Graph Neural Networks (GNNs) (Zhang et al., 2021). The KG approach can be extended to integrate the data itself in a Virtual KG (VKG). Our work will draw inspiration from large-scale VKG efforts that have been undertaken or are currently underway as part of the OpenStreetMap project (Ding et al., 2021). For DOE Data Days, we share the proposed hybrid (cloud/on-prem) architecture of the geospatial data platform and our work to date on storing, retrieving, and transforming LiDAR and raster data relevant to two important NREL use cases, including the Renewable Energy Potential (reV) Model, and we present our proposal for a KG-based data discovery engine.
KW - data platform
KW - database
KW - DEM
KW - DSM
KW - geospatial data
KW - knowledge graph
KW - LIDAR
KW - Python
KW - USGS Data
M3 - Poster
T3 - Presented at DOE Data Days, 24-26 October 2023, Livermore, California
PB - National Renewable Energy Laboratory (NREL)
ER -