The implementation of a Gaussian Process Regressor in a geospatial LSTM model improves uncertainty quantification

Authors

  • Gisele Dao NSF Spatiotemporal Innovation Center, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA
  • Kaylee Smith NSF Spatiotemporal Innovation Center, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA
  • Chaowei Yang NSF Spatiotemporal Innovation Center, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA

DOI:

https://doi.org/10.13021/jssr2025.5356

Abstract

For machine learning in the geospatial field, uncertainty quantification is pivotal for evaluating model prediction accuracy. Uncertainty quantification methods track how uncertainty evolves throughout model training and strive to distinguish epistemic uncertainty (model limitations) from aleatoric uncertainty (noise inherent in the data). The Gaussian process is a distinctive method of uncertainty quantification because it places a probability distribution over functions. However, the Gaussian process has yet to be widely studied within the geospatial field, with few applications, especially in the TensorFlow framework. In this study, the Gaussian process is implemented using TensorFlow alongside a Long Short-Term Memory (LSTM) model: a Gaussian Process Regressor is trained on features extracted from the trained LSTM. Using geospatial air quality datasets, the Gaussian process model was evaluated with metrics such as negative log-likelihood and expected calibration error, and it predicted values more accurately than an LSTM-only model, with roughly 20% lower mean absolute error. However, the Gaussian process has scalability limitations tied to its high computational cost, favoring smaller datasets. When the model is trained on a smaller dataset covering Los Angeles County versus a larger dataset covering California (both from OpenAQ and AirNow sensors), the smaller dataset yields better metrics, particularly a positive R-squared value (as opposed to the larger dataset's negative R-squared), indicating a better fit to the data. While the Gaussian process has the potential to optimize uncertainty quantification, its computational intensity imposes serious scalability constraints.
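
As a rough illustration of the pipeline described in the abstract, the following is a minimal sketch, assuming a Keras LSTM trained on windowed air-quality sequences and scikit-learn's GaussianProcessRegressor as a stand-in for the GP component; the synthetic data, layer sizes, and kernel choice are illustrative assumptions, not the authors' actual configuration.

    import numpy as np
    import tensorflow as tf
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Hypothetical stand-in data: 500 windows of 24 hourly readings with
    # 3 sensor features each, predicting the next pollutant value.
    rng = np.random.default_rng(0)
    X = rng.random((500, 24, 3)).astype("float32")
    y = X[:, -1, 0] + 0.1 * rng.standard_normal(500).astype("float32")

    # LSTM whose hidden state serves as the learned feature representation.
    inputs = tf.keras.Input(shape=(24, 3))
    hidden = tf.keras.layers.LSTM(32)(inputs)
    output = tf.keras.layers.Dense(1)(hidden)
    lstm_model = tf.keras.Model(inputs, output)
    lstm_model.compile(optimizer="adam", loss="mae")
    lstm_model.fit(X, y, epochs=5, batch_size=32, verbose=0)

    # Reuse the trained LSTM up to its hidden layer as a feature extractor.
    feature_extractor = tf.keras.Model(inputs, hidden)
    features = feature_extractor.predict(X, verbose=0)

    # Fit the Gaussian Process Regressor on the extracted features; its
    # predictive standard deviation provides the uncertainty estimate.
    gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gpr.fit(features, y)
    mean, std = gpr.predict(features, return_std=True)

    # Gaussian negative log-likelihood of the predictions, one of the
    # evaluation metrics mentioned in the abstract.
    nll = np.mean(0.5 * np.log(2 * np.pi * std**2) + 0.5 * ((y - mean) / std) ** 2)
    print(f"Mean NLL: {nll:.3f}")

The exact GP fit scales roughly cubically with the number of training points, which is the computational cost behind the scalability constraints noted in the abstract.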

Published

2025-09-25

Issue

Section

College of Science: Department of Geography and Geoinformation Science