Symptom-Based Prediction of Respiratory Diseases Risk in Relation to Air Pollution Exposure
Abstract
More than half of the world's population is exposed to rising air pollution levels, which increases the
risk of lung damage and of diseases such as interstitial lung disease (ILD), pulmonary sarcoidosis, and
chronic obstructive pulmonary disease (COPD). Early detection significantly improves survival and
recovery rates, yet research using air quality data for lung disease prediction in the U.S. is scarce.
This study aims to develop a predictive model for lung disease incidence based on fine particulate
matter (PM2.5) and ozone levels in the United States, with the goal of encouraging more frequent
screening and earlier detection.
We used daily PM2.5 and ozone measurements from the Environmental Protection Agency (EPA)
alongside interstitial lung disease and COPD data for 2000, 2005, 2010, and 2014. Machine learning
techniques were employed for data preparation, analysis, model building, and training. Preprocessing
steps included merging and cleaning datasets to ensure data quality and robustness. The Spatial Lag
Model (SLM), Spatial Error Model (SEM), and Geographically Weighted Regression (GWR) were used to
analyze the spatiotemporal dependencies between PM2.5, ozone, and lung disease mortality.
County-level latitude and longitude coordinates were used to build the GWR.
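As an illustration of how county coordinates enter a GWR, the following is a minimal sketch rather than the implementation used in this study. It assumes a single predictor, a Gaussian distance kernel, and a fixed bandwidth. The function names (`haversine_km`, `gwr_local_fit`), the bandwidth value, and all data points are hypothetical and synthetic.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def gwr_local_fit(coords, x, y, bandwidth_km):
    """Fit y = intercept_i + slope_i * x at every location i.

    Each local fit is a weighted least-squares regression in which
    observation j receives a Gaussian kernel weight that decays with
    its distance from location i.
    """
    results = []
    for lat_i, lon_i in coords:
        # Kernel weights: nearby counties count more than distant ones.
        w = [
            math.exp(-0.5 * (haversine_km(lat_i, lon_i, lat_j, lon_j) / bandwidth_km) ** 2)
            for lat_j, lon_j in coords
        ]
        sw = sum(w)
        x_bar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        results.append((intercept, slope))
    return results


# Synthetic example: five hypothetical county centroids (lat, lon),
# a made-up PM2.5 level per county, and a made-up mortality rate that
# is an exact linear function of PM2.5 (rate = 1 + 2 * pm25).
coords = [(34.0, -118.2), (36.7, -119.8), (40.7, -74.0), (41.9, -87.6), (29.8, -95.4)]
pm25 = [12.0, 15.5, 9.0, 10.5, 11.0]
rate = [1.0 + 2.0 * v for v in pm25]

local = gwr_local_fit(coords, pm25, rate, bandwidth_km=500.0)
for (lat, lon), (intercept, slope) in zip(coords, local):
    print(f"({lat:.1f}, {lon:.1f}): intercept={intercept:.3f}, slope={slope:.3f}")
```

Because the synthetic response lies exactly on one line, every local fit recovers the same coefficients. With real data the local slopes would vary across counties, and that spatial variation is what a GWR is meant to expose.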
Preliminary results indicated a significant correlation between air pollutant levels and lung disease
incidence. Our machine learning models identified patterns and trends that can be leveraged for early
prediction of lung disease cases. The findings underscore the potential of air quality data for
predicting lung disease incidence and for facilitating early diagnosis and treatment. This research
provides insight into the health impacts of air pollution and highlights the need for further research
in this area.
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.