A Comparative Study of Machine Learning Models for PM2.5 Prediction

Authors

  • Claire Guo, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA
  • Ahaan Shah, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA
  • Anusha Srirenganathanmalarvizhi, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA
  • Chaowei Yang, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA

Abstract

PM2.5 is a predominant air pollutant with significant impacts on human health and atmospheric environmental quality. Fine particles such as PM2.5 can penetrate deep into the respiratory system, contributing to respiratory and cardiovascular diseases. High concentrations of PM2.5 can also reduce visibility and contribute to environmental degradation. Despite numerous efforts, systematic studies that identify the best-performing models for predicting PM2.5 levels while accounting for meteorological influences remain scarce. This study addresses that gap by aiming to identify the most effective models in terms of predictive accuracy and to characterize the influence of meteorological factors on PM2.5 levels. Datasets from 14 air quality monitoring sites across six northeastern U.S. states (New York, Pennsylvania, Vermont, Massachusetts, Connecticut, and Rhode Island) are used to improve the retrieval and prediction of PM2.5 concentrations. The data include atmospheric variables such as temperature, boundary layer height, and relative humidity, and are preprocessed to remove irrelevant information. Several machine learning models, including Linear Regression, Random Forest, Support Vector Machines, XGBoost, Extra Trees, and Deep Learning models, are evaluated to identify the best approach for PM2.5 prediction. The datasets are divided into training and testing sets, with the training data used to fit the models and the test data reserved for evaluation. Model performance is assessed using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²). The study identifies the most effective models for predicting PM2.5 levels and helps clarify how meteorological factors influence these predictions, giving environmental agencies tools for more accurate PM2.5 prediction and supporting the development of more effective air pollution control strategies and public health policies.
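
The evaluation procedure described in the abstract (a train/test split followed by MSE, RMSE, and R²) can be illustrated with a minimal sketch. This is not the authors' pipeline: the synthetic data, the coefficients used to generate it, the 80/20 split ratio, the random seeds, and the helper names `train_test_split_indices` and `regression_metrics` are all assumptions made for illustration, and an ordinary least-squares fit stands in for the Linear Regression model.

```python
import numpy as np


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) from a random shuffle.

    The 80/20 ratio is an assumption; the abstract does not state one.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, and R-squared for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = float(np.sum(residuals ** 2))                   # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total sum of squares
    mse = ss_res / len(y_true)
    return {"mse": mse, "rmse": float(np.sqrt(mse)), "r2": 1.0 - ss_res / ss_tot}


# Synthetic stand-in for the monitoring data (hypothetical values, not real
# measurements): temperature, boundary layer height, relative humidity.
rng = np.random.default_rng(42)
n = 500
X = np.column_stack([
    rng.normal(15.0, 8.0, n),      # temperature
    rng.normal(900.0, 300.0, n),   # boundary layer height
    rng.uniform(20.0, 100.0, n),   # relative humidity
])
# Hypothetical linear relationship plus noise, used only to produce a target.
y = 12.0 + 0.1 * X[:, 0] - 0.005 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0.0, 1.0, n)

train_idx, test_idx = train_test_split_indices(n)

# Fit a linear regression on the training rows by ordinary least squares.
A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

# Evaluate only on the held-out test rows.
A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
metrics = regression_metrics(y[test_idx], A_test @ coef)
print(metrics)
```

Comparing several models under this scheme would mean reusing the same `train_idx` and `test_idx` for each model and passing each model's test predictions to `regression_metrics`, so that the reported scores are directly comparable.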

Published

2024-10-13

Section

College of Science: Department of Geography and Geoinformation Science