Comparison of Various Machine Learning Algorithms in Spatiotemporal Modeling of Fine Particulate Matter and Ozone Concentration in California Wildfires

DAMILOLA AWOFISAYO; Pouyan Ahmadi

doi:10.13021/jssr2020.3161

Abstract

Modern air quality monitors are spatially sparse and record data only a few times a day. Much research has gone into predicting air pollutant concentrations under normal conditions. However, far less has addressed these concentrations during wildfires, which pose a greater threat to human health because they produce larger amounts of fine particulate matter (PM2.5) and ozone (O3). This study compares seven machine learning algorithms, four linear and three nonlinear, by building spatiotemporal models of O3 and PM2.5 data. The linear models tested were linear regression, ridge regression, least absolute shrinkage and selection operator (LASSO), and elastic net regularization; the nonlinear algorithms were random forest and two forms of extreme gradient boosting (XGBoost). Using data from EPA O3 and PM2.5 monitors in California during the 2017 wildfires, the models aim to accurately predict air pollutant concentrations during wildfire events. For the O3 models, the nonlinear algorithms were far more accurate than the linear algorithms. The best performers were random forest, with a root-mean-square error (RMSE) of 0.000286, and XGBoost, with an RMSE of 0.000399, while the best linear algorithm, linear regression, had an RMSE of 0.003948. By comparison, the PM2.5 models performed slightly worse than the O3 models; the best algorithms were again XGBoost and random forest, with RMSEs of 6.762904 and 7.896510, respectively.
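The model comparison described above can be sketched in Python with scikit-learn. This is a minimal sketch under assumptions not stated in the abstract: the features (latitude, longitude, day of year), the synthetic target, the 80/20 random split, and every hyperparameter are hypothetical, and scikit-learn's GradientBoostingRegressor stands in for XGBoost (the abstract does not specify its two XGBoost variants). Each model is scored on held-out data by RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n=1000, seed=0):
    """Hypothetical stand-in for monitor data: (latitude, longitude, day of year) -> concentration.

    The target is an arbitrary nonlinear function plus noise; it is NOT EPA data.
    """
    rng = np.random.default_rng(seed)
    lat = rng.uniform(32.5, 42.0, n)
    lon = rng.uniform(-124.4, -114.1, n)
    day = rng.uniform(0.0, 365.0, n)
    X = np.column_stack([lat, lon, day])
    # Seasonal cycle plus a quadratic spatial term: a linear model cannot fit this shape exactly.
    y = np.sin(2 * np.pi * day / 365.0) + (lat - 37.0) ** 2 / 25.0 + rng.normal(0.0, 0.05, n)
    return X, y


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def compare_models(X, y, seed=0):
    """Fit each candidate on a training split and return its test-set RMSE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "linear": LinearRegression(),
        # Penalized linear models are scale-sensitive, so standardize features first.
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
        "elastic_net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.01, l1_ratio=0.5)),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=seed),
        # Stand-in for XGBoost; xgboost.XGBRegressor could be substituted if installed.
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = rmse(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_data()
    # Print models from lowest (best) to highest RMSE.
    for name, score in sorted(compare_models(X, y).items(), key=lambda kv: kv[1]):
        print(f"{name:>18}: RMSE = {score:.4f}")
```

A random split ignores spatial and temporal autocorrelation between nearby monitors and days; a blocked split (for example, holding out whole monitors or date ranges) may give a more realistic error estimate for spatiotemporal prediction.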