Air Pollution Forecasting in Almaty using Ensemble Machine Learning Models
Abstract
This study develops a forecasting methodology for air pollution levels in Almaty, Kazakhstan, focusing on fine particulate matter (PM2.5) and carbon monoxide (CO) concentrations. Air pollution poses significant risks to public health, and Almaty’s basin location exacerbates the problem. To address the limitations of traditional statistical forecasting methods, we propose an ensemble machine learning approach that integrates Seasonal-Trend decomposition using LOESS (STL) with gradient boosting algorithms to capture complex temporal and nonlinear patterns. The objective is to develop and validate this methodology using STL decomposition, XGBoost and LightGBM models, and their ensemble combination. The novelty lies in combining STL decomposition with an ensemble of gradient boosting models for high-accuracy forecasting in Almaty’s complex urban environment. The dataset comprises hourly measurements from more than 20 monitoring stations, enabling seasonal and spatial analysis. Preprocessing included outlier removal, normalization, and decomposition of each time series into seasonal, trend, and residual components. XGBoost and LightGBM were trained separately and combined into a weighted ensemble, with weights selected through cross-validation. Figures and tables illustrate the data preprocessing flow, model architectures, feature importance analysis, and predictive performance. The ensemble outperformed the individual models, achieving coefficient of determination (R²) values exceeding 0.98 for PM2.5 and 0.83 for carbon monoxide. The findings demonstrate that integrating STL decomposition with ensemble learning provides a robust and effective approach to forecasting air pollution in complex urban environments.
The methodology shows strong potential for practical application in real-time air quality monitoring and warning systems, aiding policymakers and public health authorities. Future research will expand the dataset by incorporating additional factors such as traffic flow, industrial emissions, and satellite remote sensing data to enhance predictive accuracy and model interpretability.
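The weighted-ensemble step described in the abstract can be illustrated with a minimal sketch. It assumes the blend takes the form ŷ = w·ŷ_A + (1 − w)·ŷ_B with a single weight w chosen by a grid search that maximizes R² on held-out (out-of-fold) predictions; the abstract does not state how the weights were actually searched, so this is one plausible reading rather than the authors' procedure. The function names and the toy numbers are hypothetical, and the two prediction lists merely stand in for XGBoost and LightGBM outputs.

```python
from typing import List, Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def blend(pred_a: Sequence[float], pred_b: Sequence[float], w: float) -> List[float]:
    """Convex combination of two models' predictions with weight w on model A."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def best_weight(y_true: Sequence[float], pred_a: Sequence[float],
                pred_b: Sequence[float], steps: int = 100) -> float:
    """Grid-search w in [0, 1] for the highest R^2 on held-out predictions."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda w: r2_score(y_true, blend(pred_a, pred_b, w)))


# Toy validation data (invented for illustration, not taken from the study).
y_val = [10.0, 20.0, 30.0, 40.0]       # observed concentrations
pred_model_a = [12.0, 22.0, 32.0, 42.0]  # stand-in for one model, biased high
pred_model_b = [8.0, 18.0, 28.0, 38.0]   # stand-in for the other, biased low

w = best_weight(y_val, pred_model_a, pred_model_b)
ensemble = blend(pred_model_a, pred_model_b, w)
print(f"weight on model A: {w:.2f}, ensemble R^2: {r2_score(y_val, ensemble):.3f}")
```

In this toy case the two models err in opposite directions, so the blend scores higher than either model alone, which is the general motivation for weighting rather than picking a single model. In practice the weight would be selected on out-of-fold predictions and then checked on a separate test period to avoid an optimistic estimate.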
Journal of Applied Data Sciences
ISSN: 2723-6471 (Online)
Collaborated with: Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia
Publisher: Bright Publisher
Website: http://bright-journal.org/JADS
Email: taqwa@amikompurwokerto.ac.id (principal contact), support@bright-journal.org (technical issues)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0