A Stacking Ensemble Model for Predicting Student High School Graduation Outcomes
Abstract
This study develops and evaluates machine learning models to predict high school graduation outcomes and identify at-risk students for early intervention. Using a quantitative approach, data from 1,017 students across three public high schools were analyzed, encompassing academic performance (average yearly scores), behavioral factors (attendance rates and extracurricular participation), and socio-economic background (proxied by parental occupation). A comparative modeling strategy was applied, beginning with a Decision Tree baseline and advancing to a Stacking Ensemble model that integrates three heterogeneous base learners, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, combined through a Logistic Regression meta-model. Both models were optimized using GridSearchCV and adjusted for the class imbalance between graduates (93.4%) and at-risk students (6.6%). The results showed that academic variables, particularly third-year average scores (mean = 82.6, SD = 6.4) and attendance rate (mean = 94.3%), were the strongest predictors of graduation, while socio-economic indicators had minimal impact. The Stacking Ensemble achieved a notable improvement over the Decision Tree, reaching an accuracy of 99.6%, precision of 0.909, recall of 1.000, F1-score of 0.952, and AUC of 1.000, compared with the baseline's accuracy of 94.9% (F1-score = 0.519, AUC = 0.83). These findings indicate the superior predictive capability of the ensemble model in identifying students at risk of non-graduation. The study's novelty lies in combining interpretable and high-performance models to construct a practical early-warning framework that can guide educators and policymakers in targeted academic interventions. However, the near-perfect metrics also suggest potential overfitting, underscoring the need for validation on external datasets before broader application.
Overall, this research contributes a robust, data-driven methodology for improving student retention through predictive analytics in educational settings.
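The comparative design described above (a Decision Tree baseline versus a stack of SVM, k-NN and Decision Tree feeding a Logistic Regression meta-model, both tuned with GridSearchCV) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code. The data are synthetic (`make_classification` with roughly the reported 93.4%/6.6% class split, label 1 = at-risk). The hyperparameter grids, F1-based model selection, and `class_weight="balanced"` are all assumed choices, because the abstract does not say which imbalance technique or search space the authors used. The helper names `evaluate` and `f1_from` are hypothetical.

```python
# Sketch of a stacking ensemble (SVM + k-NN + Decision Tree -> Logistic
# Regression) compared with a Decision Tree baseline, using scikit-learn.
# ASSUMPTIONS: synthetic data replaces the study's student records; the
# grids, F1 scoring, and class_weight="balanced" are illustrative choices,
# not the authors' reported settings. evaluate/f1_from are hypothetical.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 1,017 rows, label 1 = "at-risk" (roughly 6.6% of rows).
X, y = make_classification(n_samples=1017, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.934, 0.066],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)


def evaluate(model):
    """Return the five metrics named in the abstract, for the at-risk class."""
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "auc": roc_auc_score(y_test, y_score),
    }


# Baseline: a single Decision Tree tuned with GridSearchCV.
baseline = GridSearchCV(
    DecisionTreeClassifier(class_weight="balanced", random_state=0),
    param_grid={"max_depth": [3, 5, None]}, scoring="f1", cv=3)
baseline.fit(X_train, y_train)

# Heterogeneous base learners; the scale-sensitive ones get a StandardScaler.
base_learners = [
    ("svm", make_pipeline(StandardScaler(),
                          SVC(class_weight="balanced", random_state=0))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("tree", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
]
# The meta-model is fitted on out-of-fold base-learner outputs (cv=...), so
# each training row's meta-feature comes from learners that did not see it.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

# Nested parameter names follow <learner>__<pipeline step>__<param>.
ensemble = GridSearchCV(stack, param_grid={
    "svm__svc__C": [1.0, 10.0],
    "knn__kneighborsclassifier__n_neighbors": [5, 15],
    "tree__max_depth": [3, 5],
}, scoring="f1", cv=3)
ensemble.fit(X_train, y_train)

for name, search in [("decision tree", baseline), ("stacking", ensemble)]:
    scores = evaluate(search.best_estimator_)
    print(name, {k: round(float(v), 3) for k, v in scores.items()})


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the reported ensemble figures (P = 0.909, R = 1.000).
print(round(f1_from(0.909, 1.000), 3))  # 0.952
```

The final line only checks that the reported F1-score follows from the reported precision and recall, since 2(0.909)(1.000)/(0.909 + 1.000) ≈ 0.952; the metrics printed for the synthetic data will not match the study's figures. Fitting the meta-model on out-of-fold predictions keeps it from rewarding base learners that merely memorize training rows, but with only about 67 at-risk students in a 1,017-row dataset, validation on an external dataset, as recommended above, remains the stronger check against overfitting.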
Keywords
Student Performance Prediction; Educational Data Mining; Early-Warning System; Stacking Ensemble; Decision Tree; Logistic Regression
DOI: https://doi.org/10.47738/jads.v7i1.1067
Journal of Applied Data Sciences
| Item | Details |
|---|---|
| ISSN | 2723-6471 (Online) |
| Collaborated with | Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia |
| Publisher | Bright Publisher |
| Website | http://bright-journal.org/JADS |
| Email | taqwa@amikompurwokerto.ac.id (principal contact); support@bright-journal.org (technical issues) |
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.