A Stacking Ensemble Model for Predicting Student High School Graduation Outcomes

Fitriyani Fitriyani, Ari Amir Alkodri, Fajar Aswin

Abstract


This study develops and evaluates machine learning models to predict high school graduation outcomes and identify at-risk students for early intervention. Using a quantitative approach, data from 1,017 students across three public high schools were analyzed, covering academic performance (average yearly scores), behavioral factors (attendance rates and extracurricular participation), and socio-economic background (proxied by parental occupation). A comparative modeling strategy was applied, beginning with a Decision Tree baseline and advancing to a Stacking Ensemble model that combined three heterogeneous base learners, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, through a Logistic Regression meta-model. Both models were optimized using GridSearchCV and adjusted for class imbalance between graduates (93.4%) and at-risk students (6.6%). The results showed that academic performance and attendance, particularly third-year average scores (mean = 82.6, SD = 6.4) and attendance rate (mean = 94.3%), were the strongest predictors of graduation, while socio-economic indicators had minimal impact. The Stacking Ensemble clearly outperformed the Decision Tree baseline, reaching an accuracy of 99.6%, precision of 0.909, recall of 1.000, F1-score of 0.952, and AUC of 1.000, compared with the baseline's accuracy of 94.9%, F1-score of 0.519, and AUC of 0.83. These findings indicate the superior predictive capability of the ensemble model in identifying students at risk of non-graduation. The study’s novelty lies in combining interpretable and high-performance models to construct a practical early-warning framework that can guide educators and policymakers in targeted academic interventions. However, the near-perfect metrics also suggest potential overfitting, emphasizing the need for validation using external datasets before broader application. Overall, this research contributes a robust, data-driven methodology for improving student retention through predictive analytics in educational settings.
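
The modeling setup described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' code. The data are synthetic (generated to mimic the reported 1,017 students and roughly 6.6% at-risk share), the hyperparameter grid is hypothetical, and `class_weight="balanced"` is only one possible way to adjust for class imbalance; the abstract does not state which method was used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the student records (label 1 = at-risk).
X, y = make_classification(
    n_samples=1017, n_features=6, n_informative=4, n_redundant=0,
    weights=[0.934, 0.066], flip_y=0.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Three heterogeneous base learners; SVM and k-NN are scaled because both
# depend on distances between feature vectors.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(class_weight="balanced", random_state=0))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("tree", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
]

# Logistic Regression meta-model trained on out-of-fold base-learner outputs.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)

# ASSUMPTION: a deliberately small, hypothetical grid to keep the sketch fast.
param_grid = {
    "svm__svc__C": [0.1, 1.0],
    "knn__kneighborsclassifier__n_neighbors": [3, 7],
    "tree__max_depth": [3, 5],
}

search = GridSearchCV(
    stack,
    param_grid,
    scoring="f1",  # F1 on the at-risk class; accuracy is misleading at 6.6% prevalence
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
test_recall = recall_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)
print("best params:", search.best_params_)
print("held-out recall:", round(test_recall, 3), "F1:", round(test_f1, 3))
```

Scores from this sketch say nothing about the study's results, since the data are synthetic. The reported figures themselves are internally consistent: with precision 0.909 and recall 1.000, F1 = 2PR / (P + R) ≈ 0.952.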

Keywords


Student Performance Prediction; Educational Data Mining; Early-Warning System; Stacking Ensemble; Decision Tree; Logistic Regression

Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Collaborated with : Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Publisher : Bright Publisher
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    support@bright-journal.org (technical issues)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0