Course-Disjoint Evaluation and Capacity-Aware Triage for Student Dropout Risk Prediction

Wijiyanto Wijiyanto, Aris Marjuni, Ahmad Zainul Fanani, Ruri Suko Basuki

Abstract


Early-warning systems for student dropout prevention require evaluation protocols and outputs that remain reliable when applied across heterogeneous academic contexts. This study quantifies how conventional random splits can overestimate performance when models are expected to generalize across different courses and proposes a decision-support layer that translates predicted risk into capacity-aware intervention policies. Using a benchmark higher-education dataset (N=4,424; 34 predictors; three classes: Dropout, Enrolled, Graduate) with 17 Course groups, phased prediction is implemented to reflect increasing evidence availability: S0 (pre-enrollment), S1 (plus semester-1 academic evidence), and S2 (plus semester-2 academic evidence). Baseline results are replicated with leakage-safe preprocessing (imputation, one-hot encoding, scaling) and the Synthetic Minority Over-sampling Technique (SMOTE) applied strictly within training folds, comparing multinomial logistic regression, random forest, and tree-based boosting models. Deployment-oriented performance is assessed using StratifiedGroupKFold grouped by Course to enforce course-disjoint testing. Discrimination is reported with Macro-F1 and Balanced Accuracy, while probability quality is evaluated using log loss, Brier score, expected calibration error, maximum calibration error, and reliability diagrams. Calibrated probabilities are translated into capacity-aware risk bands (Top-k% triage), selective prediction is evaluated via abstention to defer low-confidence cases, and split conformal prediction sets are optionally reported for multiclass uncertainty communication. Results show consistent performance drops under course-disjoint validation, confirming a non-trivial generalization gap. Error decomposition indicates that Enrolled is the most ambiguous class and exhibits phase-dependent confusion toward both terminal outcomes. Calibration shows phase-specific trade-offs between likelihood-based and worst-case calibration metrics, while risk bands yield high-precision triage under limited capacity, and abstention improves decision quality at reduced coverage. Overall, the study provides a deployment-oriented evaluation and decision-support workflow for translating dropout risk models into actionable capacity planning.
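The Top-k% triage and abstention steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, the confidence threshold, and the class ordering (0 = Dropout, 1 = Enrolled, 2 = Graduate) are all assumptions made for illustration, and the inputs are assumed to be already-calibrated probabilities.

```python
def top_k_triage(risk_scores, k_percent):
    """Return indices of the highest-risk students that fit a Top-k% capacity.

    risk_scores: calibrated P(Dropout) per student (hypothetical inputs).
    k_percent:   integer share of the cohort that staff can follow up on.
    """
    if not 0 < k_percent <= 100:
        raise ValueError("k_percent must be in (0, 100]")
    n = len(risk_scores)
    # Integer ceiling of k% of the cohort, avoiding float rounding issues.
    n_flag = (k_percent * n + 99) // 100
    ranked = sorted(range(n), key=lambda i: risk_scores[i], reverse=True)
    return ranked[:n_flag]


def precision_at_k(flagged, is_dropout):
    """Share of flagged students who actually dropped out."""
    if not flagged:
        return 0.0
    return sum(is_dropout[i] for i in flagged) / len(flagged)


def selective_predict(prob_rows, threshold):
    """Predict the argmax class, or None (abstain) when confidence is low.

    prob_rows: per-student class probabilities, assumed order
               [Dropout, Enrolled, Graduate].
    threshold: assumed minimum top-class probability needed to commit.
    """
    decisions = []
    for row in prob_rows:
        best = max(range(len(row)), key=lambda j: row[j])
        decisions.append(best if row[best] >= threshold else None)
    return decisions


# Toy cohort of ten students (fabricated purely for illustration).
risk = [0.91, 0.15, 0.62, 0.08, 0.77, 0.33, 0.55, 0.21, 0.84, 0.40]
dropped = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0]
flagged = top_k_triage(risk, k_percent=20)
triage_precision = precision_at_k(flagged, dropped)

probs = [
    [0.80, 0.10, 0.10],
    [0.40, 0.35, 0.25],
    [0.10, 0.20, 0.70],
    [0.34, 0.33, 0.33],
]
decisions = selective_predict(probs, threshold=0.6)
coverage = sum(d is not None for d in decisions) / len(decisions)
```

In this sketch, lowering `k_percent` tightens the intervention list to the students with the highest predicted risk, while raising `threshold` trades coverage for fewer low-confidence decisions, which mirrors the capacity and coverage trade-offs the abstract describes.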



Keywords


Student Dropout, Course-Disjoint Validation, Probability Calibration, Risk Bands, Selective Prediction, Conformal Prediction





Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Publisher : Bright Publisher
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    support@bright-journal.org (technical issues)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0