Course-Disjoint Evaluation and Capacity-Aware Triage for Student Dropout Risk Prediction
Abstract
Early-warning systems for student dropout prevention require evaluation protocols and outputs that remain reliable when applied across heterogeneous academic contexts. This study quantifies how conventional random splits can overestimate performance when models are expected to generalize across courses, and it proposes a decision-support layer that translates predicted risk into capacity-aware intervention policies. On a benchmark higher-education dataset (N=4,424; 34 predictors; three classes: Dropout, Enrolled, Graduate) spanning 17 Course groups, phased prediction is implemented to reflect increasing evidence availability: S0 (pre-enrollment), S1 (plus semester-1 academic evidence), and S2 (plus semester-2 academic evidence). Baseline results are replicated with leakage-safe preprocessing (imputation, one-hot encoding, scaling) and the Synthetic Minority Over-sampling Technique (SMOTE) applied strictly within training folds, comparing multinomial logistic regression, random forest, and tree-based boosting models. Deployment-oriented performance is assessed with StratifiedGroupKFold grouped by Course, which enforces course-disjoint testing. Discrimination is reported with Macro-F1 and Balanced Accuracy, while probability quality is evaluated using LogLoss, Brier score, expected calibration error (ECE), maximum calibration error (MCE), and reliability diagrams. Calibrated probabilities are translated into capacity-aware risk bands (Top-k% triage), selective prediction is evaluated via abstention that defers low-confidence cases, and split conformal prediction sets are optionally reported to communicate multiclass uncertainty. Results show consistent performance drops under course-disjoint validation, confirming a non-trivial generalization gap. Error decomposition indicates that Enrolled is the most ambiguous class and exhibits phase-dependent confusion toward both terminal outcomes (Dropout and Graduate).
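The course-disjoint protocol described above can be illustrated with a minimal, dependency-free sketch. The helper names `course_disjoint_folds` and `standardize_within_fold` are hypothetical, and the fold assignment is a simplified greedy scheme: it keeps every course entirely inside one test fold, as grouping by Course requires, but unlike scikit-learn's `StratifiedGroupKFold` it makes no attempt to balance class proportions across folds. The scaler shows the leakage-safe principle of estimating preprocessing statistics on training rows only; it is not the study's pipeline.

```python
from collections import Counter


def course_disjoint_folds(courses, n_splits=5):
    """Hypothetical helper: split row indices so no course spans train and test.

    Courses are placed largest-first into the test fold that currently holds
    the fewest rows. Class stratification is NOT attempted in this sketch.
    """
    sizes = Counter(courses)
    fold_of_course = {}
    fold_sizes = [0] * n_splits
    for course, size in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        target = min(range(n_splits), key=lambda f: fold_sizes[f])
        fold_of_course[course] = target
        fold_sizes[target] += size

    splits = []
    for f in range(n_splits):
        test_idx = [i for i, c in enumerate(courses) if fold_of_course[c] == f]
        train_idx = [i for i, c in enumerate(courses) if fold_of_course[c] != f]
        splits.append((train_idx, test_idx))
    return splits


def standardize_within_fold(x, train_idx, test_idx):
    """Hypothetical helper: fit mean/std on training rows only, apply to both sides."""
    train_vals = [x[i] for i in train_idx]
    mean = sum(train_vals) / len(train_vals)
    # Population std; fall back to 1.0 if the training column is constant.
    std = (sum((v - mean) ** 2 for v in train_vals) / len(train_vals)) ** 0.5 or 1.0
    return ([(x[i] - mean) / std for i in train_idx],
            [(x[i] - mean) / std for i in test_idx])


if __name__ == "__main__":
    # Toy data: 14 students in 5 courses (labels A-E are placeholders).
    courses = ["A"] * 5 + ["B"] * 3 + ["C"] * 3 + ["D"] * 2 + ["E"]
    grades = [float(g) for g in range(10, 24)]  # one toy numeric predictor
    for k, (train_idx, test_idx) in enumerate(course_disjoint_folds(courses, 3)):
        test_courses = {courses[i] for i in test_idx}
        # No course may appear on both sides of the split.
        assert not test_courses & {courses[i] for i in train_idx}
        train_scaled, _ = standardize_within_fold(grades, train_idx, test_idx)
        assert abs(sum(train_scaled) / len(train_scaled)) < 1e-9
        print(k, sorted(test_courses), len(test_idx))
```

On the toy data, the three test folds hold course A, courses B and D, and courses C and E, so no course ever contributes rows to both the training and the test side of a split.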
Calibration shows phase-specific trade-offs between likelihood-based and worst-case calibration metrics, while risk bands yield high-precision triage under limited capacity, and abstention improves decision quality at reduced coverage. Overall, the study provides a deployment-oriented evaluation and decision-support workflow for translating dropout risk models into actionable capacity planning.
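The probability-quality metrics and decision rules summarized above can be sketched in plain Python. This is an illustration under stated assumptions, not the study's implementation: ECE and MCE use top-label confidence with 10 equal-width bins, the Brier score sums squared errors over the three classes (some references also divide by the number of classes), class index 0 is assumed to be Dropout so its probability serves as the triage risk score, and abstention uses a single fixed confidence threshold. All function names are hypothetical.

```python
import math

# Hypothetical helpers for illustration only; not the study's code.


def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def brier_score(probs, labels):
    """Multiclass Brier score, summed over classes and averaged over students."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def ece_mce(probs, labels, n_bins=10):
    """Top-label expected (ECE) and maximum (MCE) calibration error, equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, pred == y))
    ece = mce = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(ok for _, ok in members) / len(members)
        gap = abs(acc - avg_conf)
        ece += len(members) / len(labels) * gap  # gap weighted by bin share
        mce = max(mce, gap)  # worst-case bin
    return ece, mce


def top_k_triage(dropout_risk, capacity_pct):
    """Flag the ceil(capacity_pct% of n) highest-risk students (integer ceiling)."""
    n = len(dropout_risk)
    n_flag = (capacity_pct * n + 99) // 100
    order = sorted(range(n), key=lambda i: dropout_risk[i], reverse=True)
    return sorted(order[:n_flag])


def selective_predict(probs, threshold):
    """Predict the argmax class, or None (defer to staff) when confidence < threshold."""
    return [p.index(max(p)) if max(p) >= threshold else None for p in probs]


def coverage_and_accuracy(preds, labels):
    """Share of students not deferred, and accuracy on those retained cases."""
    kept = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(kept) / len(labels)
    accuracy = sum(p == y for p, y in kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


if __name__ == "__main__":
    # Toy calibrated probabilities; columns assumed to be [Dropout, Enrolled, Graduate].
    probs = [[0.72, 0.18, 0.10], [0.09, 0.30, 0.61],
             [0.43, 0.37, 0.20], [0.16, 0.54, 0.30]]
    labels = [0, 2, 1, 1]
    print(round(log_loss(probs, labels), 4), round(brier_score(probs, labels), 4))
    print([round(v, 2) for v in ece_mce(probs, labels)])
    print(top_k_triage([p[0] for p in probs], capacity_pct=50))
    preds = selective_predict(probs, threshold=0.5)
    print(preds, coverage_and_accuracy(preds, labels))
```

On these toy probabilities, deferring the single case whose confidence falls below 0.5 raises accuracy on the retained cases from 3/4 to 3/3 at 75% coverage, a miniature version of the coverage versus decision-quality trade-off that abstention introduces.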
Journal of Applied Data Sciences
| ISSN | : | 2723-6471 (Online) |
| Publisher | : | Bright Publisher |
| Website | : | http://bright-journal.org/JADS |
| Contact | : | taqwa@amikompurwokerto.ac.id (principal contact) |
|  |  | support@bright-journal.org (technical issues) |
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.