Enhancing SMOTE-ENN Efficacy on Imbalanced Datasets Using Decision Tree Leaf Feature Extraction: A Case Study on Student Employability Data

Rizkysari Meimaharani, Widowati Widowati, Ahmad Abdul Chamid

Abstract


This study addresses the classification of tabular data that is highly imbalanced and overlapping, where standard predictive models often lose performance and favor the majority class. A second problem is that many advanced ensemble models are complex and lack transparency. They are often treated as black boxes, making it difficult for users to explain clearly how each feature contributes to the final prediction. To address both problems, this study proposes a hybrid classification approach that combines rule extraction from decision tree leaves, the SMOTE-ENN resampling technique, and the XGBoost algorithm. The leaf extraction process reorganizes the data by separating overlapping class regions into clearer, more structured groups before synthetic samples are generated. The test results show that the proposed approach outperforms the baseline model, achieving an F1-score of 0.8554, which indicates more effective and balanced prediction. In addition to improving performance, the method keeps the model interpretable. Instead of relying only on abstract engineered features, it allows important features to be traced back to the original decision tree rules, so the formation of each prediction can be explained transparently. Overall, the combination of Decision Tree, SMOTE-ENN, and XGBoost is effective in handling extreme class imbalance while producing a clear, stable, and easy-to-understand model, making it more reliable and trustworthy in real-world applications.
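The leaf extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the tree depth, the one-hot encoding of leaf membership, and the feature names `f0`..`f7` are all assumptions made for illustration, and only scikit-learn is used. The resampling and boosting stages are left out here; in a full pipeline they would presumably be applied to the augmented training data, for example with `SMOTEENN` from imbalanced-learn and `XGBClassifier` from xgboost.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic imbalanced data (roughly 10% minority class).
# This is a stand-in, not the student employability dataset from the study.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], class_sep=0.8, random_state=0,
)

# A shallow tree (depth is an assumption): each leaf corresponds to one
# readable rule over the original features.
tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced", random_state=0)
tree.fit(X, y)

# apply() returns, for every sample, the index of the leaf it falls into.
leaf_ids = tree.apply(X).reshape(-1, 1)

# One-hot encode leaf membership and append it to the original features.
encoder = OneHotEncoder(handle_unknown="ignore")
leaf_features = encoder.fit_transform(leaf_ids).toarray()
X_aug = np.hstack([X, leaf_features])

# The rules behind each leaf can be read back from the fitted tree,
# which is what makes the leaf-based features traceable.
feature_names = [f"f{i}" for i in range(X.shape[1])]
rules = export_text(tree, feature_names=feature_names)
print(rules)
print("augmented shape:", X_aug.shape)
```

In practice the tree would be fitted on the training split only, and the same fitted tree and encoder would then transform the test split, so that no label information leaks into evaluation.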



Keywords


Imbalanced Data, Feature Extraction, Decision Tree, SMOTE-ENN, XGBoost


Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Publisher : Bright Publisher
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    support@bright-journal.org (technical issues)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0