Enhancing SMOTE Using Euclidean Weighting for Imbalanced Classification Dataset
Abstract
Class imbalance is a significant challenge in machine learning classification tasks because it often causes models to be biased toward the majority class, resulting in poor detection of minority classes. This study proposes a novel enhancement to the Synthetic Minority Over-sampling Technique (SMOTE) by incorporating Euclidean distance-based feature weighting, called Weighted SMOTE. The key idea is to improve the quality of synthetic minority samples by calculating feature importance using a Random Forest model and assigning higher weights to the most relevant features. The objective of this research is to generate more representative synthetic data, reduce model bias, and increase predictive accuracy on highly imbalanced datasets. Experiments were conducted on four benchmark datasets from the KEEL Repository with imbalance ratios ranging from 0.013 to 0.081. The proposed Weighted SMOTE combined with an ensemble voting classifier (Random Forest, AdaBoost, and XGBoost) demonstrated significant improvements compared to standard SMOTE and models without resampling. For example, on the Zoo-3 dataset, the Balanced Accuracy Score (BAS) increased from 75% to 90%, while the F1-score improved from 48% to 94%. On the Cleveland-0_vs_4 dataset, precision improved from 83% to 91% and recall remained high at 99%. Statistical testing using the Wilcoxon signed-rank test confirmed these improvements with p-values < 0.05 for key metrics. The findings show that the proposed method effectively balances sensitivity and precision, generates more meaningful synthetic samples, and reduces the risk of overfitting compared to conventional oversampling. The novelty of this work lies in integrating Euclidean-based feature weighting into the SMOTE process and validating its performance on multiple domains with varying feature types and imbalance ratios. These results indicate that the proposed Weighted SMOTE approach contributes a practical solution for improving classification performance and model stability on severely imbalanced data.
Article Metrics
Abstract: 11 Viewers PDF: 10 ViewersKeywords
Full Text:
PDFRefbacks
- There are currently no refbacks.
Journal of Applied Data Sciences
ISSN | : | 2723-6471 (Online) |
Organized by | : | Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia. |
Website | : | http://bright-journal.org/JADS |
: | taqwa@amikompurwokerto.ac.id (principal contact) | |
support@bright-journal.org (technical issues) |
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0