Early Detection of Female Type-2 Diabetes using Machine Learning and Oversampling Techniques

Lana Al-Dabbas, Ahmad Adel Abu-Shareha

Abstract


Early diabetes prediction is crucial, as it can save numerous lives and prevent diabetes-related complications. Experiments on diabetes prediction are constrained by the limited number of diabetic and non-diabetic samples in the available datasets. Previous work has focused mainly on the choice of classification technique to improve prediction accuracy. Oversampling with SMOTE has also been applied; it improved the results, yet its naïve approach poses limitations. In this paper, a framework for diabetes prediction is developed that integrates an advanced oversampling technique, SVMSMOTE, with various machine-learning algorithms to achieve the best performance. The proposed framework aims to overcome the problems of inaccurate data and limited samples through preprocessing and oversampling. The framework consists of four main stages: data exploration, data preprocessing, data oversampling, and classification. The experiments were conducted on the Pima Indian diabetes dataset, which comprises 768 samples and 9 columns. The proposed framework achieved an accuracy of 91%, compared with 90% for classification without oversampling and 85.5% for the best result reported in the literature. As such, the proposed framework yields a relative improvement of approximately 6.4% (5.5 percentage points) over existing frameworks. In addition, the best f-measure, 0.879, was achieved using the XGBoost classifier with SVMSMOTE. The best recall, 0.931, was achieved using random forest (RF) with SVMSMOTE. Finally, the best precision, 0.918, was achieved using RF without oversampling.
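
The arithmetic behind the reported improvement can be checked with a minimal sketch. It assumes that "approximately 6.4%" is a relative gain over the 85.5% literature baseline and that "f-measure" means the F1 score (the harmonic mean of precision and recall). The function names and the precision/recall pair passed to `f_measure` are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the gain of `new` over `baseline` as a fraction of `baseline`."""
    return (new - baseline) / baseline


def f_measure(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall (assumes beta = 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Accuracies stated in the abstract.
proposed_acc = 0.91      # proposed framework
literature_acc = 0.855   # best result reported in the literature

# Absolute gain in percentage points versus relative gain in percent.
abs_gain_points = (proposed_acc - literature_acc) * 100
rel_gain_percent = relative_improvement(proposed_acc, literature_acc) * 100

print(f"absolute gain: {abs_gain_points:.1f} percentage points")
print(f"relative gain: {rel_gain_percent:.1f}%")

# Illustrative inputs only (not values reported in the paper).
print(f"example F1: {f_measure(0.8, 0.9):.3f}")
```

Under these assumptions the two accuracies differ by 5.5 percentage points, which corresponds to a relative gain of about 6.4%, matching the figure quoted in the abstract.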



Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Organized by : Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    support@bright-journal.org (technical issues)

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0