Multi-Algorithm to Measure the Accuracy Level of Diabetes Status Prediction

Poor management of diabetes leads to damage in organs and body tissues, affecting crucial organs such as the heart, kidneys, eyes, and nerves. Although there is no permanent cure for diabetes, early detection enables effective disease management, which researchers and medical professionals agree enhances recovery prospects. The rapid progress in information technology has facilitated early prediction and diagnosis of diseases through Machine Learning (ML), a subset of Artificial Intelligence (AI) comprising various algorithms such as Neural Network, Support Vector Machine (SVM), KNN, Random Forest, and Naïve Bayes. These algorithms serve as effective tools in handling predictive data. Early prediction of diabetes holds the potential to control the disease and save lives. Therefore, the focus of this research is to develop a predictive model for diabetes status using several algorithms; the validity of this model still needs to be tested. The dataset consists of information from several diabetic patients, including eight input variables (pregnancies, glucose levels, blood pressure, skin thickness, insulin levels, BMI, age, and diabetes pedigree function) and one output variable (diabetes status). Research findings indicate that the SVM algorithm exhibits superior accuracy (84%) in predicting diabetes status compared with the other algorithms, such as the Neural Network.


Introduction
There are several chronic diseases that need to be anticipated, one of them being diabetes. An increase in blood sugar or glucose levels beyond normal values is the main sign of diabetes. Diabetes occurs when the patient's body is no longer able to take sugar or glucose into cells for energy. As a result, extra sugar accumulates in the blood [1], [2], [3].
The 10th edition of the International Diabetes Federation (IDF) Diabetes Atlas describes diabetes as a global health emergency and among the fastest-growing diseases. Worldwide, approximately 537 million people currently live with diabetes. By 2030, the number is projected to increase to 643 million, and by 2045 around 783 million people are estimated to be living with diabetes. Diabetes is also responsible for approximately 6.7 million deaths among adults aged 20 to 79 years [4].
Poorly controlled diabetes can damage various organs and tissues in the body, with serious consequences. Among these organs are the heart, kidneys, eyes, and nerves. Diabetes has no permanent cure, but if detected early, the disease can be managed. Researchers and medical professionals agree that early detection of diabetes improves the prospects for recovery [5].
Information technology is currently advancing rapidly, and Machine Learning (ML) can be employed to predict and diagnose diseases early [6]. As part of Artificial Intelligence (AI), ML comprises several algorithms, including Neural Network, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest, and Naïve Bayes. These algorithms can be used to handle prediction data [7]; this study measures their level of accuracy in predicting diabetes status. Several previous studies report that neural network algorithms achieve better prediction accuracy than other algorithms. Diabetes can be controlled, and lives can be saved, through early disease prediction. To this end, the focus of this research is to create a predictive model for diabetes status, using multiple algorithms as predictors, that can serve as an early reference in decision-making for future diabetes management. The data consist of a diabetes dataset derived from the test results of several patients with diabetes. The variables include 8 input variables (pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, age, and diabetes pedigree function) and one output variable (diabetes status).
The remainder of this paper is organized as follows: the literature review is presented in Section 2, the research methodology and flow in Section 3, the results and discussion in Section 4, and the conclusions in Section 5.

State of The Art
This research is developed experimentally, creating a model to measure the accuracy of diabetes status prediction using several algorithms, including Neural Network, SVM, KNN, Random Forest, and Naïve Bayes. This model can be used for predicting diabetes status. There are several related studies on predicting diabetes, one of which was conducted by [7]. That study discussed the implementation of AI in predicting medical diagnoses for six diabetes complications: gestational diabetes, hypoglycemia in the hospital, diabetic retinopathy, diabetic foot ulcers, diabetic peripheral neuropathy, and diabetic nephropathy. Another study by [8] developed a predictive model for the risk of diabetes. Further research on predicting diabetes status was conducted by [5], which identified the most relevant features for efficiently predicting diabetes mellitus (DM) using machine learning techniques. These previous studies addressed the prediction of diabetes diagnoses, whereas the present research measures the accuracy of diabetes status prediction using algorithms such as Neural Network, SVM, KNN, Random Forest, and Naïve Bayes.
None of the studies above uses multiple algorithms in the way developed here. The expectation is that a multi-algorithm approach can provide valid predictions of diabetes status. The model uses 8 input variables.

Methodology
The stages of developing a predictive model for diagnosing diabetes status, with accuracy measured using the Neural Network, SVM, KNN, Random Forest, and Naïve Bayes algorithms, are shown in Figure 1. The first step is determining the diabetes dataset, shown in Table 1. This is followed by forming the dataset, dividing it into two parts, normalizing the weight values and training data values (shown in Table 2), and evaluating the dataset (backpropagation). The final step is to compare the results in order to measure the accuracy of diabetes status prediction across the Neural Network, SVM, KNN, Random Forest, and Naïve Bayes algorithms.
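The normalization stage can be illustrated with a minimal sketch, assuming min-max rescaling into the 0.1 to 0.9 range mentioned in the Application of Neural Network Algorithm section. The exact formula used in the study is not given in the text, and the function name `scale_to_range` and the sample glucose readings are hypothetical.

```python
def scale_to_range(values, low=0.1, high=0.9):
    """Min-max rescale a list of numbers into [low, high].

    Assumption: plain min-max scaling; the study only states the target
    range (0.1 to 0.9), not the formula itself.
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column carries no information; map it to the midpoint.
        return [(low + high) / 2 for _ in values]
    span = high - low
    return [low + span * (v - v_min) / (v_max - v_min) for v in values]


# Hypothetical glucose readings, used only to exercise the function.
scaled = scale_to_range([85, 148, 183])
print(scaled)
```

The smallest value maps to the lower bound and the largest to the upper bound, so no input sits exactly at 0 or 1, where the sigmoid saturates.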

Results and Discussion
The stages of developing a predictive model for diagnosing diabetes status by measuring the accuracy of diabetes status using the Neural Network, SVM, KNN, Random Forest, and Naive Bayes algorithms are as follows:

Problem Identification
This stage identifies the research problem: a predictive model is needed to diagnose diabetes status, with measured prediction accuracy, for use in decision-making or policy implementation. The accuracy of predicting diabetes status is then measured using the Neural Network, SVM, KNN, Random Forest, and Naïve Bayes algorithms.

Dataset Database
The following is the dataset that will be used in the process of measuring the accuracy of predicting diabetes status using the Neural Network, SVM, KNN, Random Forest, and Naive Bayes algorithms as shown in table 1.

Application of Neural Network Algorithm
Table 2 shows a sample of the normalized dataset. The original data are rescaled to the range 0.1 to 0.9 because the activation function used is the sigmoid, whose values lie above 0. The neural network model is trained with the Adam optimization algorithm, with parameters such as batch size 6, epoch=100, and validation_split=0.2, running across the entire dataset [9], [10], [11], [12]. The SVM model uses the parameters C, kernel, degree, and gamma set to 1, 'rbf', 3, and 'scale' [13], [14], [15], [16]. Meanwhile, the RF model uses the parameters criterion, max_depth, n_estimators, and random_state set to 'entropy', 5, 500, and 10 [17], [18], [19].
Table 3 illustrates the performance of the neural network algorithm. Figures 2 and 3 present the training and testing results of the neural network algorithm. Table 4 shows the results of SVM algorithm testing. The results of the Random Forest method, shown in the table above, were obtained with the parameters criterion='entropy', max_depth=5, n_estimators=500, and random_state=10, giving an accuracy of 0.759. Meanwhile, Table 6 shows the results of testing the Naïve Bayes algorithm. The results of the Naïve Bayes method, shown in the table above, were obtained with the parameters alpha=1.0, force_alpha=False, fit_prior=False, and class_prior=None, giving an accuracy of 0.649. Meanwhile, Table 7 shows the results of testing the KNN algorithm. The results of the KNN method, shown in the table above, were obtained with the parameters n_neighbors=5, weights='uniform', metric='minkowski', metric_params=None, and n_jobs=None, giving an accuracy of 0.714.

Design Widget Orange
Figure 4 shows the Orange widget design [20] used to perform Receiver Operating Characteristic (ROC) analysis [21] on the diabetes diagnosis results produced by the SVM, neural network, Naive Bayes, and KNN algorithms and by manual diagnosis.
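To make the ROC analysis concrete, the following is a minimal sketch of how ROC points and the area under the curve can be computed from predicted scores. It is not Orange's implementation; the function names `roc_points` and `auc` and the four sample scores are hypothetical, and the input is assumed to contain at least one sample of each class.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points obtained
    by sweeping a decision threshold from high to low over the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Sort by descending score so that each step lowers the threshold.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = diabetic) and classifier scores.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(labels, predicted)
print(curve, auc(curve))
```

Each classifier yields one such curve, which is how several algorithms can be compared on a single plot.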

Evaluation Confusion Matrix
Figure 5 shows the confusion matrix of the neural network algorithm. For the diabetic class, the numbers are 16 and 30 out of the 154 available records. The table below shows the performance evaluation results of the neural network. Table 6 shows the performance evaluation results of the neural network algorithm, and Figure 6 shows the confusion matrix of the SVM algorithm.
The output of the confusion matrix from the SVM algorithm is shown in the above figure. For the non-diabetic class, the numbers successfully validated and misclassified are 88 and 30, while for the diabetic class the numbers are 12 and 24 out of the 154 available records. The table below shows the performance evaluation results of the SVM. Table 7 shows the performance evaluation results of the SVM algorithm, and Figure 7 shows the confusion matrix of the RF algorithm.
Table 8 shows the performance evaluation results of the RF algorithm, and Figure 8 shows the confusion matrix of the Naive Bayes algorithm. Table 9 shows the performance evaluation results of the Naive Bayes algorithm, and Figure 9 shows the confusion matrix of the KNN algorithm.
Figure 11 shows the evaluated accuracy of diabetes status prediction: the Neural Network algorithm reaches an accuracy of 0.8065 (80%), the SVM algorithm 0.84690 (84%), the Random Forest algorithm 0.759 (75%), the Naive Bayes algorithm 0.649 (65%), and the KNN algorithm 0.714 (71%). Table 10 shows the actual diabetes validation results, measuring the level of accuracy of diabetes status with the Neural Network, SVM, Random Forest, Naive Bayes, and KNN algorithms. Figure 11 shows the accuracy curve of the Neural Network, SVM, Random Forest, Naive Bayes, and KNN algorithms on actual diabetes status.

Figure 1. Stages of developing a prediction model for diagnosing diabetes status by measuring the level of accuracy of diabetes status using algorithms including SVM, KNN, Random Forest, and Naive Bayes

Figure 2. Training and validation loss results of the neural network algorithm
Figure 3.

Figure 4. Orange widget design for ROC analysis of diabetes diagnosis results using the neural network, SVM, Random Forest, Naive Bayes, and KNN algorithms, and manually.

Figure 5. Results of the confusion matrix of the neural network algorithm
The output of the confusion matrix [22] from the neural network algorithm is shown in the above figure. For the non-diabetic class, the numbers successfully validated and misclassified are 84 and 24.

Figure 6. Results of the confusion matrix of the SVM algorithm

Figure 7. Results of the confusion matrix of the RF algorithm
The output of the confusion matrix from the Random Forest algorithm is shown in the above figure. For the non-diabetic class, the numbers successfully validated and misclassified are 89 and 26, while for the diabetic class the numbers are 11 and 28 out of the 154 available records. The table below shows the performance evaluation results of the Random Forest.

Figure 8. Results of the confusion matrix of the Naive Bayes algorithm
The output of the confusion matrix from the Naive Bayes algorithm is shown in the above figure. For the non-diabetic class, the numbers successfully validated and misclassified are 68 and 22, while for the diabetic class the numbers are 32 and 32 out of the 154 available records. The table below shows the performance evaluation results of Naive Bayes. Table 9 shows the performance evaluation results of the Naive Bayes algorithm, and Figure 9 shows the confusion matrix of the KNN algorithm.

Figure 9. Results of the confusion matrix of the KNN algorithm
The output of the confusion matrix from the KNN algorithm is shown in the above figure. For the non-diabetic class, the numbers successfully validated and misclassified are 87 and 31, while for the diabetic class the numbers are 13 and 23 out of the 154 available records. The table below shows the performance evaluation results of KNN. Table 9 shows the performance evaluation results of the KNN algorithm.
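The link between these counts and the reported accuracy can be checked with a short sketch. It assumes one reading of the KNN figures: 87 non-diabetic and 23 diabetic records classified correctly, with 31 and 13 records misclassified. The function name `accuracy_from_counts` is hypothetical.

```python
def accuracy_from_counts(correct_negative, wrong_negative,
                         wrong_positive, correct_positive):
    """Accuracy = correctly classified records / all records."""
    total = (correct_negative + wrong_negative
             + wrong_positive + correct_positive)
    return (correct_negative + correct_positive) / total


# KNN counts from the text, under the reading stated above (an assumption).
knn_accuracy = accuracy_from_counts(87, 31, 13, 23)
print(round(knn_accuracy, 3))
```

Under this reading, 110 of the 154 records are classified correctly, which is consistent with the accuracy of 0.714 reported for KNN.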

Figure 10 shows the performance curves of the Neural Network, SVM, Random Forest, Naive Bayes, and KNN algorithms with the target classes "Diabetes" and "Not Diabetes," represented by cyan, orange, blue, purple, and green lines.

Figure 10. Performance curves of the Neural Network, SVM, Random Forest, Naive Bayes, and KNN algorithms
The ROC curves above show that the true positive rate (sensitivity) depends strongly on the target value: with a target value of 0, the five algorithms show an increase of 0.1, and with a target value of 1, the five algorithms show a decrease of 0.1.

Comparison of Neural Network, SVM, Random Forest, Naive Bayes, and KNN Algorithms on Actual Diabetes Status

Figure 11. Comparison results of the accuracy level of diabetes status between Neural Network, SVM, Random Forest, Naive Bayes, and KNN.

Table 3. Performance of the Neural Network Algorithm

Table 4. Performance of the SVM Algorithm
The results of the SVM method, shown in the table above, were obtained with the parameters kernel='rbf', C=1, degree=4, and gamma='scale', giving an accuracy of 0.84690. Meanwhile, Table 5 shows the results of testing the Random Forest algorithm.

Table 5. Performance of the RF Algorithm

Table 6. Performance evaluation results of the neural network algorithm with batch size 300, learning rate 0.3, momentum 0.2, training time 1000, and validation threshold 20.

Table 8. Performance evaluation results of the RF model algorithm

Table 9. Performance evaluation results of the KNN model algorithm
[...] is used to display, manage, and classify the performance of the neural network, SVM, Random Forest, Naive Bayes, and KNN algorithms.