Enhancing Federated Learning Performance through Adaptive Client Optimization with Hyperparameter Tuning

The effectiveness of Industrial Internet of Things (IIoT) systems requires a robust fault detection mechanism, a task effectively accomplished by leveraging Artificial Intelligence (AI). However, the current centralized learning approach proves inadequate. In response to this limitation, Federated Learning (FL) enables decentralized training, ensuring the protection of individual data. Traditional FL settings, however, are insufficient for an effective learning process and need to be refined. This paper introduces an Adaptive Distributed Client Training (ADCT) mechanism designed to optimize performance for each FL participant, thereby establishing an efficient and resilient system. The proposed ADCT utilizes two parameters, namely the accuracy threshold and the grid search step, to find the optimal hyperparameters for each client within a specific number of federation rounds. The evaluation results, obtained using the MNIST and FMNIST datasets in non-IID settings, indicate that the proposed ADCT enhances the F1-score by up to 37.13% compared to state-of-the-art methods.


Introduction
In recent decades, the progress of AI has brought about numerous advantages, enhancing the quality of services across various domains [1], [2]. AI expedites decision-making by processing information supplied in a format consistent with its training. Currently, the predominant approach is centralized training, where all data reside on a single powerful computer and training is conducted locally. While potent, this approach has significant drawbacks, including data leakage and susceptibility to a single point of failure, which can lead to a system malfunction.
To overcome these issues, FL has emerged as a decentralized training method that aims to rectify the limitations of centralized learning [3]. FL enables distributed training on client devices, ensuring privacy and robust model creation without exposing client data. The effectiveness of FL methodologies heavily depends on the careful selection of suitable clients to enable swift model convergence [4]. However, FL must also consider factors such as device diversity, security, and data distribution to maximize efficiency. Techniques such as client selection and local training optimization prove valuable in enhancing FL performance.
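The core FL idea described above, training locally and combining client updates on a server without moving raw data, can be illustrated with a minimal FedAvg-style aggregation sketch. Function and variable names here are illustrative assumptions, not the paper's implementation; weights are flattened to plain lists of floats for clarity.

```python
# Minimal sketch (illustrative, not the paper's code): FedAvg-style aggregation
# averages client model weights, weighted by each client's local sample count,
# so raw data never leaves a client.

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of per-client weight vectors (lists of floats)."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    global_weights = []
    for i in range(num_params):
        # Each parameter is averaged with weight n_k / n_total.
        avg = sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        global_weights.append(avg)
    return global_weights

# Example: two clients with different data volumes; the larger client
# pulls the average toward its update.
print(fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [10, 30]))  # [2.5, 3.5]
```

In a real deployment each element would be a tensor per layer, but the weighting logic is the same.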
Referring to one of the popular technologies in recent years, the metaverse, where users engage with diverse scenarios and environments [5], FL's capability to distribute learning across different devices without consolidating sensitive data becomes especially advantageous. The creation of personalized avatars, reflecting users' preferences and behaviors, demands advanced AI models capable of adjusting to individualized training requirements. FL's ability to customize training on the client side plays a crucial role in developing more precise and responsive avatars, ultimately enriching the overall user experience [6].
Local training optimization is a crucial aspect of improving the FL process. Configuring each local server properly leads to enhanced global performance. Various optimization methods have been explored, including the Two-Stage Federated Optimization Algorithm (TSFOA), which dynamically assigns model weights based on data distribution among all FL participants [7]. Additionally, a technique called AMBLE, which adaptively adjusts mini-batch size and local epochs, has been introduced to enhance FL performance, especially in non-Independent and Identically Distributed (non-IID) settings [8].
Recognizing the significance of local training optimization, it is essential to emphasize that incorrect configurations can result in longer training times and increased demand for computing resources. To reduce processing time, this article focuses on only two parameters. The primary contributions of our work are as follows: 1) We propose ADCT as a means to customize training for individual clients, thereby elevating their performance. ADCT utilizes hyperparameter tuning to enhance local model performance and, consequently, the overall performance of the FL model.
2) An ablation study of the proposed ADCT is conducted to obtain the best configuration, resulting in a more efficient learning process.
3) We extensively assess the performance of FL using the MNIST and FMNIST datasets in a non-IID configuration to showcase the effectiveness of the proposed method in comparison to state-of-the-art FL methods.
The rest of this research is structured as follows: In Section II, prior studies on FL optimization, specifically local training, are reviewed. Section III outlines the proposed system model in detail. The performance evaluation of the proposed model, along with its counterparts, is presented in Section IV. Finally, Section V concludes this work and provides some future directions.

Related Works
FL optimization can be divided into several aspects, such as client selection [9], local optimization, and security enhancement. The local training process on each selected client is a crucial factor in ensuring FL performance. Participant device heterogeneity, computing resources, and dataset distributions vary widely, especially in non-IID scenarios. Furthermore, the phenomenon of client drift, where local model updates tend towards a local optimal solution, can negatively impact the performance of the global model.
Addressing client drift involves employing model aggregation and model training methods [10]. FedAvgM [11] introduces momentum on top of Stochastic Gradient Descent (SGD) for model aggregation, a concept similarly utilized in FedMIM [12]. FedMIM incorporates weighted global gradient estimation, contributing to global objective optimization across all FL participants. Another approach, FedNova, normalizes and scales local updates based on the number of local epochs before aggregation [13]. All four approaches (FedAvgM, FedMIM, FedNova, and adaptive bias estimation [14]) demonstrate faster convergence compared to the traditional FedAvg.
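As a rough illustration of the server-momentum idea behind FedAvgM, the following sketch keeps a velocity vector on the server and applies it to the pseudo-gradient, i.e., the gap between the current global model and the averaged client models. This is a simplified reading of the approach, not the authors' code; `beta` and `lr` are assumed hyperparameters.

```python
# Illustrative sketch of server-side momentum (FedAvgM-style): the server
# smooths successive aggregation steps with a velocity term, damping the
# oscillations that client drift induces under non-IID data.

def fedavgm_server_step(global_w, avg_client_w, velocity, beta=0.9, lr=1.0):
    """One server update; all arguments are flat lists of floats."""
    new_w, new_v = [], []
    for gw, cw, v in zip(global_w, avg_client_w, velocity):
        delta = gw - cw                 # pseudo-gradient: global minus averaged clients
        v_next = beta * v + delta       # accumulate momentum
        new_v.append(v_next)
        new_w.append(gw - lr * v_next)  # descend along the smoothed direction
    return new_w, new_v

# With zero initial velocity and lr=1, the first step reduces to plain FedAvg.
w, v = fedavgm_server_step([1.0], [0.5], [0.0])
print(w, v)  # [0.5] [0.5]
```

With `beta=0`, the update degenerates to vanilla FedAvg, which makes the role of the momentum term easy to verify.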
In terms of model training, the TSFOA method [7] adaptively assigns model weights by considering differences in data distribution among FL participants. Results with the non-IID MNIST dataset indicate that TSFOA outperforms FedAvg in convergence time. Adaptive optimization methods for server updates, introduced in AdaGrad [15], are complemented by local optimization methods in [16], demonstrating faster convergence with local adaptivity.
Local hyperparameter optimization in FL includes dynamic learning rates that adapt to wireless computation environments, leading to higher accuracy on the MNIST and CIFAR datasets. Techniques such as Adaptive FL Dropout (AFD) [17] and FedDUAP [18], involving dynamic updates and adaptive pruning, minimize communication costs and enhance efficiency. AMBLE [8] dynamically adjusts mini-batch size, local epochs, and learning rate, achieving faster convergence in both IID and non-IID settings.
Furthermore, an algorithm in [19] considers client heterogeneity factors (computing power, storage, and network) to enhance local convergence, defining criteria to stop local training once they are met and delivering an optimal model for the overall FL process. Results show the superiority of this optimization technique over the baseline FL. While numerous researchers aim to enhance the performance of all FL participants, this pursuit can result in increased computing costs, given the need for multiple iterations of adaptive local training. Presently, the incorporation of local client optimization often leads to prolonged training times due to the iterative nature of local training. Therefore, it becomes imperative to decrease the total number of iterations conducted, striking a balance between minimizing costs and improving performance. State-of-the-art studies emphasize the importance of hyperparameter settings for individual FL clients in enhancing system performance. However, excessive tuning may increase computing resource consumption, necessitating the establishment of appropriate criteria for hyperparameter configurations, considering overall computing costs.

Proposed System
In this section, the proposed ADCT process is detailed, presenting both the workflow diagram and pseudocode. Subsequently, we discuss the simulation details and evaluation metrics employed to compare the proposed work.

Adaptive Distributed Client Training
ADCT is employed for the dynamic optimization of the training process on individual FL clients. The parameters introduced by ADCT (γ, s) consist of γ for the accuracy threshold and s for the grid search step. Together, these parameters regulate the scope of the grid search process in every federation round. The workflow of the proposed ADCT is detailed in figure 1. Initially, similar to traditional FL, client selection is performed to choose appropriate participants for the federation round. A grid search is then conducted based on the two ADCT parameters, γ and s, as detailed previously. If the conditions are met, the grid search is executed; otherwise, the previous hyperparameter configuration is retained. To better illustrate the concept of the proposed ADCT, the pseudocode is detailed in figure 2. Moreover, it is worth noting that grid search is a widely embraced technique for hyperparameter optimization in AI models, seeking the most effective combination of hyperparameters to achieve optimal performance for a given system. In this work, we consider two hyperparameters as the main targets, namely the learning rate η and the number of local epochs.
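The selective per-client tuning described above can be sketched as follows. Here γ gates the search (a client whose accuracy already meets the threshold keeps its configuration) and s is treated as controlling how often the search fires; the candidate grids, the `evaluate` callback, and this exact reading of s are hypothetical placeholders for illustration, not the authors' pseudocode in figure 2.

```python
import itertools

# Hedged sketch of the ADCT idea: grid-search (learning rate, local epochs)
# only when a client's accuracy is below the threshold gamma, and only on
# rounds selected by the step s; otherwise keep the previous configuration.
# Grids and the evaluate() callback are illustrative assumptions.

def adct_tune(client_acc, gamma, step, current_cfg, evaluate,
              lrs=(0.1, 0.01, 0.001), epochs=(1, 3, 5), round_idx=0):
    """Return the (learning_rate, local_epochs) config for this round."""
    # Skip the search when the client already meets gamma, or when this
    # round is not a search round under step s.
    if client_acc >= gamma or round_idx % step != 0:
        return current_cfg
    best_cfg, best_score = current_cfg, float("-inf")
    for lr, ep in itertools.product(lrs, epochs):
        score = evaluate(lr, ep)  # e.g. validation accuracy of a short local run
        if score > best_score:
            best_cfg, best_score = (lr, ep), score
    return best_cfg
```

A toy `evaluate` such as `lambda lr, ep: lr * ep` makes the selection behavior easy to check; in practice it would run a few local epochs and report validation accuracy.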
In total, six different configurations of hyperparameters can be adopted by each FL participant to deliver better learning performance.The comprehensive details of the hyperparameter configuration are provided in table 1.

Simulation Details
The Flower framework [20] is employed to establish the FL environment. Flower allows for the creation of customizable and adaptable FL systems tailored to specific use cases and optimization requirements. Developed using the Python programming language, Flower can be tailored to align with the architecture proposed in this research.
Regarding available modules and aggregation strategies, Flower predominantly offers FedAvg as its principal aggregation technique.
In this work, we explore two types of datasets with different task complexities. First, we utilized MNIST, which represents handwritten digits with 10 classes. The next dataset is FMNIST, representing fashion image classification, also with 10 classes. Both datasets used in this work are popular for the FL system evaluation process.
To mimic real-world application use cases, the datasets are divided into non-IID settings. Each client is configured to have only one class of data, resulting in a prolonged learning process. The data allocation for the MNIST and FMNIST datasets is illustrated in figure 3 and figure 4, respectively.
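The one-class-per-client split above can be sketched with a small helper. The mapping of label to client id is an illustrative assumption; in practice any one-to-one assignment of the 10 classes to the 10 clients yields the same pathological non-IID setting.

```python
# Sketch of the one-class-per-client non-IID split: each client receives
# only the sample indices whose label matches its (assumed) assigned class.

def one_class_per_client(labels, num_clients=10):
    """Map client id -> list of sample indices with that client's label."""
    partition = {c: [] for c in range(num_clients)}
    for idx, y in enumerate(labels):
        partition[y % num_clients].append(idx)
    return partition

# Tiny example with 3 classes and 3 clients.
print(one_class_per_client([0, 1, 2, 0, 1, 2], num_clients=3))
# {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```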

Result and Discussion
In this section, the evaluation of the proposed ADCT is conducted. The performance is compared with state-of-the-art methods such as FedAvg [3], FedMedian [21], FedACS [9], FedYogi [22], and FedOpt [22]. Firstly, the final model accuracy at the end of the federation process is investigated. Subsequently, the total number of rounds required to achieve the target accuracy is also examined.

Model Accuracy
For the initial evaluation conducted using the MNIST dataset via an ablation study, the accuracy threshold and grid search step were varied.The results are shown in figure 5.Over a total of 50 communication rounds and nine variations, the highest performance was achieved by the ADCT configuration with an accuracy threshold of 50 and a grid search step of 5. Subsequently, similar performance was obtained using ADCT (25,15) and ADCT (50,15).

Figure 5. Ablation study of the proposed ADCT using MNIST dataset
For the subsequent evaluation, we employed FMNIST. This dataset is more complex than MNIST. Similarly, the evaluation was conducted using nine variations of ADCT, varying the accuracy threshold and grid search step. The results depicted in figure 6 indicate that ADCT with an accuracy threshold of 75 and a grid search step of 15 provides the best performance in terms of accuracy. This is followed by ADCT (75,5) and ADCT (50,10) in second and third place, respectively.

Figure 6. Ablation study of the proposed ADCT using FMNIST dataset.
To better comprehend the performance of all ADCT variations evaluated in this study, table 2 presents the accuracy of each setting on the two datasets, in alignment with the line graphs in figure 5 and figure 6. The results show that the best accuracy for MNIST and FMNIST is 78.75% and 69.35%, respectively.

Round to Achieve Target Accuracy
To validate the previously mentioned evaluation metrics, the number of rounds required to achieve the target accuracy is also calculated during the evaluation. In this study, a total of 50 rounds is considered for both the MNIST and FMNIST datasets. Achieving the target accuracy in fewer rounds is crucial to enhancing the efficiency of the FL system. Table 3 displays the total number of rounds required by each variation of the proposed ADCT to achieve the target accuracy. For MNIST, the target is set to 75%, while for FMNIST, it is configured to 60% due to the larger complexity of the dataset. Moreover, the results show that the findings from the previous section align with the total rounds needed to achieve the target accuracy. This is evidenced by the optimal ADCT setting, with an accuracy threshold of 50 and a grid search step of 5, which takes only 31 rounds to achieve the target accuracy on the MNIST dataset. Similarly, the most efficient configuration for the FMNIST dataset is ADCT (75,15), which requires only 16 federation rounds.
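The rounds-to-target metric used here reduces to scanning the per-round global accuracy for the first round that meets the target; a small illustrative helper (variable names are ours, not the paper's):

```python
# Helper mirroring the metric above: the first federation round whose
# global accuracy reaches the target, or None if it is never reached
# within the budget.

def rounds_to_target(accuracy_per_round, target):
    """accuracy_per_round: global accuracy after each round, in order."""
    for rnd, acc in enumerate(accuracy_per_round, start=1):
        if acc >= target:
            return rnd
    return None

history = [40.0, 55.0, 62.0, 74.9, 75.2, 76.0]
print(rounds_to_target(history, 75.0))  # 5
```

Fewer rounds means less communication and computation, which is why this metric complements the final-accuracy comparison.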

Comparative Analysis
To demonstrate the improvement of the proposed ADCT over its counterparts, the F1-score of various FL techniques with similar learning configurations was assessed. In total, five state-of-the-art methods (FedAvg, FedACS, FedMedian, FedYogi, and FedOpt) are considered for the comparative analysis along with the proposed ADCT. It is worth noting that the ADCT configuration differs between MNIST and FMNIST. This variation is attributed to differences in dataset difficulty and complexity. Based on the ablation study detailed earlier, the optimal configurations of ADCT used in the comparative analysis are outlined in table 4: for the MNIST dataset, ADCT (50,5) is utilized, while for FMNIST, ADCT (75,15) is applied. The detailed comparative performance evaluation using the MNIST and FMNIST datasets in terms of F1-score is presented in table 5. As detailed earlier, the evaluation is conducted under non-IID settings with the same distribution for each FL technique. For the MNIST dataset, the proposed ADCT achieved the highest performance with an F1-score of 70.62%, which is 2.39% to 34.88% better than the state-of-the-art techniques. Similarly, for the FMNIST dataset, the best performance was attained by the proposed ADCT, yielding an F1-score of 58.46% at the end of the learning process. In summary, the proposed ADCT exhibited a performance improvement ranging from 3.48% to 37.13% on the FMNIST dataset compared to its counterparts.

Figure 1. Workflow of the proposed ADCT inserted in the traditional FL system.

Figure 2. The overall algorithm of the proposed ADCT.

Figure 3. Non-IID settings for multiple clients based on the MNIST dataset.

Figure 4. Non-IID settings for multiple clients based on the FMNIST dataset.

Table 2. Accuracy comparison among various accuracy thresholds and grid search steps in the proposed ADCT.

Table 3. Rounds to achieve a specific target accuracy, compared among various accuracy thresholds and grid search steps in the proposed ADCT.

Table 4. Optimal configurations of the proposed ADCT used in the comparative analysis for the MNIST and FMNIST datasets.

Table 5. Comparative performance evaluation of the proposed ADCT and the state-of-the-art FL techniques, assessed using two datasets under non-IID settings, in terms of F1-score.