Data Processing and Optimization in the Development of Machine Learning Systems: Detailed Requirements Analysis, Model Architecture, and Anti-Data Drift Strategies

Nataliya Boyko

Abstract


The relevance of this research stems from the growing need for machine learning systems across industries, which in turn requires reliable data processing and optimization. The study aims to develop a machine learning system for data processing and optimization that predicts employee departure from internal company data. To that end, it analyzes the subject area and existing approaches, defines the model architecture and describes the developed system, validates the application's performance on test data, and develops strategies to counteract data drift. The methods applied are machine learning algorithms, including decision trees, logistic regression, and neural networks, together with architectural approaches suited to machine learning systems whose input data carry little information. The study employs multi-generation model architectures, ensemble methods with LightGBM for robust prediction, and dynamic adaptation strategies to handle feature and data drift. The main result is a machine learning and data pre-processing system for recognizing the risk of employee dismissal, which can serve as a basis for implementing similar services in IT companies. The object of the study is a system that predicts the probability of a particular employee's dismissal within a certain period. The study also demonstrates how to cope with the difficulties of building a solution on data of low information content and poor quality. Despite the data quality and the high class imbalance, the implemented application produces valuable results for the business. The practical significance of the study lies in the possibility of using the developed system to predict and prevent employee losses, which contributes to team stability, more efficient personnel management, and greater competitiveness of enterprises.
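The abstract does not say how feature and data drift are detected. As one illustrative possibility, not the author's method, the sketch below computes the Population Stability Index (PSI) for a single numeric feature, comparing a reference (training-time) sample against a current sample. The function name, the number of bins, and the smoothing constant `eps` are assumptions made for this example only.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Illustrative PSI for one numeric feature (hypothetical helper, not from the paper)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Bin edges come from reference quantiles, so each bin holds roughly
    # the same share of the reference sample. Duplicate edges are dropped.
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)))

    # Only interior edges are used for assignment, so values outside the
    # reference range fall into the first or last bin instead of being lost.
    inner = edges[1:-1]
    n_cells = len(inner) + 1
    ref_counts = np.bincount(np.searchsorted(inner, reference, side="right"), minlength=n_cells)
    cur_counts = np.bincount(np.searchsorted(inner, current, side="right"), minlength=n_cells)

    # Clip fractions away from zero so the logarithm stays finite.
    ref_frac = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_frac = np.clip(cur_counts / cur_counts.sum(), eps, None)

    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))
```

A widely used rule of thumb, which is not taken from this study, reads a PSI below 0.1 as stable and above 0.25 as a substantial shift that may justify retraining. Any threshold would need to be calibrated on the actual data.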


Keywords


Information Transformation; Resource Efficiency; Time Series; Operationalization; Pre-Processing





Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Organized by : Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website : http://bright-journal.org/JADS
Email : taqwa@amikompurwokerto.ac.id (principal contact)
    support@bright-journal.org (technical issues)

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0