Data mining for Education Sector, a proposed concept

Data mining is needed in many fields, because working with large amounts of data demands both time and a high level of accuracy. In higher education, the potential influence of data mining on students' learning processes and outcomes has been recognized. Almost every educational institution, public or private, holds thousands of student records spanning many different programs and subjects. Understanding the benefits of extracting knowledge from this data can support the educational process itself. Data mining in education is useful for developing student-focused strategies and for providing institutions with the right tools for quality improvement. In this paper, we examine the benefits of applying data mining in the education sector using classification, prediction, association, and clustering methods.


Introduction
Data mining techniques are gaining significance in the education sector. Data mining has the potential to improve performance in education, a field that increasingly stores its important information digitally. Storing data in a database can also protect information that would be more vulnerable in physical form. Data mining can identify, classify, and even predict future trends in the data, which allows an institution to act proactively when making decisions. In the education sector, educational data mining (EDM) is the specific area concerned with the use and application of data mining. To improve the educational process, institutions build systems that continuously collect, process, report on, and work with digital data. Data mining technology can answer questions in education that would otherwise take a long time to resolve [1].
The EDM process transforms raw data from educational systems into useful information that can have a substantial impact on study and research in the education sector. With data mining, it is possible to find predictive information that experts may miss because of factors beyond their expectations. For example, data mining can estimate the probability that a student will graduate or fail, especially at universities. Institutions can use this information to concentrate on improving the performance of the students most at risk of failing. However, the application of data mining in education is still in its infancy and needs more attention. This paper identifies the uses of data mining in the education sector. The data mining processes proposed by various researchers are explained and compared with each other. The main objective is to identify the benefits of data mining in the education sector, what can be achieved with its help, and the most efficient and effective methods and techniques for applying it.
The Big Data phenomenon emerged in the 2000s, when industry analyst Doug Laney described Big Data in terms of three dimensions known as the 3Vs [3]. Volume: organizations collect data from many sources, such as social media, business transactions, and machine or system information. Collecting data at this scale used to be difficult because the technology was lacking, but with technologies such as Hadoop, Spark, and Google BigQuery it is no longer a problem. Velocity: the rate at which incoming data must be processed. Consider how many messages, data updates, and payment transactions a telecommunications operator generates every hour of every day, on the order of hundreds of thousands or even millions of records. Handling this flow quickly and accurately with suitable hardware and software is a must. Variety: the data collected comes in different formats and data types, and the range keeps growing as more and more data is digitized. It spans structured numeric data (date, cost, price) as well as documents (text, fax), e-mail, audio, video, health records, payment transactions, and so on.
Data mining has been used in many sectors that store large amounts of important data, such as banking, telecommunications, health, and infrastructure. Its use in the education sector, however, is still limited. One reason is that education involves many human factors, which can make the results of systems such as data mining less than fully accurate. According to [4], data mining can be used in education for: identifying students' needs and their choices of subjects according to their interests and expertise; identifying patterns and trends among students; predicting knowledge, rank, and final grade; making it easy to find data automatically; making it easy to build student profiles; and helping management to understand the business.
Authors around the world have proposed different data mining procedures. Data mining [5] is "the process of selecting, exploring, and modeling large amounts of data to look for regularities or relationships that were initially unknown with the aim of obtaining clear and useful results for database owners." Table 1 represents the data mining process as described by various authors and helps to compare the processes with each other.

1) Understanding business
At this stage, the first thing to do is determine the business goals, assess the current situation, determine the data mining targets, and establish a project plan while staying focused on the business goals.

2) Understanding data
Right after the business goals are determined, the next step is to collect data from various sources and describe it. The quality of the data is then checked to find out whether there are any problems.

3) Preparing data
During this stage, the collected data is cleaned of anything that is not needed. The cleaned data is then passed on to the modeling stage.
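
As a minimal sketch of the quality check and cleaning described above, the Python snippet below flags records with missing or out-of-range values and keeps only the clean ones. The records, the field names (`student_id`, `attendance`, `grade`), and the valid ranges are hypothetical and chosen only for illustration.

```python
# Hypothetical student records; field names and values are illustrative only.
records = [
    {"student_id": 1, "attendance": 0.92, "grade": 78},
    {"student_id": 2, "attendance": None, "grade": 65},   # missing attendance
    {"student_id": 3, "attendance": 0.55, "grade": 140},  # grade out of range
    {"student_id": 4, "attendance": 0.81, "grade": 88},
]


def quality_problems(record):
    """Return a list of data-quality problems found in one record."""
    problems = []
    if record["attendance"] is None:
        problems.append("missing attendance")
    elif not 0.0 <= record["attendance"] <= 1.0:
        problems.append("attendance out of range")
    if record["grade"] is None:
        problems.append("missing grade")
    elif not 0 <= record["grade"] <= 100:
        problems.append("grade out of range")
    return problems


# Quality check: describe the problems found in each collected record.
report = {r["student_id"]: quality_problems(r) for r in records}

# Cleaning: keep only the records with no detected problems.
clean_records = [r for r in records if not report[r["student_id"]]]

print(report)
print([r["student_id"] for r in clean_records])
```

In practice an institution might repair or impute problematic values instead of dropping the record; discarding is simply the easiest policy to show.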

4) Modeling
At this stage, the prepared data is assessed so that a test design can be produced and the modeling tools and techniques can be chosen.

5) Evaluations
After the tools and techniques have been determined, the results of the modeling process are evaluated and analyzed against the business objectives. The impact of the model must be understood and the results reviewed before they are disseminated.
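
One way to make this evaluation concrete, assuming the business objective is to catch students at risk of failing, is to compare a model's predictions with known outcomes. The sketch below uses made-up labels; the function name `confusion_counts` and the choice of metrics are our own assumptions, not part of any cited process.

```python
def confusion_counts(actual, predicted, positive="fail"):
    """Count true/false positives and negatives for the chosen positive class."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


# Hypothetical known outcomes and model predictions for six students.
actual = ["fail", "pass", "pass", "fail", "pass", "fail"]
predicted = ["fail", "pass", "fail", "pass", "pass", "fail"]

tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)  # share of all predictions that are right
recall = tp / (tp + fn)             # share of failing students that were caught
precision = tp / (tp + fp)          # share of "fail" predictions that were right

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

For an early-warning use case, recall on the "fail" class may matter more than overall accuracy, because a missed at-risk student receives no support.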

6) Dissemination
At this last stage, reports and graphs are produced and monitored, and decisions are then made based on them.

Research Method
To understand how data mining can help educational performance, it is important to understand the basic concepts of data mining itself. Data mining has four important methods: Classification, Categorization, Estimation, and Visualization [10]. Classification is used to identify class differences and associations and to separate subjects for each student. Categorization handles the results of algorithmic induction into categories such as "Survive" or "Expelled", and "Move" or "Stay". Estimation includes predictive functions and relates to continuous outcome variables, such as GPA, grades, and salary levels. Visualization uses interactive graphics to demonstrate rules and values mathematically, and goes well beyond bar charts or pie charts.
Higher education institutions can use classification methods to analyze student characteristics, or use estimation to predict the likelihood of outcomes such as persistence, performance in a subject area, and course completion rates. Many data mining methods are used across industries; prediction, classification, association, and clustering are the most common in the education sector.

Classification
Classification [11] is the process of finding a set of models that describe and distinguish the classes or concepts in data. The result can be represented in various forms, such as classification rules, decision trees, or neural networks. The model can then be used to predict the class labels of new data. In many applications, however, the aim is to predict missing data values rather than class labels. According to [12], by accurately predicting students' overall grades in a specific course, classification techniques can help improve the efficiency of the higher education system. Uses of this technique include: analyzing student interest in the learning process; identifying under-motivated students; examining the interaction between students and learning activities; evaluating whether a student has completed an assignment; continuously evaluating students' learning performance; and examining levels of participation to prevent students from dropping out of e-learning courses.
In conclusion, classification can be applied to group students by achievement, knowledge, grades, and motivation. It is used to increase the quality and efficiency of learning activities and to provide guidance for the higher education system while enhancing decision-making. Classification gives decision makers more flexibility to evaluate the performance and behavior of a group of students, in order to identify how well specific group members can perform in a learning process even if their particular knowledge or skills do not fit the task. These techniques can be used to provide early support in the form of educational assistance, in particular to encourage students who are expected to perform poorly in a particular activity or class, and to count the positive and negative responses that determine the effectiveness of a classification pattern.
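
To illustrate how a classifier assigns a class label to a new student, the sketch below uses a k-nearest-neighbor vote. This algorithm is our choice for brevity; the cited works do not prescribe it. The training data, the two features (attendance rate and average assignment score), and the labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled data: (attendance rate, average assignment score) -> outcome.
training = [
    ((0.95, 85), "pass"),
    ((0.90, 78), "pass"),
    ((0.85, 70), "pass"),
    ((0.60, 52), "fail"),
    ((0.50, 45), "fail"),
    ((0.40, 60), "fail"),
]


def scale(features):
    """Put both features on a 0-1 scale so neither dominates the distance."""
    attendance, score = features
    return (attendance, score / 100.0)


def classify(features, k=3):
    """Label a student by majority vote among the k closest training examples."""
    query = scale(features)
    by_distance = sorted(training, key=lambda item: math.dist(query, scale(item[0])))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify((0.88, 75)))
print(classify((0.45, 50)))
```

A student labeled "fail" by such a model would be a candidate for the early educational assistance mentioned above.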

Predictions
Prediction aims to build a model that can infer a single aspect of the data (the predicted variable) from a combination of other aspects (the predictor variables). Prediction requires labels for the output variable for a limited set of data, where a label represents reliable "ground truth" information about the value of the output variable in a particular case. In some cases, however, it is important to consider whether these labels are only estimates or not fully trustworthy. The main uses of prediction in higher education include predicting students' behavior, performance, knowledge, skills, and grades.
Prediction can also be used to learn which features of a model are important, which provides information about the underlying structure. A common approach in this kind of research is to predict students' educational outcomes directly, without first predicting intermediate factors. Prediction is thus closely related to classification. The difference is that in classification the predicted variable is categorical, whereas in prediction it is a numerical or continuous value. For this reason, researchers use a range of prediction techniques to predict students' academic performance and to identify variables that may predict success or failure in university courses.
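
As a small illustration of predicting a continuous value, the sketch below fits a least-squares line that estimates a final grade from weekly study hours. Simple linear regression is only one of many prediction techniques, and the data points and variable names are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: weekly study hours and the final grade obtained.
hours = [2, 4, 6, 8, 10]
grades = [52, 60, 71, 78, 89]

slope, intercept = fit_line(hours, grades)


def predict_grade(study_hours):
    """Predicted (continuous) final grade for a given number of study hours."""
    return intercept + slope * study_hours


print(round(slope, 2), round(intercept, 2), round(predict_grade(7), 1))
```

The output here is a number, in contrast with a classifier, whose output would be a category such as "pass" or "fail".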

Association
The association method in data mining finds sets of variables that repeatedly occur together in a database. Association analysis is the discovery of association rules that describe attribute values that frequently occur together in a particular data set. The method amounts to finding if-then rules: if one variable takes a given value, another variable will tend to take a certain value [13]. In education, this helps teachers to evaluate students' learning patterns more efficiently and to coordinate course content. It can also be used to provide feedback that supports teachers' decision making, suggest learning content based on student access history, encourage collaborative learning, recognize irregular learning patterns, evaluate student performance, and predict student grades. For example, students who choose networking may also choose computers as a specialization, and students who choose business courses may also choose the MBA program.
In short, the association method can be used to guide the opening of a new tertiary institution and to promote new courses and specializations based on discovered rules. Association is used to describe relationships between students' behaviors, learning materials, and differences in performance.
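
The if-then rules described above are usually scored by support (how often the items occur together) and confidence (how often the "then" part holds when the "if" part does). The sketch below computes both for single-item rules over a made-up set of enrollments; the course names and the thresholds 0.3 and 0.8 are assumptions for illustration.

```python
from itertools import permutations

# Hypothetical enrollments: each set holds the courses or programs one student chose.
enrollments = [
    {"networking", "computers"},
    {"networking", "computers", "programming"},
    {"business", "mba"},
    {"networking", "programming"},
    {"business", "mba", "computers"},
]


def support(itemset):
    """Fraction of students whose choices contain every item in the itemset."""
    return sum(itemset <= chosen for chosen in enrollments) / len(enrollments)


def confidence(antecedent, consequent):
    """Among students who chose the antecedent, the fraction who also chose the consequent."""
    return support(antecedent | consequent) / support(antecedent)


# Keep rules "if a then b" that are both frequent enough and reliable enough.
items = sorted(set().union(*enrollments))
rules = []
for a, b in permutations(items, 2):
    s = support({a, b})
    if s >= 0.3:
        c = confidence({a}, {b})
        if c >= 0.8:
            rules.append((a, b, s, c))

for a, b, s, c in rules:
    print(f"if {a} then {b}: support={s:.2f}, confidence={c:.2f}")
```

On real data, an algorithm such as Apriori would be used to avoid enumerating every candidate pair, but the two measures are the same.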

Clustering
Clustering means grouping similar objects. Rajshree et al. described clustering as the process of grouping a set of physical or abstract objects into classes of similar objects. The aim is not to predict, classify, or estimate the value of a target variable, but to segment the entire data set into homogeneous subgroups [14]. In the education sector, especially at universities, grouping is based on registration, student transfer, re-admission, selected courses, gender, specialization, and student behavior. Clustering in higher education is mainly used to support student interaction in different learning situations, to recommend activities and resources used by similar users, to examine student performance and participation in the education process, and to find groups of students with similar learning characteristics based on their behavior, knowledge, and skills. These activities may help educational decision-makers recognize possible dropouts at an early stage and avoid allocating new students to courses that do not interest them.
In conclusion, when evaluating student performance, clustering can help teachers identify unwanted student behaviors, predict students' learning outcomes, and model students collaboratively by evaluating the collective interaction between them. The technique is also used to help students understand the process of collaborative inquiry, develop different science skills, and discover popular learning routes. Clustering is considered an effective technique in higher education for grouping students by learning personality, preferred learning style, behavioral interaction, and academic achievement. It can also be used to analyze collaborative learning techniques and to improve the retention rate by helping institutions identify at-risk students quickly.
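
The following sketch shows how students can be segmented into homogeneous subgroups without any target variable. It uses k-means, a common clustering algorithm that we pick for illustration (the text above does not name one). The student data, the two features, and the initial centroids are hypothetical.

```python
import math

# Hypothetical students as (attendance rate, average grade scaled to 0-1).
students = [
    (0.95, 0.88), (0.90, 0.82), (0.92, 0.91),
    (0.50, 0.45), (0.45, 0.52), (0.55, 0.40),
]


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Start from two of the students as initial centroids (an arbitrary choice).
centroids, clusters = k_means(students, centroids=[students[0], students[3]])
for centroid, members in zip(centroids, clusters):
    print([round(c, 2) for c in centroid], members)
```

The resulting groups carry no labels of their own; interpreting one of them as "at risk" remains a judgment for the institution.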

Data Mining Techniques
Many techniques are discussed in the literature on data mining. [15] identifies seven types of data mining models: Association, Classification, Clustering, Forecasting, Regression, Sequence Discovery, and Visualization. Data mining techniques are used to obtain useful or important information in a particular sector. Among the many data mining techniques used in the education sector, two are especially important: decision trees and neural networks.

Figure 1. Decision tree concept
Decision trees, illustrated in Figure 1, are a data mining technique that can be used to classify and predict large data sets. In business, decision trees are used to create customer profiles. They produce classifications that are easy to understand and results-oriented. Many sectors use them to predict and classify customer behavior, attrition, and retention. Likewise, the education sector can use this technique to predict or classify student performance and behavior.
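
A decision tree is built by repeatedly choosing the attribute that best separates the classes. The sketch below shows only that first step, selecting the root split by information gain in the style of ID3. The student records and the attribute names (`attendance`, `works_part_time`) are hypothetical, and other split criteria such as Gini impurity are equally common.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="outcome"):
    """Reduction in entropy of the target obtained by splitting on one attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical student records with two candidate attributes.
rows = [
    {"attendance": "high", "works_part_time": "no", "outcome": "pass"},
    {"attendance": "high", "works_part_time": "yes", "outcome": "pass"},
    {"attendance": "high", "works_part_time": "no", "outcome": "pass"},
    {"attendance": "low", "works_part_time": "no", "outcome": "fail"},
    {"attendance": "low", "works_part_time": "yes", "outcome": "fail"},
    {"attendance": "low", "works_part_time": "yes", "outcome": "pass"},
]

# The attribute with the highest gain becomes the root of the tree.
best = max(["attendance", "works_part_time"], key=lambda a: information_gain(rows, a))
print(best, round(information_gain(rows, best), 3))
```

A full tree would repeat the same selection inside each branch until the groups are pure or too small to split further.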

Figure 2. Neural network concept
A neural network (Figure 2), also known as a parallel distributed processing network, is a computing paradigm loosely modeled on the neuronal structure of the brain. It consists of interconnected processing elements, called nodes or neurons, which work together to produce an output function. The technique can be used to classify large, complex data sets. In education it is typically used to learn students' course selections, determine student satisfaction with courses and their grades, and predict their choice of specialization. Input data is represented by neurons that are connected to prototype neurons. Each of these connections has a weight that adapts during learning.
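
To show what it means for connection weights to adapt during learning, the sketch below trains a single artificial neuron with the classic perceptron rule. This is a deliberate simplification of the multi-layer networks used in practice, and the two binary inputs and the target (satisfied with the course or not) are invented for illustration.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0


def train(samples, epochs=20, rate=1.0):
    """Perceptron rule: nudge each weight in proportion to the prediction error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Hypothetical inputs: (attended regularly, completed assignments) -> satisfied with course.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(samples)
print(weights, bias)
print([predict(weights, bias, inputs) for inputs, _ in samples])
```

Each wrong prediction shifts the weights slightly toward the correct answer, so after a few passes the neuron reproduces all four training examples.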

Data Mining Methods in the Education Sector
There are many ways to apply data mining in education. Some applications with particularly useful advantages are described below.

Prediction of Student Registration
Aksenova et al. [16] developed a predictive model for new, current, and returning students at the undergraduate and graduate levels. The model builds on the region's population, the regional unemployment rate, the institution's tuition fees, household income, and the institution's past registration records. The data was mined with the help of the Cubist tool. They conclude that data mining is applicable to higher education and has a very large impact on it.

Curriculum Development
According to [17], data mining algorithms such as decision trees, decision forests, and link analysis were used to predict completion rates, study program preferences, and the professions of registrants. The institution can find correlations between course categories and an applicant's profession or occupation. The authors stressed the importance of data mining in developing marketing and curricula in higher education. This helps an institute improve the quality of its registrants and meet its own business targets.

Subject Completion
To understand patterns in how students complete courses, a university can classify students into groups based on loyalty, complaint levels, and satisfaction with their courses.

Student Targeting
Woo et al. [18] define student targeting as "the process of building strategies against specific students." They state that the customer map is a visualization method for targeting students. Customer maps help in building student-oriented strategies. This is a "new technique for finding the right target student with values, character, and their needs." It is based on three dimensions of customer targeting: customer needs (complaints and satisfaction), customer values (usage and behavior), and customer characteristics (demographic and psychographic).

Student Course Selection
Factors such as student workload, student characteristics, grades, type of course, course duration, final exam, and student needs can influence students' choice of subjects [19]. These factors act as inputs to a neural network model of course selection.

Teacher's performance in teaching
The instructor's attitude, employment status, student attendance, and student feedback are factors that can influence the teaching performance of a teacher or instructor at a university, as identified using stepwise regression and decision trees [20]. The data mining process can also be used to classify teachers' performance, which helps in improving the education system.

Discussion
The benefits gained from the various data mining methods can allow learning institutions to make smart decisions, plan further ahead to support students, predict potential patterns and behaviors with greater precision, and allocate resources and staff more accurately. We believe that data mining in education can play an important role in improving students' learning experience and grades, finding patterns, and predicting students' academic achievements and behaviors.
Each of the methods discussed, namely classification, prediction, association, and clustering, has its own role and usability. Classification helps teachers group students by achievement, grades, or lack of motivation. Prediction methods can be used to predict information, data, or values; in general, researchers use them to predict student pass rates in a course or students' final grades in a particular course. Association is used to identify relationships between two variables; for example, a student who takes a computer course may also choose a programming course. Many researchers use this method to identify abnormal learning patterns and evaluate student performance. Finally, clustering is useful for grouping students with specific patterns, which helps in identifying unwanted behaviors or predicting the learning outcomes of these groups of students.

Conclusion
The application of data mining in the education sector, especially at institutions such as universities, is very helpful for predicting outcomes, classifying student characteristics, and preparing strategies to improve student performance. Universities can also apply data mining to predict student enrollment in various courses. With techniques such as decision trees, students' course outcomes can be predicted from the selected attributes. Decision tree classifiers are applied to student data to predict student performance in class. These techniques help in identifying students with low attendance and poor performance. The main finding is that this technology makes it possible to gather knowledge from students' academic performance.