USING DATA MINING TECHNIQUES TO PREDICT STUDENT PERFORMANCE IN THE INDUSTRIAL INSTITUTE OF AL-DIWANIYAH, IRAQ

The aim of this paper is to show the benefits of educational data mining (EDM) techniques for understanding the factors that lead to the success and failure of technical students, predicting their performance, and determining individual learning ability in engineering sciences. To this end, we use the individual data and grades of 311 students, collected at the Industrial Institute of Al-Diwaniyah city (Iraq) during the 2015–2017 academic years, to predict the results of the final theoretical exam in industrial drawing. We apply four EDM techniques: association rule mining with the Apriori algorithm, classification with a decision tree algorithm, clustering, and anomaly detection implemented on the output model of the clustering. Using the Microsoft SQL Server Business Intelligence Development Studio 2012 platform and following the Cross Industry Standard Process for Data Mining (CRISP-DM), we prepare 13 nominal and numerical attributes for each student, then apply and evaluate all four EDM techniques. We conclude that: 1) the association rules reveal that the most important factor contributing to student failure is the “project” attribute; 2) decision tree classification allows the teacher to predict the results of future students and to correct the student's predicted path; 3) clustering groups the students into successful and failing groups and helps the teacher to guide each group separately; and 4) anomaly detection, implemented with a Data Mining Extensions (DMX) query for SQL, helps to correct the education process for students located on the border of a cluster.

Keywords: individual learning, data mining techniques, SQL Server Business Intelligence Development Studio, clustering, classification, association rules, anomaly detection.


Methodology and Methods
Four EDM techniques are applied to achieve the purpose of this paper:
1. Association rules detection: to identify the most closely related features that lead to the success or failure of students, so that students can avoid the causes of failure and increase their likelihood of success.
2. Classification: to predict the success or failure of students in the theoretical exam at the end of the semester, which increases the opportunity to avoid failure.
3. Clustering: to group students of similar academic level into separate groups, namely a successful group and a failing group. The teacher can then deal with each group separately, based on its academic achievement.
4. Anomaly detection: to identify rare items, events or observations that raise suspicion by diverging significantly from the majority of the data [6]. This is one of the most important goals for the teacher, who often faces situations where a student excels during the semester but fails the final exam (due to study pressure or psychological factors) or, conversely, where a student who was lazy during the semester succeeds in the final exam. In the latter case the question is whether the result is due to hard study, easy questions, or cheating in the exam. Highlighting these cases is very important, because it helps teachers and supervisors of the educational process to understand and analyze the reasons.
This paper follows the Cross Industry Standard Process for Data Mining (CRISP-DM) [7]. CRISP-DM is the most commonly used methodology for developing data mining projects (Fig. 1) and was introduced to resolve the problems that existed in data mining project development [8].

Data collection and preparation
Data preparation is an important step and the most critical part of the data mining process [9]. The dataset was collected at the Industrial Institute of Al-Diwaniyah city in Iraq for the 2015-2017 academic years. It consists of 311 instances with 13 attributes of two data types (numerical and nominal). The final result, "Pass", indicates whether the student is eligible for the final semester exam (theoretical exam). To prepare the data, the numeric attributes were first converted to nominal ones, so that they are compatible with the various algorithms used in this paper and easier for the reader to comprehend. Table 1 shows the attributes selected for the mining process. Empty numeric cells were assigned the mean value, and empty nominal cells were filled with the word "missing". This completed the process of data preparation and cleaning.
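The preparation steps described above can be illustrated with a minimal Python sketch. It is not the procedure used in SQL Server: the records, the cut points and the labels "poor", "average" and "good" are invented for illustration, and the sketch assumes that missing numeric values are filled before the numeric attribute is converted to a nominal one.

```python
from statistics import mean

# Hypothetical records; attribute names follow the paper, values are invented.
records = [
    {"Attendance": 90.0, "Homework": "good"},
    {"Attendance": None, "Homework": None},
    {"Attendance": 50.0, "Homework": "poor"},
]

def fill_missing(rows, numeric_cols, nominal_cols):
    """Mean-impute empty numeric cells and mark empty nominal cells as 'missing'."""
    filled = [dict(r) for r in rows]  # work on copies, keep the input intact
    for col in numeric_cols:
        col_mean = mean(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = col_mean
    for col in nominal_cols:
        for r in filled:
            if r[col] is None:
                r[col] = "missing"
    return filled

def to_nominal(value, cut_points=(60.0, 80.0), labels=("poor", "average", "good")):
    """Map a number to a label; cut points and labels are assumptions."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]

clean = fill_missing(records, ["Attendance"], ["Homework"])
for r in clean:
    r["Attendance"] = to_nominal(r["Attendance"])
```

In this toy data the second student receives the mean attendance of the other two before the conversion to a nominal label.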

The application of data mining
Four techniques are applied to the dataset in this part: association rules detection, clustering, classification, and anomaly detection. The dataset is divided into two samples: the first, 70% of the dataset (218 students), is used to train the algorithms, and the second, 30% of the dataset (93 students), is used to test them. Before applying these algorithms, it is important to identify the selected attributes and how they are used.
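The 70/30 division can be reproduced with a short Python sketch. The paper does not state how students were assigned to the two samples, so the random shuffle, the seed and the function name `split_dataset` are assumptions made for illustration only.

```python
import random

def split_dataset(ids, train_fraction=0.7, seed=0):
    """Shuffle the ids and cut them into a training and a testing sample."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

student_ids = range(1, 312)  # 311 students, numbered 1..311 for illustration
train, test = split_dataset(student_ids)
```

With 311 students, rounding 70% gives 218 training cases and leaves 93 testing cases, which matches the sample sizes reported above.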
There are four types of attributes in CRISP-DM [10]:
Key: the attribute is a key in the relational table.
Input: the attribute is used as input for the algorithm.
Predict: the value of the attribute is to be predicted, and the attribute can be used as input or output for the algorithm.
Predict-only: the value of the attribute is to be predicted, but the attribute can be used only as algorithm output.

Table 2 Attributes used in CRISP-DM
Structure         Classification  Association Rules  Clustering
Attendance        Input           Input              Input
Extra_curricular  Input           Input              Input
Family_Income     Input           Input              Input
Homework          Input           Input              Input
Pass              Predict-only    Predict-only       Predict-only
Practical_exam    Input           Input              Input
Project           Input           Input
In Table 2 all the attributes take only nominal values, and the same attributes are used by all three algorithms. For anomaly detection, we applied a query to the clustering result, as described in section 3.4.

Association rules detection
The extraction of association rules relies on the Apriori algorithm, which finds the most frequent itemsets and then generates rules of the form A ⇒ B (Support = 2%, Confidence = 70%). Setting both parameters is very important, because it helps to exclude unimportant rules. Support = 2% means that A and B occur together in 2% of all records, and Confidence = 70% means that B occurs in 70% of the records that contain A.
The main purpose of mining association rules here is to reveal the factors that affect the student's success or failure in the practical exam (that is, whether or not the student will be eligible for the theoretical exam). The Microsoft Association Rules algorithm is used for this purpose, with its three most important parameters adjusted (MINIMUM_IMPORTANCE, MINIMUM_PROBABILITY, and MINIMUM_SUPPORT), while the other parameters keep their default values.
Setting a "minimum" for these parameters excludes all rules that fall below it. Setting a parameter to an integer greater than 1 specifies the minimum as an absolute value, whereas a decimal between 0 and 1 specifies it as a percentage.
MINIMUM_SUPPORT: support is defined as
\[ \mathrm{Support}(\{A, B\}) = \text{number of transactions containing both } A \text{ and } B \tag{1} \]
that is, the number of records, out of the total number of records, that contain both events A and B. MINIMUM_SUPPORT is therefore the minimum number of records that must contain A and B together for a rule to be created; all rules that do not satisfy this condition are excluded [11].
MINIMUM_PROBABILITY: probability is one of the characteristics of an association rule, defined as
\[ \mathrm{Probability}(B \mid A) = \frac{\mathrm{Support}(\{A, B\})}{\mathrm{Support}(\{A\})} \tag{2} \]
MINIMUM_IMPORTANCE: importance (also called the interestingness score or lift score) is a property of rules and itemsets that measures the correlation between A and B. It is defined as follows [11]:
\[ \mathrm{Importance}(A \Rightarrow B) = \log \frac{\mathrm{Probability}(B \mid A)}{\mathrm{Probability}(B \mid \neg A)} \tag{3} \]
where:
If Importance = 0, then A and B are independent.
If Importance > 0, then the probability of B increases when A is present.
If Importance < 0, then the probability of B decreases when A is present.
The values of these parameters are chosen by the researcher so as to obtain the required results more accurately. Very small values generate many rules, many of which will be unimportant. In contrast, very large values generate very few rules and may discard rules that would have been useful to the researcher.
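As an illustration of the three measures, the following Python sketch computes support, probability (confidence) and importance for one rule on invented data. It is not the Microsoft implementation: the transactions are hypothetical, importance is taken as the logarithm of P(B|A) / P(B|not A), which is assumed here because it agrees with the sign interpretation given above, and the natural logarithm is an arbitrary choice of base. The helper `meets_minimum` is a hypothetical name for the threshold rule described earlier.

```python
from math import log

# Invented transactions: each set holds nominal attribute values of one student.
transactions = (
    [{"Project=poor", "Pass=No"}] * 3
    + [{"Project=poor", "Pass=Yes"}] * 1
    + [{"Project=good", "Pass=No"}] * 1
    + [{"Project=good", "Pass=Yes"}] * 3
)

def support(rows, items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for t in rows if items <= t) / len(rows)

def confidence(rows, a, b):
    """Probability(B | A) = Support({A, B}) / Support({A})."""
    return support(rows, a | b) / support(rows, a)

def importance(rows, a, b):
    """log(P(B|A) / P(B|not A)); assumes P(B|not A) is not zero."""
    with_a = [t for t in rows if a <= t]
    without_a = [t for t in rows if not a <= t]
    p_b_given_a = sum(1 for t in with_a if b <= t) / len(with_a)
    p_b_given_not_a = sum(1 for t in without_a if b <= t) / len(without_a)
    return log(p_b_given_a / p_b_given_not_a)

def meets_minimum(count, total, minimum):
    """A minimum below 1 is read as a fraction of all records, otherwise as a count."""
    required = minimum * total if minimum < 1 else minimum
    return count >= required

A, B = {"Project=poor"}, {"Pass=No"}
rule_support = support(transactions, A | B)
rule_confidence = confidence(transactions, A, B)
rule_importance = importance(transactions, A, B)
```

In this toy data a poor project grade co-occurs with failure in 3 of 8 records, and the positive importance indicates that failure is more likely when the project grade is poor than when it is not.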

Classification
The Microsoft Decision Trees algorithm is based on the ID3 algorithm. A decision tree is a flowchart-like tree structure in which each node tests one value or a range of values of one attribute, each branch represents an outcome of the test, and the leaves give the class distributions [12]. The most influential attribute at each step is determined using the entropy criterion: the attribute that gives the lowest entropy is chosen.
MINIMUM_SUPPORT: the minimum number of cases that must be present in any node of the tree. Here the value is 7, because the database is relatively small.
SCORE_METHOD: selects one of three methods for deciding when a node in the decision tree is split into two or more nodes. "Entropy" is selected here. The possible values are:
"1" = Entropy.
"3" = Bayesian with K2 Prior.
"4" = Bayesian Dirichlet Equivalent with Uniform Prior.
SPLIT_METHOD: determines how a node is split in the tree; we choose "Complete" [11]. The possible values are:
"1" = Binary: the node is split into exactly two nodes, so that if the attribute (Practical_Exam) has the three values good, average and poor, the split becomes (Practical_Exam = good, Practical_Exam = not good).
"2" = Complete: the node is split on all possible values, so an attribute with two values is split into two branches, an attribute with three values into three branches, and so on.
"3" = Both: both previous options are considered and the algorithm selects between them automatically.
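The entropy criterion with a "Complete" split can be sketched in Python as follows. This is a textbook ID3-style attribute selection under stated assumptions, not the Microsoft Decision Trees implementation; the four rows are invented, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def split_entropy(rows, attribute, target="Pass"):
    """Weighted entropy of `target` after splitting on every value of `attribute`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def best_attribute(rows, attributes, target="Pass"):
    """Pick the attribute whose split leaves the lowest entropy."""
    return min(attributes, key=lambda a: split_entropy(rows, a, target))

# Invented rows: Practical_Exam separates the classes, Attendance does not.
rows = [
    {"Practical_Exam": "good", "Attendance": "high", "Pass": "Yes"},
    {"Practical_Exam": "good", "Attendance": "low", "Pass": "Yes"},
    {"Practical_Exam": "poor", "Attendance": "high", "Pass": "No"},
    {"Practical_Exam": "poor", "Attendance": "low", "Pass": "No"},
]
root = best_attribute(rows, ["Practical_Exam", "Attendance"])
```

In this toy data the split on Practical_Exam leaves no uncertainty about "Pass", so it would be placed at the root of the tree.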

Clustering
Clustering groups objects so that each object is more similar to the objects in its own cluster and less similar to the objects in other clusters [13]; that is, the points of a cluster are close to each other and far from the points of other clusters.
By default, the Microsoft Clustering algorithm is based on Scalable Expectation Maximization. To run the algorithm we need to set only one parameter, the number of clusters. Here we set it to two, so that the students who succeeded in the practical exam are collected in one cluster and the students who failed in the other.
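To illustrate the idea of expectation maximization on nominal attributes, the following Python sketch fits a two-cluster mixture of independent categorical attributes. It is plain EM on invented data, not the scalable variant used by Microsoft Clustering; the deterministic starting point, the smoothing constant `alpha` and the function name `em_cluster` are assumptions.

```python
from math import prod

def em_cluster(rows, attributes, n_clusters=2, n_iter=50, alpha=0.1):
    """Return, for each row, its probability of belonging to each cluster."""
    n = len(rows)
    values = {a: sorted({r[a] for r in rows}) for a in attributes}
    # Deterministic soft start: row i leans towards cluster i % n_clusters.
    resp = [[0.9 if i % n_clusters == k else 0.1 / (n_clusters - 1)
             for k in range(n_clusters)] for i in range(n)]
    for _ in range(n_iter):
        # M-step: cluster weights and smoothed value probabilities per attribute.
        size = [sum(resp[i][k] for i in range(n)) for k in range(n_clusters)]
        weight = [s / n for s in size]
        theta = [
            {a: {v: (sum(resp[i][k] for i in range(n) if rows[i][a] == v) + alpha)
                    / (size[k] + alpha * len(values[a]))
                 for v in values[a]}
             for a in attributes}
            for k in range(n_clusters)
        ]
        # E-step: posterior probability of each cluster for each row.
        for i, row in enumerate(rows):
            joint = [weight[k] * prod(theta[k][a][row[a]] for a in attributes)
                     for k in range(n_clusters)]
            total = sum(joint)
            resp[i] = [j / total for j in joint]
    return resp

# Invented students: three with good grades, three with poor grades.
rows = ([{"Practical_exam": "good", "Project": "good"}] * 3
        + [{"Practical_exam": "poor", "Project": "poor"}] * 3)
membership = em_cluster(rows, ["Practical_exam", "Project"])
```

Each row of `membership` holds the probabilities of belonging to the two clusters; in this toy data the three "good" students end up together in one cluster and the three "poor" students in the other.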

Anomaly detection
The aim of anomaly detection is to find patterns in a dataset whose behavior is not normal [14]. Here it is used to find unexpected results in the clustering process, such as a student whose status does not match his/her academic performance during the semester. Since SQL Server Business Intelligence Development Studio (SQL-SBIDS) does not include a built-in option for this process, a Data Mining Extensions (DMX) query for SQL was run on the output model of the clustering process.

Fig. 2. DMX query for anomaly detection
The DMX language, developed by Microsoft in 1999, is designed to provide a vendor-independent software interface. It builds on concepts already familiar to database developers in order to create and modify the knowledge models that result from data mining techniques [11]. In other words, DMX is to data mining what SQL is to databases. The DMX query for anomaly detection among students is shown in Fig. 2.
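The logic of such a check on the clustering output can be sketched in Python. This is not the DMX query of Fig. 2: the student records, the mapping between "Pass" values and clusters, and the 0.5 probability threshold are all assumptions chosen for illustration.

```python
def find_anomalies(students, expected_cluster, min_probability=0.5):
    """Return (student id, reason) pairs for cases that deserve a closer look."""
    flagged = []
    for s in students:
        if s["cluster"] != expected_cluster[s["Pass"]]:
            # The student sits in the cluster that contradicts the actual result.
            flagged.append((s["id"], "cluster contradicts Pass"))
        elif s["probability"] < min_probability:
            # Right cluster, but far from its center (weak membership).
            flagged.append((s["id"], "weak cluster membership"))
    return flagged

# Invented clustering output: cluster 1 = expected to succeed, 2 = expected to fail.
students = [
    {"id": 1, "Pass": "Yes", "cluster": 1, "probability": 0.95},
    {"id": 2, "Pass": "Yes", "cluster": 2, "probability": 0.60},
    {"id": 3, "Pass": "No", "cluster": 1, "probability": 0.70},
    {"id": 4, "Pass": "Yes", "cluster": 1, "probability": 0.40},
]
expected = {"Yes": 1, "No": 2}
anomalies = find_anomalies(students, expected)
```

The sketch separates two kinds of cases: students placed in the cluster that contradicts their actual result, and students placed in the right cluster but with a weak membership probability.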

Results and Discussion
After the parameters were adjusted, the models were processed in SQL-SBIDS, producing three models, one for each algorithm. Table 3 shows the resulting association rules. There are four rules, listed in descending order of the "Importance" factor; they show the factors most relevant to the "Pass" class: Project, Homework, Family_Income and Extra_curricular. The second rule is more useful than the first, because it predicts the "Pass" class from a single attribute, "Project", while the first rule needs two attributes to achieve the same prediction with the same probability and importance. It is worth noting that the model did not produce any rules for the case where "Pass" equals "Yes". All the resulting rules predict the factors associated with student failure, so to find the factors that lead to success we simply invert the results. For example, if a bad project grade leads to student failure with a probability of 0.77 and a confidence of 0.682, the student should seek high marks in the "project" to raise the probability of success.
From the rules of the decision tree we can observe that Practical_Exam is the most important attribute for predicting the "Pass" class, because it forms the root of the tree, where it gave the lowest entropy. This is logical, because the practical examination carries 15 marks, which is 50% of the 30 marks of the final practical exam grade. Knowing these rules is very useful for teachers and students, because they can determine in advance whether the student qualifies for the theoretical exam and take appropriate decisions, for example whether the student must put more effort into the practical exam, improve his/her activity, or attend more sessions.
Fig. 4a and b show the two clusters resulting from the application of the Microsoft Clustering algorithm.
Successful students, who belong to cluster 1 in Fig. 4a, have different characteristics from their peers in cluster 2 (failure). The length of the line for each attribute in Fig. 4b indicates how important that attribute is to its cluster; the attributes are arranged in order of their importance to the cluster.
Clustering is of great interest for learning the characteristics of each group, so that the teacher can deal with each group according to its academic level. The results of the DMX query of Fig. 2 are shown in Fig. 5. The first record shows that student no. 94 qualified for the theoretical exam at the end of the semester, but he/she is a member of the second cluster (the group of students expected to fail) with a probability of 0.07.

Fig. 5. Results of the DMX query for anomaly detection
In the second record, student no. 109 is not eligible for the exam but belongs to cluster 1 (the group of students expected to succeed) with a probability of 0.15. In the third record, the student is eligible for the exam and belongs to the cluster of students expected to succeed, but with a lower score. So what is the problem? This case is a good example showing that, although the student belongs to the right cluster, he/she is not necessarily near the center of the cluster (in terms of Euclidean distance).
Therefore, the teacher can use these results to look more deeply at the student's level during the semester and analyze the students' situation in order to find suitable ways to improve student performance.

Conclusion
This paper highlights the possibilities of applying data mining techniques in the academic field. The SQL-SBIDS program was used to analyze student data with association rules, classification, clustering, and anomaly detection.
The association rules technique revealed that the factor that most often caused student failure is the Project.
Classification with the decision tree algorithm produced an easy-to-understand tree with which the teacher is able to predict future results and take appropriate action to correct the student's predicted path.
With the clustering technique, the students were collected into two groups (successful and failing) in order to understand what distinguishes each group, which helps the teacher to lead and guide each group separately. A DMX query for SQL was run on the output model of the clustering process to detect anomalies, which is very important for teachers in correcting the path of the education process.
We hope that further research in the field of EDM will help us to resolve the principal problems of computer systems of individual instruction [15].