An Intrusion Detection System Using Singular Average Dependency Estimator in Data Mining

An Intrusion Detection System (IDS) is a vital component of any network in today's Internet-connected world. IDSs are an effective way to detect different kinds of attacks in interconnected networks. An effective IDS requires high accuracy and a high detection rate (DR) as well as a low false alarm rate (FAR). To tackle the growing trend in computer attacks and respond to threats, industry professionals and academics are joining forces to build IDSs that combine high accuracy with low complexity and time efficiency. With the tremendous growth in Internet usage, web applications running on various platforms are becoming major targets of attack. The security and privacy of a system are compromised when an intrusion happens. An IDS plays a vital role in network security because it detects various types of attacks in the network. An IDS distinguishes between traffic coming from clients and traffic originating from attackers or intruders, in an attempt to simultaneously mitigate the problems of throughput, latency and security of the network. Data mining based IDSs can effectively identify intrusions. The proposed scheme, the Singular Average Dependency Estimator (SADE), is one of the recent enhancements of the naive Bayes algorithm. It addresses the independence assumption by averaging all models generated by the traditional one-dependence estimator and is well suited for incremental learning. Empirical results show that the proposed model based on SADE is efficient, with low FAR and high DR.

Keywords: Intrusion detection, Data Mining, KDD data set, False Alarm Ratio

INTRODUCTION
As the years have passed, computer attacks have become less glamorous. Just having a computer or local network connected to the internet heightens the risk of perpetrators trying to break in, installing malicious tools and programs, and possibly using systems that target machines on the internet in an attempt to remotely control them. The (GOA) team categorized the attacks encountered in 2014, discovering that 25% of the attacks were non-cyber threats, followed by scans/probes/attempted access at 19% and policy violations at 17% [1]. This data is further supported by the annual FBI/CSI survey, which discovered that although virus-based attacks occurred more frequently, attacks based on unauthorized access and denial of service, both internal and external, increased drastically.
Recent exploits also suggest that the more sensitive the information held, the higher the probability of being a target. Several retailers, banks, public utilities and organizations have lost millions of customer records to attackers, losing money and damaging their brand image [2]. In some cases attackers steal sensitive information and attempt to blackmail companies by threatening to sell it to third parties [5]. In the second quarter of 2014, Code Spaces was forced out of business after attackers deleted its client databases and backups. JP Morgan, America's largest bank, suffered a cyber-attack in 2014 that impacted 76 million members [3]. In 2014, Benesse, a Japanese education company for children, suffered a major breach in which a disgruntled former employee of a third-party partner disclosed up to 28 million customer accounts to advertisers [4]. Most notably, the "Sony Pictures hack" best displayed how significant a company's losses are in the aftermath of a security breach. The network servers were temporarily shut down due to the hack [4]. Cyber security experts estimate that Sony lost up to $100 million [5] [6]. Other companies under the Sony blanket fell victim to attacks [7].
To tackle this growing trend in computer attacks and respond to threats, industry professionals and academics are joining forces in a bid to develop systems that monitor network traffic activity and raise alerts for unpermitted activities. These systems are best described as Intrusion Detection Systems.

International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, www.ijtsrd.com, Volume 2, Issue 5, Jul-Aug 2018

DATA MINING BASED IDS
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information, that is, information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Network traffic is huge and information comes from different sources, so the dataset for an IDS becomes large, and analyzing such a large dataset is very hard. Data mining techniques are applied to IDSs because they can extract hidden information and deal with large datasets. At present, data mining techniques play a vital role in IDSs, helping them to distinguish abnormal from normal patterns. The data mining techniques used in the context of intrusion detection are listed below.

1. Correlation Analysis: Correlation is often used as a preliminary technique to discover relationships between variables. More precisely, correlation is a measure of the linear relationship between two variables.
2. Feature Selection: A subset of the features available in the data is selected for the application of a learning algorithm.
3. Machine Learning: Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.
4. Sequential Patterns: These are used to uncover connections between data items; time series analysis focuses on the relationships of data over time.
5. Classification: This is a technique of taking each instance of a dataset and assigning it to a particular class. Typical classification techniques are inductive rule generation, genetic algorithms, fuzzy logic, neural networks and immunological based techniques.
6. Clustering: This is a technique for statistical data analysis. It groups similar objects into a series of meaningful subsets according to certain rules, so that the data in each subset share some common trait.

Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions. Our proposed approach is shown in figure 2. Our network intrusion detection model applies the Optimize SADE classifier.

Figure 1: Optimize SADE approach
Our proposed algorithm is described below.
Algorithm: Intrusion Detection System using Optimize SADE techniques.
Input: NSL-KDD data set
Output: Classification of different types of attacks.
Step 1: Load the NSL-KDD data set.
Step 3: Cluster the data set into four types.
Step 4: Partition each cluster into training and test sets.
Step 5: Give the training set to the proposed algorithm for training.
Step 6: Feed the test set to the proposed algorithm for classification of attacks.
Step 7: Extract the feature values of the test set.
Step 8: Optimize the feature values in a continuous orthogonal way.
Step 9: Classify the feature values according to the number of attacks.
Step 10: Determine the average number of attacks in each attack class, namely DoS, Probe, U2R and R2L.
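
The abstract describes the classifier as an enhancement of naive Bayes that averages the models produced by one-dependence estimators. The sketch below illustrates that averaging idea in the style of the well-known averaged one-dependence estimator (AODE) for categorical features. That SADE takes exactly this form is an assumption; the class name `AveragedOneDependence`, the Laplace smoothing and the `min_count` threshold are illustrative choices and are not taken from the paper.

```python
from collections import Counter


class AveragedOneDependence:
    """Illustrative averaged one-dependence classifier for categorical data."""

    def __init__(self, min_count=1):
        # A feature value serves as a "super-parent" only if it was seen
        # at least this many times in training (hypothetical default).
        self.min_count = min_count

    def fit(self, X, y):
        self.n = len(X)
        self.d = len(X[0])
        self.classes = sorted(set(y))
        # Distinct values per feature, needed for Laplace smoothing.
        self.values = [sorted({row[i] for row in X}) for i in range(self.d)]
        self.value_count = Counter()  # (i, x_i) -> count
        self.joint = Counter()        # (c, i, x_i) -> count
        self.pair = Counter()         # (c, i, x_i, j, x_j) -> count
        for row, c in zip(X, y):
            for i, xi in enumerate(row):
                self.value_count[(i, xi)] += 1
                self.joint[(c, i, xi)] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.pair[(c, i, xi, j, xj)] += 1
        return self

    def _scores(self, row):
        scores = {}
        for c in self.classes:
            total = 0.0
            for i, xi in enumerate(row):
                if self.value_count[(i, xi)] < self.min_count:
                    continue  # this value is too rare to act as a parent
                # Smoothed estimate of P(c, x_i).
                k = len(self.classes) * len(self.values[i])
                term = (self.joint[(c, i, xi)] + 1) / (self.n + k)
                for j, xj in enumerate(row):
                    if j == i:
                        continue
                    # Smoothed estimate of P(x_j | c, x_i).
                    num = self.pair[(c, i, xi, j, xj)] + 1
                    den = self.joint[(c, i, xi)] + len(self.values[j])
                    term *= num / den
                total += term  # sum over all one-dependence models
            scores[c] = total
        return scores

    def predict(self, X):
        # If no feature value qualifies as a parent, all scores are zero and
        # the first class is returned; a fuller version would fall back to
        # plain naive Bayes in that case.
        return [max(self._scores(r).items(), key=lambda kv: kv[1])[0] for r in X]


# Toy records (protocol, flag) with made-up labels, for illustration only.
X = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
     ("tcp", "S0"), ("tcp", "S0"), ("icmp", "S0")]
y = ["normal", "normal", "normal", "dos", "dos", "dos"]
model = AveragedOneDependence().fit(X, y)
print(model.predict([("tcp", "SF"), ("tcp", "S0")]))
```

Because training only updates counts, new records can be absorbed by incrementing the same counters, which is consistent with the abstract's remark that the method suits incremental learning.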

RESULT ANALYSIS
Many standard data mining processes, such as data cleaning and pre-processing, clustering, classification, regression, visualization and feature selection, are already implemented in MATLAB. The automated data mining tool MATLAB is used to perform the classification experiments on the 20% NSL-KDD dataset. The data set consists of various classes of attacks, namely DoS, R2L, U2R and Probe. The data set to be classified is first pre-processed and normalized to the range 0-1, because certain classifiers achieve better accuracy on normalized data. The Correlation-based Feature Selection method is used in this work to reduce the number of features in the data set from 41 to 6. Classification is then performed using the SADE algorithm.
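
The pre-processing described above can be sketched as follows. This is a simplified stand-in, not the paper's procedure: features are min-max scaled to the range 0-1 and then ranked by the absolute Pearson correlation of each feature with a numeric label, whereas full Correlation-based Feature Selection scores whole feature subsets and also penalizes redundancy between features. The function names and the toy data are hypothetical.

```python
import math


def min_max_scale(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_k_features(columns, labels, k):
    """Indices of the k columns most correlated (in absolute value) with labels."""
    scored = [(abs(pearson(min_max_scale(col), labels)), idx)
              for idx, col in enumerate(columns)]
    # Highest correlation first; ties broken by column index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:k]]


# Toy data: three feature columns, labels 0 = normal and 1 = attack.
columns = [[1, 2, 3, 4], [5, 5, 5, 5], [4, 1, 3, 2]]
labels = [0, 0, 1, 1]
print(top_k_features(columns, labels, k=1))
```

With the 41 NSL-KDD features, calling `top_k_features(columns, labels, k=6)` would return six column indices, mirroring the reduction from 41 to 6 reported above, although the selected features need not match those chosen by the paper's method.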
The specific types of attacks are classified into four major categories. Table 1 shows the details.
Table 2 shows the distribution of the normal and attack records available in the various NSL-KDD datasets. We used accuracy, detection rate (DR), false alarm rate (FAR) and Matthews correlation coefficient (MCC), which are derived from the confusion matrix. We conducted all our experiments using the WEKA tool [14]. The performance of our proposed model is shown in Table 5 and that of IIDPS in Table 6.
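
The four evaluation measures can be computed from the counts of a binary (attack versus normal) confusion matrix. The sketch below uses the standard textbook definitions, which are assumed to match the paper's own equations; the function name `ids_metrics` and the example counts are hypothetical.

```python
import math


def ids_metrics(tp, tn, fp, fn):
    """Accuracy, DR, FAR and MCC from binary confusion-matrix counts.

    tp: attacks flagged as attacks     fn: attacks missed
    tn: normal traffic passed          fp: normal traffic flagged
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Detection rate: share of actual attacks that were detected.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    # False alarm rate: share of normal records flagged as attacks.
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "dr": detection_rate,
        "far": false_alarm_rate,
        "mcc": mcc,
    }


# Made-up counts for illustration; these are not results from the paper.
print(ids_metrics(tp=90, tn=80, fp=20, fn=10))
```

For a multi-class setting such as DoS, Probe, U2R and R2L, one common convention is to compute these measures per class by treating that class as "attack" and all other records as "normal".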

CONCLUSIONS
In this paper an ANN based Intrusion Detection System was implemented on the NSL-KDD dataset. The system was trained and tested for the binary category (normal or attack) as well as for five-class attack categories. Because the training set has few patterns for the R2L and U2R categories, some patterns were selected randomly from the other three classes in the training set. The proposed IDS uses the Levenberg-Marquardt (LM) and BFGS quasi-Newton backpropagation algorithms for learning. Training and testing were applied to the dataset with the full feature set (i.e. 41) and with a reduced feature set (i.e. 29). The results were evaluated using standard measures such as accuracy, detection rate and false positive rate, and were compared with other reported papers. The proposed technique for binary classification was found to give higher attack detection accuracy than other reported techniques. For five-class classification, the system was found to have good capability to detect the attacks of a particular class in the NSL-KDD dataset.
In this paper, we applied the SADE algorithm to detect four types of attack: DoS, Probe, U2R and R2L. 10-fold cross-validation is applied for classification. The proposed approach is evaluated and compared using the NSL-KDD data set. Experimental results show that accuracy, DR and MCC for the four types of attacks are increased by our proposed method. Empirical results show that, compared with IIDPS, the proposed model produces a low false alarm rate and a high detection rate. For future work, we will apply a feature selection measure to further improve the accuracy of the classifier.