Salient Features Selection Techniques for Intrusion Detection in Mobile Ad Hoc Networks

40-46


INTRODUCTION
Today, more people use mobile phones than traditional fixed phones, and mobile wireless communication is experiencing huge growth rates. In many countries, mobile wireless communication is the only means of communication in some locations because an appropriate fixed communication infrastructure is lacking [1]. Whereas traditional communication paradigms deal with fixed networks in which security can be managed, wireless communication raises a new set of questions: because the network is open, appropriate security mechanisms are hard to achieve. This is especially true of mobile ad hoc networks (MANETs) [2], the subject of our research, where security mechanisms are hard to design because of the issues that define the network, such as:
• Dynamic topology
• Limited bandwidth
• Routing issues
• Lack of central authority
• Lack of association among nodes
MANETs are more exposed to malicious attacks because of the openness of the network and the autonomy of the connecting nodes: any node can join or leave the network at any time.
Attacks in MANETs can be classified into two forms: horizontal attacks and vertical attacks.
Horizontal attacks are existing, already-known attacks such as DoS attacks, blackhole attacks and other malicious attacks; they can be described as going from 1 to n. Vertical attacks, in contrast, are new attacks, and detecting them is described as going from 0 to 1. Vertical attacks are hard to anticipate because detecting them requires detecting an attack that nobody has ever detected before.
To cope with the design issues of MANETs that cause network vulnerabilities, researchers in the MANET community have implemented various Intrusion Detection Systems (IDS) to achieve suitable resource sharing and communication with fewer vulnerabilities to malicious attacks.
Intrusion detection is the technique that strives to detect an intruder who has attempted to break into a computer system and then to initiate responses to the intrusion [3].
Intrusion detection techniques vary along several axes, such as the time at which detection occurs, the types of inputs examined to detect intrusive activities, and the range of response capabilities, the simplest being to alert an administrator to a potential intrusion. This design space for detecting intrusions in computer systems has yielded a wide range of solutions known as Intrusion Detection Systems (IDS). These techniques come in two forms: signature-based detection and anomaly detection.
In signature-based detection, system inputs or network traffic are scrutinized for specific behaviour patterns (or signatures) that are known to indicate attacks. In this approach, only known attacks are identified. This limitation is well known to virus-detection software vendors, who must release new signatures on a regular basis as new viruses are created and detected manually. In anomaly detection, the aim is to characterise normal (non-dangerous) behaviour and to raise an alert when behaviour other than this occurs. Anomalous system activity does not always imply an intrusion, but the presumption is that intrusions often induce anomalous behaviour in a system. In particular, anomaly detection can detect previously unknown intrusions.
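To make the contrast concrete, here is a minimal Python sketch of both approaches. The signature strings, the packets-per-second metric and the three-standard-deviation threshold are hypothetical choices for illustration only, not part of the original study.

```python
from statistics import mean, stdev

# Hypothetical signatures: byte patterns assumed to identify known attacks.
KNOWN_SIGNATURES = (b"FAKE_RREP", b"FLOOD_RREQ")


def signature_detect(payload: bytes, signatures=KNOWN_SIGNATURES) -> bool:
    """Signature-based detection: flag traffic only if it contains a known pattern."""
    return any(sig in payload for sig in signatures)


def build_profile(normal_samples):
    """Anomaly detection, training step: characterise normal behaviour by the
    mean and standard deviation of a metric observed without attacks."""
    return mean(normal_samples), stdev(normal_samples)


def anomaly_detect(value, profile, k=3.0) -> bool:
    """Anomaly detection, test step: flag values more than k standard
    deviations away from the normal mean."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma


if __name__ == "__main__":
    # Packets per second observed at a node during normal operation (made-up values).
    profile = build_profile([100, 102, 98, 101, 99])
    novel_attack_payload = b"UNSEEN_PATTERN"
    print(signature_detect(novel_attack_payload))  # False: no known signature matches
    print(anomaly_detect(500, profile))            # True: abnormal rate is flagged
```

The novel payload slips past the signature check, while its abnormal traffic rate is caught by the anomaly detector; conversely, the anomaly detector may raise false alarms on benign but unusual activity.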
The aim of this research is to identify the salient feature selection techniques used by various researchers to implement IDS techniques that best remove nodes with malicious intent, which constitute security threats in MANETs. In addition, we describe our recommended feature selection mechanism, apply it to data we generated ourselves, and feed the results to existing machine learning classifiers to check the detection accuracy of the proposed feature selection algorithm.
This study matters because feature selection is central to detecting malicious behavior during network monitoring: it determines which of the features extracted from the collected data are passed to the learning algorithm that alerts the system to the presence of malicious activities.
The paper is organized as follows: Section 2 describes previous research work. The proposed method, simulation implementation and data collection are described in Section 3; the proposed feature extraction and its results are given in Section 4; experiments and results analysis of the applied ML classifiers are given in Section 5; and Section 6 presents conclusions and future directions.

Various feature selection techniques have been proposed in the literature to achieve better detection rates in intrusion detection. In [4], a trial-and-error feature selection method that deletes one feature at a time is proposed. Neural networks and support vector machines were applied to the selected features to rank the importance of the input features, and the set of important features was taken to be the reduced feature set that produced the highest detection rate in the experiments. In [5], a Naive Bayesian classifier was used to detect network intrusions; the authors used KDD'99 data with all 31 attributes from the data set and reported an overall error rate of 5.1%. In [6], a feature selection algorithm based on information gain and support vector machines (SVM) is developed. Its basic principle is to group all data features based on information gain and then use the SVM algorithm to select the best feature subset. Sangkatsance, Watlanapongsakorn, and CharnsriPinyo suggested a real-time intrusion detection system (RT-IDS): in the first stage, 12 essential features were extracted from network packet headers, and in the second stage, information gain was used to evaluate the importance of each feature for detecting different forms of attacks. Using RT-IDS, the detection rate was 98% for the probing and denial-of-service attack classes. In [8], the authors developed a hybrid technique combining clustering and information measures for feature reduction.
Features were clustered by similarity in an unsupervised manner, and a supervised learning approach was then employed to find the important features, namely those most similar to the response feature that provides the class labels. In the work of Kabiri [9], DDoS attacks were simulated, and a method based on Principal Component Analysis (PCA) was used to select useful attributes from a set of 16 attributes. The three most important attributes he found were routing reply, the number of received packets and total RREP. In [8], the authors elaborate a two-step feature selection algorithm for intrusion detection systems in which redundant features are removed using a mutual information approach, with the KDD-99 data set used for experimentation; the proposed method achieved the highest accuracy and processing speed. Bayesian networks were used in [10] for data classification as well as for feature selection, using the Markov blanket of the target variable. Support vector machines and neural networks were suggested by the authors of [11] for the classification procedure, and the detection accuracy was reported to be outstanding for all attack classes. Barmejo, Ossa, Gamez and Puerta [12] propose a mechanism for subset selection in data sets with a large number of attributes. The goal of their research was to obtain excellent results with only a small number of wrapper evaluations. To achieve this, the approach alternates between filter ranking and wrapper feature subset selection. The approach was tested on 11 high-dimensional data sets using several classifiers.

PROPOSED FEATURE SELECTION ALGORITHM
Data is one of the main components of intrusion detection analysis. However, large data sets consume more resources and may make intrusion detection inefficient. As a result, data that do not contribute to detection must be removed before processing or before a learning algorithm is applied to detect atypical attacks. This calls for an appropriate feature reduction technique that not only helps minimize training time but also provides higher detection accuracy and detects unknown attacks.
Our recommended feature extraction technique is a two-step process. The first step is data preprocessing; in the second step, feature selection (ranking) is performed by two algorithms, information gain and correlation.

Data Pre-Processing
Because they come from multiple sources, real-world data used for experimentation are often unclean and susceptible to noise [13], and low-quality data yield low-quality detection results. Hence, before any feature selection technique is applied, it is necessary to check that the data to be used are clean and accurate. Various data preprocessing techniques have been proposed in the literature. Data cleaning removes noisy data and corrects inconsistencies. Data integration merges data from multiple sources into a coherent store, such as a data warehouse. Data reduction reduces the size of the data, for example by eliminating redundant features or by clustering. Finally, data transformation (normalization) scales the data to fall within a smaller range, such as 0.0 to 1.0. Data used for experimentation should satisfy the following quality requirements: credibility, accuracy, interpretability, consistency and timeliness.
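A minimal Python sketch of two of these preprocessing steps, assuming each record is a list of numeric feature values with `None` marking a missing value: data cleaning (dropping incomplete and duplicate records) followed by min-max normalization into the range 0.0 to 1.0. The function names are illustrative.

```python
def clean(rows):
    """Data cleaning: drop records with missing values and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(v is None for v in row):
            continue  # incomplete record
        key = tuple(row)
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(list(row))
    return cleaned


def min_max_normalize(rows):
    """Data transformation: scale every column linearly into [0.0, 1.0]."""
    n_cols = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_cols)]
    maxs = [max(r[j] for r in rows) for j in range(n_cols)]
    out = []
    for r in rows:
        scaled = []
        for j, v in enumerate(r):
            span = maxs[j] - mins[j]
            # A constant column carries no information; map it to 0.0
            # instead of dividing by zero.
            scaled.append((v - mins[j]) / span if span else 0.0)
        out.append(scaled)
    return out


if __name__ == "__main__":
    raw = [[10, 200], [20, 400], [None, 300], [10, 200], [30, 400]]
    print(min_max_normalize(clean(raw)))
    # [[0.0, 0.0], [0.5, 1.0], [1.0, 1.0]]
```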

Feature Extraction
Data to be analyzed may contain hundreds of features, many of which may be unnecessary or redundant for the learning algorithm. Removing relevant features or keeping irrelevant ones is harmful: it can lower the performance of the learning algorithm and lead to the discovery of low-quality patterns. Furthermore, the addition of an increasing volume of unnecessary or redundant features may gradually slow down the learning process.

Information Gain
The information gain attribute selection technique is based on Claude Shannon's work on information theory [13], which studies the value or "information content" of messages. Information gain measures how much the entropy (uncertainty) of the class label is reduced by partitioning the data on a feature: the higher the gain, the more information the feature provides about the class. Information gain feature selection identifies which attributes of a given set of feature vectors are useful for the learning process; the selected features are then used for classification, to identify unknown instances and to differentiate between attack classes.
Let $D$, the data partition, be a set of class-labeled training tuples. Assume that the class label attribute has $m$ distinct values defining $m$ distinct classes, $C_i$ (for $i = 1, \dots, m$). Let $C_{i,D}$ be the set of tuples of class $C_i$ in $D$, and let $|D|$ and $|C_{i,D}|$ denote the number of tuples in $D$ and $C_{i,D}$, respectively. The expected information needed to classify a tuple in $D$ is given by

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

where $p_i$ is the non-zero probability that an arbitrary tuple in $D$ belongs to class $C_i$, estimated by $|C_{i,D}|/|D|$. A log function to the base 2 is used because the information is encoded in bits. $\mathrm{Info}(D)$ is therefore the average amount of information needed to identify the class label of a tuple in $D$.

Now suppose the tuples in $D$ are partitioned on an attribute $A$ having $v$ distinct values $\{a_1, \dots, a_v\}$, which split $D$ into $v$ subsets $\{D_1, \dots, D_v\}$. The expected information still needed to classify a tuple after this partitioning is

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j)$$

The term $|D_j|/|D|$ acts as the weight of the $j$-th partition, and $\mathrm{Info}_A(D)$ is the expected information needed to classify a tuple from $D$ based on the partitioning by $A$. The smaller the expected information required, the greater the purity of the partitions. Information gain is defined as the difference between the original information requirement and the new requirement obtained after partitioning on $A$. That is,

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$
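These formulas can be computed directly. The following Python sketch assumes a discrete attribute (numeric features would first need to be discretized) and uses hypothetical toy labels; it illustrates the definitions and is not the WEKA implementation used in the study.

```python
import math
from collections import Counter


def info(labels):
    """Info(D) = -sum_i p_i * log2(p_i); Counter only yields classes with p_i > 0."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_a(values, labels):
    """Info_A(D) = sum_j |D_j|/|D| * Info(D_j), partitioning D on attribute A."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    return sum(len(p) / n * info(p) for p in partitions.values())


def gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_a(values, labels)


if __name__ == "__main__":
    labels = ["normal", "normal", "dos", "dos"]
    print(gain(["low", "low", "high", "high"], labels))  # 1.0: perfectly separating
    print(gain(["a", "b", "a", "b"], labels))            # 0.0: uninformative
```

Ranking features then amounts to sorting them by `gain` in descending order.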

Correlation
To uncover features with greater utility, we use a new mechanism that combines information gain with correlation-based feature ranking. Correlation is the second approach for ranking attributes. In a multiclass problem, the lower the correlation of an attribute in a feature vector, the more powerful it is at distinguishing between distinct types of attacks. Computing the correlation returns a matrix of pairwise linear correlation coefficients between each pair of feature columns; the correlation coefficient of each feature is then computed as the mean of its column.
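A minimal sketch of this correlation-based ranking in plain Python, following the description literally: signed Pearson coefficients, the diagonal (each feature's correlation with itself) included in the column mean, and lower scores ranked first. The text does not say whether absolute values should be averaged instead, so treat that detail as an assumption.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for a constant column


def correlation_matrix(columns):
    """Pairwise correlation coefficients between every pair of feature columns."""
    return [[pearson(c1, c2) for c2 in columns] for c1 in columns]


def correlation_scores(columns):
    """Score of each feature = mean of its column in the correlation matrix."""
    m = correlation_matrix(columns)
    k = len(columns)
    return [sum(m[i][j] for i in range(k)) / k for j in range(k)]


def rank_by_correlation(columns, names):
    """Rank features so that the lowest mean correlation comes first."""
    scores = correlation_scores(columns)
    return sorted(zip(names, scores), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical feature columns: b duplicates a, c varies independently.
    cols = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
    for name, score in rank_by_correlation(cols, ["a", "b", "c"]):
        print(name, round(score, 3))
```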

SIMULATION IMPLEMENTATION AND DATA COLLECTION
Our MANET model is implemented in OPNET Modeler 14.5 with AODV as the routing protocol, and two types of attack are implemented: selfish node attacks and DoS attacks.

Ad Hoc on Demand Distance Vector Protocol (AODV)
The Ad hoc On-Demand Distance Vector (AODV) protocol [14] discovers routes only when they are needed. A route must therefore be established first; once it has been identified, the path is maintained for as long as it is needed, and once the message has been delivered to the destination, the route can be discarded.
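The on-demand behaviour can be illustrated with a toy Python model. It is a simplification under stated assumptions: route discovery is modelled as a breadth-first flood of route requests (RREQ) over a static neighbour table, the route reply (RREP) simply follows the recorded reverse path, and AODV details such as sequence numbers, hop-count comparison, timers and route error messages are omitted. The class and method names are hypothetical.

```python
from collections import deque


class OnDemandRouter:
    """Toy AODV-style router: routes are discovered only when first requested."""

    def __init__(self, neighbours):
        self.neighbours = neighbours  # node -> list of nodes within radio range
        self.routes = {}              # (src, dst) -> path, filled only on demand
        self.discoveries = 0          # number of RREQ floods performed

    def _discover(self, src, dst):
        # Breadth-first flood, like an RREQ; parent links act as reverse routes.
        self.discoveries += 1
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]  # the RREP travels back along this path
            for nxt in self.neighbours.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None  # destination unreachable

    def route(self, src, dst):
        """Return a cached route, discovering it first if none exists."""
        if (src, dst) not in self.routes:
            path = self._discover(src, dst)
            if path is None:
                return None
            self.routes[(src, dst)] = path
        return self.routes[(src, dst)]

    def release(self, src, dst):
        """Discard the route once it is no longer needed."""
        self.routes.pop((src, dst), None)


if __name__ == "__main__":
    net = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    router = OnDemandRouter(net)
    print(router.route("A", "D"))  # ['A', 'B', 'C', 'D']
    router.route("A", "D")         # served from the cache, no new flood
    print(router.discoveries)      # 1
```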

Attacks Implementation
Selfish Node Attack
Selfish nodes are nodes present in the network that, because of a lack of energy or in order to conserve energy for future use, behave maliciously. Selfish behaviour can take several forms: a node that does not forward packets it receives on behalf of other nodes, a node that deliberately disables its routing protocol to avoid forwarding and receiving packets and so preserve its energy, or a node that suffers a power failure or is powered off during communication [21].
In our study, a selfish node is implemented as a node whose AODV routing protocol is disabled.
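To illustrate why disabling the routing protocol is enough to make a node selfish, the sketch below performs a breadth-first route discovery in which nodes in a hypothetical `routing_disabled` set never rebroadcast or answer route requests. The topology and node names are made up; in the study itself this behaviour was configured in OPNET rather than coded like this.

```python
from collections import deque


def discover_route(neighbours, src, dst, routing_disabled=frozenset()):
    """Breadth-first route discovery in which nodes with routing disabled
    neither rebroadcast route requests nor reply to them."""
    if src in routing_disabled or dst in routing_disabled:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            # A selfish node silently ignores the request, so no route uses it.
            if nxt not in parent and nxt not in routing_disabled:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # Diamond topology: A can reach D through either B or C.
    net = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
    print(discover_route(net, "A", "D"))                               # ['A', 'B', 'D']
    print(discover_route(net, "A", "D", routing_disabled={"B"}))       # ['A', 'C', 'D']
    print(discover_route(net, "A", "D", routing_disabled={"B", "C"}))  # None
```

With enough selfish nodes on the candidate paths, the destination becomes unreachable, which is the degradation the detector is expected to pick up.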

DoS Attacks
A Denial of Service (DoS) attack floods the network with unnecessary traffic. The attack traffic consumes network resources, prevents legitimate traffic from reaching its destination and wastes node energy.
In our case, a pulse jammer attack is simulated. A jammer attack [14][15][16][17][18][19] floods the network with strong radio frequency signals to disrupt ongoing communication. A jammer node differs in structure from a regular MANET node: its radio transmitter repeatedly generates noise on the wireless channel. The jammer bandwidth (in kHz) specifies the bandwidth of the jamming transmission, and the jammer transmitter power (in watts) is the transmission power allocated to the packets it transmits through the channel. Finally, the jammer node has a pulse width, which specifies the length of time (in seconds) for which a pulse is transmitted, and a silence width, which specifies the interval (in seconds) between pulses [20].
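The pulse and silence widths define a periodic on/off pattern. A minimal Python sketch, assuming the jammer starts with a pulse at t = 0 and radiates its configured transmitter power only during pulses; the parameter values in the usage example are hypothetical, since the simulation settings are not given here.

```python
def jammer_on(t, pulse_width, silence_width):
    """True if the pulse jammer is transmitting at time t (seconds)."""
    period = pulse_width + silence_width
    return (t % period) < pulse_width


def duty_cycle(pulse_width, silence_width):
    """Fraction of each period during which the jammer transmits."""
    return pulse_width / (pulse_width + silence_width)


def average_power(tx_power_w, pulse_width, silence_width):
    """Mean radiated power (watts): full power during pulses, none in silences."""
    return tx_power_w * duty_cycle(pulse_width, silence_width)


if __name__ == "__main__":
    # Hypothetical settings: 0.5 s pulses, 1.5 s silences, 2 W transmitter.
    print(jammer_on(0.3, 0.5, 1.5))        # True: inside the first pulse
    print(jammer_on(1.0, 0.5, 1.5))        # False: silence interval
    print(duty_cycle(0.5, 1.5))            # 0.25
    print(average_power(2.0, 0.5, 1.5))    # 0.5
```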

Data Collection and Features Extraction
The data set recorded after the simulation contains 15 features plus a class label that classifies each record as a normal node, a selfish node attack or a DoS attack. The number of instances in each class is given in Tab. 3. The information gain and correlation ratio of all 15 features were calculated with WEKA, a powerful machine learning and data analysis package developed at the University of Waikato, New Zealand [15], and the results are listed in Tabs. 4 and 5. Based on the information gain and correlation rankings of the collected features, the results of the first stage of our recommended feature selection method are summarized in Tab. 6. The second stage computes the union of IGFS1 and IGCR1, stored as NMRFS, and the intersection of IGFS2 and IGCR2, stored as NRFS; the best feature selection is the union of NMRFS and NRFS.
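The second stage is plain set algebra. A minimal Python sketch, in which the contents of IGFS1, IGCR1, IGFS2 and IGCR2 are hypothetical placeholder feature names (the actual sets are those summarized in Tab. 6):

```python
def second_stage(igfs1, igcr1, igfs2, igcr2):
    """Combine the first-stage feature sets into the final selection."""
    nmrfs = set(igfs1) | set(igcr1)  # union of IGFS1 and IGCR1
    nrfs = set(igfs2) & set(igcr2)   # intersection of IGFS2 and IGCR2
    return nmrfs | nrfs              # best features: union of NMRFS and NRFS


if __name__ == "__main__":
    # Placeholder feature names, not the study's actual features.
    best = second_stage({"f1", "f2"}, {"f2", "f3"}, {"f4", "f5"}, {"f5", "f6"})
    print(sorted(best))  # ['f1', 'f2', 'f3', 'f5']
```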
Feature selection is important before any IDS method is implemented because some features in the data set can degrade the performance of the classifier used for anomaly detection. A feature F is therefore important if removing it from the feature set affects the classifier's performance. Having feature selection mechanisms in place helps build the predictive classifier model and choose the important features that yield the best accuracy and lower time complexity when new data are acquired.
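The removal criterion can be expressed as a leave-one-feature-out loop. In this sketch, `evaluate` is a hypothetical callable that would, in practice, train a classifier on the given feature subset and return a score such as cross-validated accuracy; the toy scoring function in the usage example merely stands in for it.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def removal_importance(
    features: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Importance of F = score with all features - score without F."""
    full = frozenset(features)
    baseline = evaluate(full)
    return {f: baseline - evaluate(full - {f}) for f in full}


if __name__ == "__main__":
    # Toy stand-in for classifier accuracy: only f1 and f2 carry signal.
    def toy_accuracy(subset):
        return 0.5 + 0.25 * len(subset & {"f1", "f2"})

    scores = removal_importance(["f1", "f2", "f3"], toy_accuracy)
    important = sorted(f for f, delta in scores.items() if delta > 0)
    print(important)  # ['f1', 'f2']
```

Each feature costs one extra model evaluation, so the loop is affordable for 15 features but grows expensive for very high-dimensional data.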

EXPERIMENTS AND RESULTS ANALYSIS
The experiments in this section use three widely used machine learning classifiers: NaiveBayes, RandomForest and the J48 decision tree. The experiment has two phases: the first phase evaluates the performance of the three ML classifiers using all 15 features, and the second phase evaluates the same three classifiers using the 9 extracted features. To measure the performance of the classifiers, precision, recall and F1 score were used, because they are the most commonly used measures for evaluating anomaly detection techniques. Precision is the percentage of relevant instances among the retrieved instances. Recall is the proportion of relevant instances retrieved out of the total number of relevant instances. The F1 score is the harmonic mean of precision and recall. All three metrics are derived from the confusion matrix, in which four possible situations can be defined, as shown in Tab. 9.
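For the three-class problem (normal, selfish, DoS) the metrics are computed per class in a one-vs-rest fashion from the multiclass confusion matrix. A minimal Python sketch; the confusion-matrix counts in the usage example are hypothetical and are not the study's results.

```python
def per_class_metrics(cm, labels):
    """cm[i][j] = number of instances of true class i predicted as class j.
    Returns {label: (precision, recall, f1)}."""
    k = len(labels)
    results = {}
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                       # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[labels[c]] = (precision, recall, f1)
    return results


if __name__ == "__main__":
    labels = ["normal", "selfish", "dos"]
    cm = [[8, 2, 0],   # hypothetical counts
          [1, 9, 0],
          [0, 0, 10]]
    for label, (p, r, f) in per_class_metrics(cm, labels).items():
        print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```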
The experiment shows that 100% accuracy is achieved by the RandomForest classifier using only the 9 features extracted by our algorithm from the data set we created. NaiveBayes also achieves high accuracy for normal nodes and DoS attacks, and J48 achieves high accuracy for DoS attacks. To sum up, our feature selection technique performed well with the RandomForest classifier.

CONCLUSION AND FUTURE WORK
In this paper, we have presented a novel feature extraction technique to overcome the difficulties encountered in feature selection for network intrusion detection. The data set generated for our study is completely labeled, and 15 network traffic features have been extracted for intrusive flow detection. The proposed feature selection technique extracted 9 important features. These selected features were used to compare the performance of three well-known ML classifiers, and the experiment shows that the highest accuracy was achieved with the RandomForest classifier. Our future work will use an ensemble method for classification to improve the detection rate. An ensemble combines several ML classifiers (base classifiers), each of which casts a vote; based on these votes, the ensemble returns the predicted class.
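As a sketch of the voting scheme envisaged for future work, assuming simple unweighted majority voting (one vote per base classifier) and treating base classifiers as hypothetical callables that map a record to a class label:

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label with the most votes; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(base_classifiers: Sequence[Callable], record) -> str:
    """Ask every base classifier for a label and return the majority."""
    return majority_vote([clf(record) for clf in base_classifiers])


if __name__ == "__main__":
    # Hypothetical base classifiers thresholding a single traffic feature.
    base = [
        lambda r: "dos" if r["pkts"] > 400 else "normal",
        lambda r: "dos" if r["pkts"] > 300 else "normal",
        lambda r: "dos" if r["pkts"] > 600 else "normal",
    ]
    print(ensemble_predict(base, {"pkts": 500}))  # dos (two of three votes)
```

Weighted voting, in which more accurate base classifiers receive larger votes, is a common variant of this scheme.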