Improving the Performance of Distance Relay Using Wavelet Transform

Abstract: As power grids expand, different types of faults become more likely. The purpose of protection relays is to detect abnormal signals that indicate faults in the transmission system and to isolate the faulted section from the rest of the system so that the fault does not propagate. The spread of electronic devices led to digital relays built on microprocessors, in which analog measurements are converted into digital signals for processing. Overhead lines are more prone to faults than other components of the power system; thus, disturbances affecting the system must be detected quickly and accurately. Fault detection and classification is therefore an important factor in the economic operation of the power grid. Accurate fault handling results in faster repairs, better system availability, lower operating costs and time savings. The design proposed in this study detects the type of fault on transmission lines. To improve relay performance in the transmission grid, the signals reaching the relays on both sides of the line were analyzed to detect the fault and its type. The main purpose was to detect the fault type quickly using the wavelet transform. To this end, the signal was sampled after fault inception and features were extracted from it by wavelet analysis. These features were fed to a decision tree classifier, which determined the fault type.


INTRODUCTION
Transmission lines are one of the main elements of the power system and connect generation to consumption. Lines are spread over a large area and are exposed to a variety of faults. Modern power systems include many factors that interact with each other in complex ways and may cause system faults [1]. Line faults may be caused by lightning, sparks, birds, storms, snow and ice, among other reasons. Deformation of insulating materials can also cause short circuit faults. Because lines are important components of a power system, their protection is essential to ensure system stability and to reduce equipment damage due to short circuits on transmission lines. Continuity of service of the lines is very important for the reliability of the power system. Overhead lines are more likely to fail than other components of the power system, and faults in high voltage transmission lines also produce high frequency transients. The frequency, magnitude and damping rate of these fault-induced transients depend on many factors such as fault location, fault type and system parameters. In a three-phase system, a fault occurs when two or more conductors come into contact with each other or with the ground. These faults have devastating effects on power system equipment and also degrade power quality. Therefore, it is necessary to determine the type of fault, locate it on the line, and clear it in the shortest possible time before it causes damage. Rapid fault detection makes it possible to isolate the faulty line quickly and protect the system from the harmful effects of the fault. To operate and sustain the power grid economically, it is important to detect, classify, and clear transmission line faults as soon as they occur.
The accuracy of fault detection methods depends on how voltage and current are measured or estimated. The problem is further complicated by random variation in parameters such as fault type, fault location, fault angle and fault impedance.
As power grids expand, different types of faults become more likely. A typical fault clearing system includes a circuit breaker and a protection relay. The purpose of protection relays is to detect abnormal signals that indicate faults in the transmission system and to isolate the faulted section from the rest of the system so that the fault does not propagate. In earlier years, faults were detected by electromechanical relays, in which measured values such as voltage and current were converted into a mechanical force that operated the relay when they exceeded a predetermined threshold. The spread of electronic devices led to digital relays built on microprocessors, in which analog measurements are converted into digital signals for processing. Transmission line relays have three main functions: fault detection, fault classification, and fault location. Fault detection means recognizing the different fault conditions on transmission lines, during which system current and voltage leave their normal range; if the fault is not cleared, it may result in equipment damage or even instability and collapse of the power grid. Fault classification is the identification of the fault type: at this stage the faulty phase or phases are identified so that only the faulty lines are disconnected and power outages are kept to a minimum. After the fault type has been detected, the exact location of the fault is determined so that personnel can be sent to repair the faulty lines.

WAVELET TRANSFORM FUNCTION
Wavelets are mathematical functions that divide information into different frequency components, each of which is then studied with a resolution matched to its scale. They have an advantage over classical Fourier methods in analyzing physical situations where the signal contains discontinuities or sharp spikes. Wavelets were developed independently in mathematics, quantum physics, electrical engineering, and seismology. Exchange between these fields has led in recent decades to newer applications of wavelets in image processing, turbulence, radar, and earthquake prediction. The basic idea of wavelets is to analyze a signal according to scale. This idea is not entirely new: in the early 1800s, Fourier discovered that other functions could be represented by superposing sine and cosine functions. Wavelet algorithms process information at different scales and resolutions. If a signal is viewed through a large window, its gross features can be seen; if it is viewed through a smaller window, finer details can be seen. For decades, scientists have sought functions better suited than the sines and cosines of Fourier analysis to representing time-varying signals. Sines and cosines extend indefinitely and are therefore poorly suited to approximating sharp spikes, whereas wavelet analysis uses approximating functions that are confined to finite regions. Wavelet analysis starts by adopting a prototype function called the parent (mother) wavelet, which is matched against the original signal. Transient-state analysis is performed with a contracted, high-frequency version of the parent wavelet, while steady-state frequency analysis is performed with a dilated, low-frequency version of the same wavelet. The original signal or function can thus be represented as a wavelet expansion.
As noted, the base functions of the wavelet transform differ from those of the Fourier transform: they have finite energy, which localizes them within the analyzed range. Each parent wavelet also covers a specific, limited range of frequencies, so only these frequencies are extracted in each analysis range. These features make the wavelet transform very efficient for extracting non-stationary features. The fast Fourier transform and the discrete wavelet transform are both linear operators that produce a data structure, and both convert a function from one domain to another. For the fast Fourier transform, the new domain is spanned by sine and cosine base functions; for the wavelet transform, it is spanned by more complex functions called wavelets or parent wavelets. The most important difference between the two is that wavelet functions are localized in time, whereas the sine and cosine functions of the Fourier transform are not. A further advantage of wavelet transforms is that the analysis windows vary in size.
To isolate signal discontinuities, very short base functions are required; at the same time, very long base functions are needed for detailed frequency analysis. One way to obtain both is to use short high-frequency base functions and long low-frequency base functions, which is exactly what a wavelet transform provides. Another point to keep in mind is that wavelet transforms, unlike Fourier transforms, which use only sine and cosine functions, do not have a single set of base functions; there are infinitely many possible sets of wavelet base functions. Wavelet analysis therefore provides access to information that is obscured by other time-frequency methods such as Fourier analysis. If the original signal is well known, the wavelet type is easily chosen, but in general the signals are not well known in advance. Clearly, if the type of parent wavelet is chosen correctly, the predictive efficiency of the model will increase. In this regard, there are various wavelet generation algorithms that can match the wavelets to criteria defined by the user.
The continuous wavelet transform of the signal f(t) is:

$$ CWT(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} f(t)\, \Psi^{*}\!\left(\frac{t - b}{a}\right) dt $$

The function Ψ(t) is the parent function or wavelet, and a and b are the dilation and translation parameters, respectively. In practice, the wavelet transform is mainly applied in its discrete, multilevel form, because each level can be computed efficiently using two filters (one high-pass and one low-pass).
The number of decomposition levels depends on the sampling frequency and on the frequency band that carries the information of interest. At each level, the filter outputs are downsampled by a factor of two to reduce the computational load. The high-pass filter is derived from the wavelet function (parent wavelet) and yields the detail coefficients of its input, while the low-pass filter yields the approximation of the same input. The same pair of filters is then applied to the low-pass output of the previous level. This idea is shown in Fig. 1.
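The two-filter scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the two-tap Haar filters for brevity rather than the db4 wavelet adopted in this study, and the names `filter_and_downsample` and `dwt_levels` are hypothetical.

```python
import math

# Haar analysis filters (an assumption for brevity; the study itself uses db4).
LOW_PASS = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HIGH_PASS = [1 / math.sqrt(2), -1 / math.sqrt(2)]


def filter_and_downsample(signal, taps):
    """Apply a 2-tap filter and keep every second output (downsample by two)."""
    return [
        taps[0] * signal[i] + taps[1] * signal[i + 1]
        for i in range(0, len(signal) - 1, 2)
    ]


def dwt_levels(signal, levels):
    """Return (approximation, [detail_1, ..., detail_levels])."""
    approximation = list(signal)
    details = []
    for _ in range(levels):
        # High-pass branch gives the details of the current input ...
        details.append(filter_and_downsample(approximation, HIGH_PASS))
        # ... and the low-pass branch feeds the next level.
        approximation = filter_and_downsample(approximation, LOW_PASS)
    return approximation, details


samples = [4.0, 4.0, 4.0, 4.0, 10.0, 4.0, 4.0, 4.0]  # flat signal with one spike
approx, details = dwt_levels(samples, levels=2)
print([len(d) for d in details], len(approx))  # [4, 2] 2
```

Only the first-level detail coefficient that covers the spike is non-zero, which illustrates why detail coefficients are useful for localizing sudden changes.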

C4.5 ALGORITHM
ID3 is an algorithm for building a decision tree. It uses the concept of irregularity (entropy) to classify the data and tends to minimize the irregularity in the upper nodes of the tree so that a tree of minimum height is obtained. Irregularity is first calculated for all features of the raw data, and the feature with the highest utility (information gain) is chosen as the root. The utility of a feature is the reduction in irregularity achieved by splitting on it; the irregularity remaining after the split is obtained by summing the irregularity of each partition weighted by its probability of occurrence. ID3 can only classify data whose features take a limited set of discrete values, and it is not efficient for noisy and distorted data. The C4.5 algorithm is an extension of ID3 [2][3][4] that can also classify continuous and noisy data. For a continuous feature, the data are first sorted; the utility is then computed for every point at which the sorted data can be separated, and the split point with the largest utility is chosen as the separator [5][6][7].
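The split selection for a continuous feature can be sketched as follows, taking irregularity to be Shannon entropy and utility to be information gain (the usual reading of ID3, assumed here). Standard C4.5 additionally normalizes the gain into a gain ratio, which this sketch omits. The function names and sample values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def best_threshold(values, labels):
    """Return (threshold, information_gain) for one continuous feature."""
    pairs = sorted(zip(values, labels))  # sort the data by feature value
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Entropy remaining after the split, weighted by partition size.
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical feature values and fault labels, for illustration only.
feature = [0.2, 0.3, 0.4, 1.8, 2.1, 2.5]
fault = ["AG", "AG", "AG", "ABC", "ABC", "ABC"]
threshold, gain = best_threshold(feature, fault)
print(threshold, gain)
```

In this toy case the midpoint between 0.4 and 1.8 separates the two classes completely, so the gain equals the full 1 bit of entropy in the labels.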

RANDOM FOREST (RF) ALGORITHM
When a decision tree is formed, a small change in the training patterns can cause fundamental changes in the structure of the tree. To overcome this problem, the random forest algorithm was proposed, which is a learning method based on an ensemble of decision trees. With this method, the classifier output becomes much more robust to noisy data. The random forest prediction model is based on averaging the results of all of its decision trees. The method also yields useful information about the importance of each variable, so the variables with the greatest impact on the dependent variable can be identified. To produce each tree, a bootstrap sample is drawn from the training set and m predictors are chosen at random. After a large number of trees have been produced, each tree votes for the most popular class, and the class of each sample is predicted by merging the votes of the different trees. The very high accuracy of this method is one of its advantages, and it works well with a large number of inputs [8,9].
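The three ingredients named above (bootstrap sampling, a random choice of m predictors, and merging the votes of the trees) can be sketched as below. Growing the trees themselves is omitted, and the function names, m = 5 and the example votes are illustrative assumptions rather than values from this study.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the training set of one tree)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, m, rng):
    """Pick m distinct feature indices to be considered by a tree."""
    return sorted(rng.sample(range(n_features), m))


def majority_vote(tree_predictions):
    """Merge the per-tree class votes into the forest's prediction."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = list(range(10))  # stand-in for 10 training patterns
sample = bootstrap_sample(rows, rng)
subset = random_feature_subset(n_features=24, m=5, rng=rng)
print(len(sample), subset)
print(majority_vote(["AG", "AG", "BG", "AG", "ABG"]))  # prints "AG"
```

Because each tree sees a different resample and a different feature subset, a distortion that misleads one tree is unlikely to mislead the majority.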
Jamali et al. [10] used several classifier algorithms to classify power quality disturbances and finally concluded that DT algorithms have higher accuracy and lower computational load compared to others.

THE PROPOSED METHOD

Validating the Proposed Method
Conventional validation methods can be divided into three categories. The first uses all the data both to train the classifier and to test it, which, in addition to the possibility of over-fitting, gives an optimistic estimate of the error rate. The second randomly splits the data into a training set and a test set, trains the algorithm on the former, and evaluates it on the latter. This method may not train the algorithm well, and the exact error rate may not be obtained. The third breaks the data into several subsets. This type of validation is itself divided into three categories: 1) Random multiple sampling, in which the data set is divided into a training subset and a test subset. The model is trained on the training data and validated on the test data. This procedure is repeated several times and the average of the results is taken as the final estimate. The advantage of this method is that the ratio of training to test data in each run does not depend on the number of subsets. The disadvantage is that some data may never be used for validation while others are used multiple times; in other words, the subsets can overlap. 2) The k-fold method, in which the data is randomly split into k subsets. One of these subsets at a time is used for validation and the other k-1 for training. The procedure is repeated k times, so every sample is used exactly once for validation and k-1 times for training. The average of the k validation results is taken as the final estimate. Ten folds are typically used. 3) The leave-one-out method. As the name implies, at each stage one sample is left out for validation and the rest of the data is used for training. This is the k-fold method with k equal to the number of samples. It is computationally expensive because training and validation are repeated once per sample [11,12]. In this study, validation was performed using the k-fold method with k equal to 10.
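The k-fold split can be sketched as follows. The sample count of 600 matches the number of rows of the feature matrix reported in the conclusion; the function name and the seed are hypothetical, and in this study the split itself is handled by WEKA.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment to folds
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=600, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 540 60
```

Each sample appears in exactly one validation fold, so averaging the ten fold accuracies uses every sample once for testing.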

Implementing the Proposed Method
Transient states in power systems generally have nonperiodic, short-lived, and non-stationary waveforms. Wavelet transform can decompose signals into different frequency ranges. Discrete wavelet transform of voltage and current signals is used to extract the feature vector required for classification. The purpose of feature extraction is to determine the unique characteristics of a voltage or current waveform that can be used to detect the type of fault.
In this study, different fault scenarios will first be simulated using MATLAB software. To obtain a relay with maximum speed and accuracy, several parent wavelets from the Daubechies family, which are widely used for detecting power system disturbances, will be examined, and db4 will be taken as the parent wavelet. The fault signals obtained from the grid simulation will be decomposed separately with each parent wavelet to the required number of levels. The coefficients obtained by applying the discrete wavelet transform to the voltage or current signals will be passed as a feature vector to the tree classifier, and the decision tree will be trained using WEKA software. WEKA is a well-established package for machine learning and data mining [13].

The Studied Grid
To implement fault type detection in the transmission grid, a double-circuit transmission line whose two circuits share the same start and end points will be considered. Relays are installed at the beginning and end of this line according to Fig. 1 and acquire the required signals from the protected transmission line at any time [14]. In this grid, 20 types of faults (10 per circuit) will be simulated in different scenarios; for each circuit these comprise three single phase-to-ground faults, three double phase-to-ground faults, three double-phase faults and one three-phase fault. The scenarios will vary in fault time (fault angle), fault location and fault resistance. The simulated current or voltage signals will be stored in the Simulink section of MATLAB and loaded into its coding section for analysis using the wavelet transform. After wavelet analysis, feature extraction and unification, the resulting data will be imported into WEKA to classify and detect the type of fault. In this software, the best features are chosen by different methods and the data is classified using the two decision tree methods described above, so that the fault classes are separated from each other and the type of fault is determined. Naturally, the type of feature and the choice of the best features affect the final accuracy. The WEKA output shows how accurately the proposed algorithm detects the type of fault. The parameters of the simulated lines are shown in Tab. 1.
As shown in Tab. 1, the first part of the line is a 225 km double-circuit line and the next part is a 100 km single-circuit line that connects the two parts of the system. The voltage at both ends of the lines is 400 kV and the short circuit level on both sides is 200 MVA. A load angle of 30 degrees is also considered so that a nominal current of 730 A passes through each circuit of the double-circuit line. The simulated circuit of the system in the Simulink section of MATLAB is shown in Fig. 3.
As shown in Fig. 3, the double-circuit line is modeled as four segments of 22.5, 90, 90 and 22.5 km. The segment boundaries fall at 10, 50 and 90% of the line length, so that faults can be simulated at these locations.

SIMULATION OF FAULT TYPES
Since twenty faults can occur on a double-circuit line (10 per circuit), the relay of each circuit must correctly detect its own faults and command the faulty circuit to disconnect, and must also correctly recognize faults on the other circuit and not command disconnection for them. In fact, since the ends of this double-circuit line are connected as shown in Fig. 3, a fault in one circuit also affects the other circuit and increases its current. The desired protection is therefore achieved when disconnection is commanded only in the circuit in which the fault occurred, while the second circuit continues to operate. Because a fault may occur anywhere on the line, at any angle of the line voltage, and with any fault resistance, comprehensive information must be provided for the various faults. For this purpose, three-phase current signals will be simulated for faults at 10, 50 and 90% of the line. At each point, the fault will be simulated separately at voltage angles of 0 and 90 degrees. Each fault will be considered with resistances of 0.001, 5, 10, 15 and 20 ohms.
As can be seen from the above, 300 signals for the different types of faults will be simulated at 10% of the first circuit alone. These 300 signals provide the three-phase currents for 100 different fault scenarios at 10% of the line. The same will be simulated at 50 and 90% of the line, which results in 300 fault scenarios in total. In every scenario, the three-phase current signals of both the first and the second circuit will be recorded, which provides a total of 1800 signals (900 from the first circuit and 900 from the second) for analysis and feature extraction. The first circuit relay must detect the 300 fault scenarios in the first circuit and command disconnection after detecting the type of fault; the second circuit relay, although it also observes fault current signals, must recognize that the fault is in the first circuit and not command disconnection.
As noted earlier, single phase-to-ground faults (AG, BG, CG), double phase-to-ground faults (ABG, ACG, BCG), double-phase faults (AB, AC, BC) and the three-phase fault (ABC) will be simulated in the two circuits. To train the decision tree algorithm, 30 different scenarios of each of these disturbances are simulated in the proposed grid and the waveforms of the three-phase currents are stored. These simulations differ in fault time, fault resistance and fault location, so no two scenarios have the same waveform.
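The scenario counts quoted above follow directly from the simulation grid, as the short enumeration below confirms. The fault types, locations, angles and resistances are those listed in this section; the variable names are illustrative.

```python
from itertools import product

FAULT_TYPES = ["AG", "BG", "CG", "ABG", "ACG", "BCG", "AB", "AC", "BC", "ABC"]
LOCATIONS_PERCENT = [10, 50, 90]
VOLTAGE_ANGLES_DEG = [0, 90]
RESISTANCES_OHM = [0.001, 5, 10, 15, 20]
PHASES = ["A", "B", "C"]
CIRCUITS = [1, 2]  # circuit whose relay records the currents

# Every fault scenario applied to the first circuit.
scenarios = list(
    product(FAULT_TYPES, LOCATIONS_PERCENT, VOLTAGE_ANGLES_DEG, RESISTANCES_OHM)
)
at_10_percent = [s for s in scenarios if s[1] == 10]

# One phase-current signal per scenario, per circuit, per phase.
signals = list(product(scenarios, CIRCUITS, PHASES))

print(len(at_10_percent), len(scenarios), len(signals))  # 100 300 1800
```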

Figure 4 The simulated system by MATLAB
The waveforms of these disturbances are generated and stored by Simulink and then analyzed using the wavelet transform to extract features. In the following, the three-phase current waveforms for the various types of faults at 10% of the first circuit are presented as seen by the relay of the first circuit. Fig. 4 shows the AG, BG, CG, ABG, ACG, BCG, AB, AC, BC and ABC faults at 10% of the first circuit with a fault resistance of 0.001 ohm, seen from the relay at the beginning of the first circuit. Note that the sampling frequency of the relay is 1.6 kHz, so there are 1600 samples per second. The simulations run for half a second (800 samples) and, for clarity, the waveforms are plotted between samples 200 and 600.

Classification despite Noisy Data
In the presence of noise, classifier accuracy decreases, because signal distortion makes the extracted features less effective at separating the classes. The output accuracy varies with the signal-to-noise ratio (SNR) [14,15]. In this study, SNR values of 35 and 45 dB are investigated [16]. The classification results with different features and different classification methods at 45 dB noise are presented in Tab. 2. The results of this section are presented with 24 features (the optimal number of features for classifying the phenomena, obtained in the previous section). The dispersion (confusion) matrix is displayed only for the decision tree algorithm with 10 trees, which provided the best performance in terms of accuracy and speed in the previous section.

Figure 6 Results of fault simulation: AG, BG, CG, ABG, BCG, ACG, AB, AC, BC and ABC faults at 10% of the first circuit, seen from the first circuit relay

It can be seen that noise in the data reduces the accuracy of the classifiers, but at 45 dB the decision tree with 10 trees still classifies the faults well. The dispersion matrix in this case, as shown in ... To investigate further, the amount of noise is increased and the SNR is reduced to 35 dB.
With this amount of noise in the current signals, the accuracies obtained in separating the fault scenarios are given in the table below.
It is observed that output accuracies decrease as noise in the data increases. It can be seen from the table above, however, that with 100 trees in the decision tree algorithm the accuracy does not decrease. Consequently, when the data are very noisy, the number of trees can be increased, in addition to applying noise reduction methods, to preserve classifier accuracy. The dispersion matrix for the decision tree with 10 trees is also shown in Fig. 7.
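One common way to produce a signal with a prescribed SNR is to add white Gaussian noise whose power is scaled to the signal power. The sketch below assumes this approach, which is not necessarily how the noise was generated in this study; the 50 Hz test tone and the function names are also assumptions.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return the signal plus Gaussian noise at the requested SNR in dB."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually obtained from the clean and noisy samples."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Half a second at 1.6 kHz (800 samples); the 50 Hz tone is an assumed test signal.
clean = [math.sin(2 * math.pi * 50 * n / 1600) for n in range(800)]
for snr in (45, 35):
    noisy = add_white_noise(clean, snr, rng)
    print(snr, round(measured_snr_db(clean, noisy), 1))
```

Reducing the SNR from 45 dB to 35 dB raises the noise power by a factor of ten, which explains why the lower setting is the more demanding test.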

CONCLUSION
Fault analysis and detection in power systems has significant economic consequences for operators, maintenance agents, and the power industry. For this reason, the search for and development of new algorithms and methods to solve this problem has received attention. Parallel transmission lines are widely used in power grids to transfer high power and increase system reliability. In terms of protection, however, parallel transmission lines require special consideration compared with single-circuit transmission lines. When a conventional distance relay protects parallel lines while treating each circuit as independent, mutual coupling between the two circuits affects the impedance measured by the relay. Depending on grid conditions, this reduces or increases the reach of the relay. This study suggests a method based on the wavelet transform and decision trees to improve the protective performance of distance relays in the transmission grid. Because of the shortcomings of the Fourier transform in fault detection, the use of the wavelet transform to detect faults or changes in power system signals has attracted much attention in recent years. This method is a mathematical transformation that represents a signal at various scales by means of another function called the parent wavelet.
The most important reason for using the wavelet transform is its high resolution and precision in both time and frequency. The wavelet transform can show some signal properties that other transformations cannot, because they destroy these properties during transformation. These properties include steep slopes, breakpoints, and discontinuities in higher-order derivatives of the function. Accordingly, the wavelet transform provides a local view of the function, which increases accuracy, whereas a transformation such as Fourier provides an overall view of the signal period. Each function used as a parent wavelet has zero mean and unit energy and, as noted, the transform provides a time-frequency representation of the signal. This study used the discrete wavelet packet transform. Since a signal sampling frequency of 1.6 kHz is chosen, this transform can extract signal information up to 800 Hz (half the sampling rate) according to the Nyquist theorem. Decomposing the signal to 4 levels yields nodes that are each 50 Hz wide (800 Hz divided by 2^4). Because the generated fault signals, including single-phase, double phase-to-ground, double-phase, and three-phase faults, do not contain high frequencies in independent circuits, the 0 to 50 Hz node at the fourth level of the wavelet transform, which contains the information about the main component of the signal, is chosen to extract features.
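The band arithmetic above can be checked with a short calculation. The sketch assumes ideal half-band filters and lists the nodes in order of increasing frequency, which is not necessarily the order in which a wavelet packet implementation numbers them; the function name is hypothetical.

```python
def packet_band_edges(sampling_rate_hz, level):
    """Return (low, high) edges in Hz of the 2**level nodes, in frequency order."""
    nyquist = sampling_rate_hz / 2  # highest representable frequency
    width = nyquist / (2 ** level)  # each level halves the band width
    return [(i * width, (i + 1) * width) for i in range(2 ** level)]


bands = packet_band_edges(sampling_rate_hz=1600, level=4)
print(len(bands), bands[0])  # 16 (0.0, 50.0)
```

With a 1.6 kHz sampling rate, the fourth level therefore has 16 nodes of 50 Hz each, the lowest of which is the 0 to 50 Hz node used for feature extraction.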
After various statistical features had been extracted and unified, classification was done using two decision tree methods. These two methods differ in how trees are produced and pruned, in their ability to classify accurately in the presence of noise, and in how they use the features during tree production. Random forest uses a set of trees to determine the final class, which makes it more resistant to noise in the data. After simulating 8 classes of disturbances, generating 1800 different signals and 600 fault scenarios for the classes, and extracting and unifying the features, a 600×27 matrix was formed and given to WEKA software. The algorithm was then tested by k-fold validation with k = 10. The results were investigated under different conditions, with noise in the data and with different numbers of features. The presented results indicate that, if the features extracted from the wavelet coefficients of the 50 Hz node are used together with the random forest method, the faults can be classified with maximum accuracy and the fault type detected correctly. Because only half a cycle of post-disturbance data is used with the presented features, and the computational load of the random forest method is low, the proposed method offers high speed and accuracy in addition to a low computational load. As wavelet transform coefficients are sensitive to noise in the data, the accuracy obtained with 35 and 45 dB noise was also studied. When the data are noisy, signal denoising methods should be applied before feature extraction so that the final accuracy is not reduced, although this increases the processing and computation required by the algorithm. Alternatively, a larger number of features or noise-robust classification methods can be used.
These results show that, with 24 features and without denoising, the algorithm has an error of only 2% and acceptable accuracy even with 35 dB noise in the data.