Research on Fault Diagnosis of ZPW-2000K Track Circuit Based on RS-BN Algorithm

To address the complex fault types and uncertain diagnostic features of the ZPW-2000K track circuit, for which traditional fault diagnosis relies mainly on manual methods and therefore has a low degree of automation, this paper proposes a fault diagnosis method that fuses a Rough Sets (RS) reduction model with Bayesian Network (BN) structure learning. First, data mining and feature extraction are performed on the fault data table, and expert knowledge is built into a prior knowledge base. Second, the K2 algorithm is applied to the fault feature data to learn a structure, which is combined with the prior knowledge base to build the BN model. Then, a diagnostic decision table is established from the fault instances, and RS is used for attribute reduction, reducing the dimensionality and simplifying the model. Maximum likelihood estimation (MLE) is then used to learn the parameters and obtain the conditional probability tables of the model, completing the BN based on the RS-BN algorithm. Finally, the simplified and non-simplified models are compared. Experimental simulation on ZPW-2000K track circuit faults at a high-speed railway station verifies the accuracy and effectiveness of the diagnostic method.


INTRODUCTION
The ZPW-2000K non-insulated track circuit was developed by introducing and localizing French UM71 track circuit technology in line with China's national conditions, yielding a system with high safety, high transmission performance and high reliability. It is important equipment in China's high-speed railway signaling system and key equipment for the safe, smooth operation of high-speed trains and efficient transportation. Its main functions are open-circuit (broken-rail) detection, occupancy detection and data transmission. Its failure seriously affects transportation efficiency and can lead to safety accidents. Accurate fault localization and diagnosis are therefore of great significance for reliable train operation.
The ZPW-2000K track circuit is a complex system, and the relationships between fault causes and fault manifestations are highly random and uncertain, which makes fault diagnosis difficult. In recent years, fault diagnosis techniques such as expert systems, fault trees and neural networks have developed continuously, and many researchers have applied them to track circuit fault diagnosis. Reference [1] combined a decision tree with an expert system to build a decision-tree fault diagnosis system, but this method relies heavily on expert experience and is prone to misdiagnosis. In [2], the fault modes and fault phenomena of the ZPW-2000 track circuit were analyzed and a fault tree model of the system was established; however, building an accurate model is extremely difficult. In [3], an FNN fault diagnosis model was established based on the working principle and fault characteristics of the track circuit, but the neural network is prone to falling into local optima.
Bayesian Networks (BN) are a research hotspot in artificial intelligence and have been successfully applied to many fault diagnosis problems [4][5][6]. Rough Sets (RS) theory is a classical theory for handling uncertainty. It can process and analyze incomplete data and has been widely used in fields such as fault diagnosis, decision control and pattern recognition [7][8][9]. Building on the advantages of BN and RS, this paper proposes an RS-BN fault diagnosis method for the ZPW-2000K track circuit. First, data mining and feature extraction are performed on fault instances, and BN structure learning and parameter learning are carried out in combination with expert experience. Second, RS theory is applied for attribute reduction, dimensionality reduction and denoising, to obtain minimal diagnostic rules and establish an optimal BN. Finally, the BN model is verified and analyzed using fault instances of the ZPW-2000K track circuit at a high-speed railway station.

ZPW-2000K Track Circuit System
The ZPW-2000K track circuit is designed specifically for passenger-dedicated lines and high-speed railways. It consists of indoor and outdoor equipment. The indoor equipment includes the transmitter, the receiver, the attenuation redundancy controller and the lightning-protection simulated network panel. The outdoor equipment includes the tuning matching unit, compensation capacitors, equipment connecting cables, the air-core coil and the choke transformer. The system structure [10] is shown in Fig. 1.

THEORIES AND ALGORITHMS
Bayesian Network
BN combines graph theory and probability theory for uncertain reasoning and data analysis. A BN = (G, P) consists of two parts: (1) a directed acyclic graph G = (I, E), where I = {A1, A2, A3, …, An} is the set of nodes and E = {E1, E2, E3, …, Em} is the set of directed edges, which reflect the dependencies between nodes; (2) the conditional probability tables (CPTs) P, which give the probability distribution of each node conditioned on its parents (for a root node, its prior probability). For BN nodes A1, A2, A3, …, An, the joint probability distribution factorizes as shown in Eq. (1), where Pa(Ai) denotes the parent set of Ai in G.

$$P(A_1, A_2, \ldots, A_n) = \prod_{i=1}^{n} P\big(A_i \mid \mathrm{Pa}(A_i)\big) \tag{1}$$
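The factorization (the joint probability equals the product of P(Ai | Pa(Ai)) over all nodes) can be made concrete with a minimal Python sketch. The network, a hypothetical fault F with two symptoms S1 and S2, and every probability in it are invented for illustration and are not taken from the ZPW-2000K model; the point is only how the joint probability is assembled from one CPT entry per node.

```python
from itertools import product

# Hypothetical network: fault F -> symptoms S1 and S2 (states: 1 = occurred, 0 = not).
parents = {"F": [], "S1": ["F"], "S2": ["F"]}

# P(node = 1 | parent states); all numbers are illustrative only.
p_one = {
    "F": {(): 0.1},
    "S1": {(0,): 0.05, (1,): 0.9},
    "S2": {(0,): 0.1, (1,): 0.7},
}


def joint(assignment):
    """Multiply P(A_i | Pa(A_i)) over all nodes for one full assignment."""
    result = 1.0
    for node, pa in parents.items():
        p1 = p_one[node][tuple(assignment[p] for p in pa)]
        result *= p1 if assignment[node] == 1 else 1.0 - p1
    return result


# A valid factorization sums to 1 over all 2^3 joint states.
total = sum(joint(dict(zip(parents, states))) for states in product([0, 1], repeat=3))
print(round(total, 10))                             # 1.0
print(round(joint({"F": 1, "S1": 1, "S2": 1}), 4))  # 0.063
```

Because each factor is a normalized conditional distribution, the product sums to 1 over all assignments, which the first print checks; the network only needs to store 5 numbers instead of the 7 free parameters of an unstructured 3-variable joint table.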
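The search-and-score method is a common approach to BN structure learning, in which an accurate network structure is obtained by learning from data. For a sample containing n variables, the number of possible network structures f(n) is given by Eq. (2).

$$f(n) = \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i} 2^{i(n-i)} f(n-i), \qquad f(0) = 1 \tag{2}$$

The number of candidate structures therefore grows super-exponentially with the number of nodes n, and BN structure learning is considered NP-hard [11]. This paper uses the classical K2 algorithm, whose scoring function is given in Eqs. (3) and (4).

$$P(B_s, D) = P(B_s) \prod_{i=1}^{n} g(i, \pi_i) \tag{3}$$

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}! \tag{4}$$

where Bs represents the network structure, D represents the data, πi is the parent set of node i, ri is the number of states of node i, qi is the number of state configurations of πi, Nijk is the number of samples in D in which node i is in its k-th state and πi is in its j-th configuration, and Nij = Σk Nijk. Parameter learning for BN is relatively mature, so Maximum Likelihood Estimation (MLE) is used to learn the BN parameters.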
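To see how quickly the structure search space grows, the count of labelled DAGs on n nodes can be evaluated with Robinson's recursion, f(n) = Σ_{i=1..n} (−1)^(i+1) C(n, i) 2^(i(n−i)) f(n−i) with f(0) = 1, which is the standard form of this count. The Python sketch below transcribes the recursion directly; the function name num_dags is ours.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes (Robinson's recursion)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * num_dags(n - i)
        for i in range(1, n + 1)
    )


print([num_dags(n) for n in range(1, 6)])  # [1, 3, 25, 543, 29281]
```

Already at n = 5 there are 29,281 candidate structures, so exhaustive search is impractical for a model with tens of fault nodes, which is why greedy scored searches such as K2 are used.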

Rough Sets
RS is a classical mathematical theory for dealing with vague and imprecise problems [12]. Its attribute reduction can eliminate redundant information, simplify the condition attributes and generate minimal decision rules without changing the decision-making ability. The theory consists of three parts. Among them, attribute necessity is defined as follows: for an attribute a in an attribute set R, if IND(R − {a}) = IND(R), where IND(·) denotes the indiscernibility relation, then a is unnecessary in R; otherwise it is a necessary attribute.
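The necessity test can be sketched in Python on a small hypothetical table; the attributes a, b and c are illustrative and are not the paper's S1 to S18. ind_partition computes the equivalence classes of IND(R), and an attribute is reported as unnecessary when dropping it leaves the partition unchanged.

```python
from collections import defaultdict

# Hypothetical information table: one dict of attribute values per object.
rows = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
]


def ind_partition(attrs):
    """U/IND(attrs): group object indices that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in sorted(attrs))].add(idx)
    return {frozenset(b) for b in blocks.values()}


def is_dispensable(a, R):
    """a is unnecessary in R iff IND(R - {a}) == IND(R)."""
    return ind_partition(set(R) - {a}) == ind_partition(R)


R = {"a", "b", "c"}
print({a: is_dispensable(a, R) for a in sorted(R)})  # {'a': True, 'b': False, 'c': True}
```

Here a and c carry the same information, so each is individually unnecessary, but they cannot both be removed (IND({b}) merges objects); this is why reduction has to be evaluated on attribute combinations rather than one attribute at a time.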

FAULT DIAGNOSIS OF ZPW-2000K TRACK CIRCUIT BASED ON RS-BN ALGORITHM
The fault data table used in this paper is derived from track circuit monitoring and warning information and from the fault repair forms filled out by maintenance personnel. The table is recorded in unstructured natural language that a computer cannot process directly, so data mining and unified coding of the fault data table are required.
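The paper does not publish its coding scheme, so the following Python sketch only illustrates the idea of unified coding under a stated assumption: a hypothetical keyword dictionary maps free-text repair records to binary fault features (1 = occurred, 0 = did not occur). The feature names and keywords are invented for illustration.

```python
# Hypothetical keyword dictionary: feature names and keywords are illustrative only.
FEATURE_KEYWORDS = {
    "red_band": ["red band", "red-band"],
    "low_receive_voltage": ["receive voltage low", "low receive voltage"],
    "wiring_error": ["wiring", "loose terminal"],
}


def encode_record(text):
    """Code one free-text fault record as 1 (occurred) or 0 (did not occur) per feature."""
    lowered = text.lower()
    return {
        feature: int(any(kw in lowered for kw in keywords))
        for feature, keywords in FEATURE_KEYWORDS.items()
    }


record = "Red band displayed; loose terminal found on cable."
print(encode_record(record))  # {'red_band': 1, 'low_receive_voltage': 0, 'wiring_error': 1}
```

Plain keyword matching misses negations such as "no red band", so a real coding step would need to be checked against the maintenance records by domain experts.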

Establish a Diagnostic Knowledge Base
Traditionally, the fault diagnosis knowledge base is built on expert experience. It is usually recorded in natural language and written records, is subjectively biased and is not easily accessible, and when a new type of fault occurs there may be no experience to follow. In this paper, data mining and feature extraction are performed on the fault data table, and the prior diagnostic knowledge base is built together with experts. Data mining uncovers latent relationships between fault points, which enriches the prior knowledge and reduces the dependence on expert knowledge. The establishment process is shown in Fig. 2.

Establish a BN Model Based on the Prior Diagnostic Knowledge Base
Based on the prior diagnostic knowledge base, the causal hierarchy of fault nodes in Fig. 3 and the fault node information in Tabs. 1 to 3, a BN diagnostic model of the ZPW-2000K track circuit is established. Each fault node is discretely coded with two states: 1 (occurred) and 0 (did not occur). The model was built with the BNT toolbox for MATLAB, as shown in Fig. 4.

Establish a BN Model Based on K2 Learning Algorithm
The K2 algorithm is a local search algorithm for learning network structure from data. It combines hill-climbing search with a Bayesian scoring metric to optimize the network model, offering high accuracy and good search efficiency. Because the fault data table is relatively complete, the K2 algorithm is used to mine latent causal relationships between fault points. Taking M1 (the indoor transmitter) as an example, the learn_struct_K2() function is used to build the model, as shown in Fig. 5.
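A compact Python sketch of K2 follows. It assumes complete discrete data given as a list of dicts, a fixed node ordering (K2 only lets a node take parents that precede it in this ordering), a uniform structure prior, and a hypothetical cap max_parents on the number of parents. k2_log_score computes the logarithm of the per-node K2 score g(i, πi) = ∏j (ri − 1)! / (Nij + ri − 1)! ∏k Nijk! using lgamma, since (x − 1)! = Γ(x); the search greedily adds the predecessor that most increases this score. It illustrates the algorithm and is not a transcription of BNT's learn_struct_K2.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """log g(i, pi_i): K2 marginal likelihood of one node given a parent set."""
    r = arity[child]
    keys = [tuple(row[p] for p in parents) for row in data]
    n_ij = Counter(keys)
    n_ijk = Counter(zip(keys, (row[child] for row in data)))
    # Only observed parent configurations appear; unobserved ones contribute log(1) = 0.
    score = sum(lgamma(r) - lgamma(n + r) for n in n_ij.values())
    score += sum(lgamma(n + 1) for n in n_ijk.values())
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2 search: node order[pos] may only take parents from order[:pos]."""
    structure = {}
    for pos, child in enumerate(order):
        parents = []
        best = k2_log_score(data, child, parents, arity)
        while len(parents) < max_parents:
            candidates = [z for z in order[:pos] if z not in parents]
            if not candidates:
                break
            # Score every remaining predecessor and keep the best one.
            new_score, z = max(
                (k2_log_score(data, child, parents + [cand], arity), cand)
                for cand in candidates
            )
            if new_score <= best:
                break  # no candidate improves the score: stop adding parents
            parents.append(z)
            best = new_score
        structure[child] = parents
    return structure


# Toy complete data: S always equals F, N is independent noise.
data = [{"F": f, "S": f, "N": n} for f in (0, 1) for n in (0, 1)] * 2
arity = {"F": 2, "S": 2, "N": 2}
structure = k2(data, ["F", "S", "N"], arity)
print(structure)  # {'F': [], 'S': ['F'], 'N': []}
```

The result depends on the ordering: K2 never considers an edge from a later node to an earlier one, so in practice the ordering is usually taken from domain knowledge (for example, faults before symptoms).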

Establish a BN Model for Information Fusion
Neither the BN model built from the prior diagnostic knowledge base nor the one learned by the K2 algorithm is sufficiently accurate on its own. The model based on the prior knowledge base has a simple structure that ignores latent correlations among some fault nodes, so it under-fits. The model learned by the K2 algorithm uncovers these latent correlations, but its structure is complex and contains redundancy between fault nodes, so it over-fits. The two methods are therefore merged to combine their advantages, forming a BN model based on both the prior diagnostic knowledge base and K2 structure learning, as shown in Fig. 6.
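The paper does not spell out the merge rule. One plausible reading, sketched here in Python purely as an assumption, is to keep every edge of the knowledge-base model and add each K2-learned edge only if it does not create a directed cycle, so that the fused graph remains a DAG. The node names F, S1 and S2 are hypothetical.

```python
def creates_cycle(edges, new_edge):
    """True if adding new_edge (u, v) would close a directed cycle, i.e. v already reaches u."""
    u, v = new_edge
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency.get(node, ()))
    return False


def fuse(expert_edges, learned_edges):
    """Keep all expert edges, then add learned edges that keep the graph acyclic."""
    fused = list(dict.fromkeys(expert_edges))  # de-duplicate, preserve order
    for edge in learned_edges:
        if edge not in fused and not creates_cycle(fused, edge):
            fused.append(edge)
    return fused


expert = [("F", "S1"), ("F", "S2")]
learned = [("F", "S1"), ("S1", "S2"), ("S2", "F")]
print(fuse(expert, learned))  # [('F', 'S1'), ('F', 'S2'), ('S1', 'S2')]
```

Giving expert edges priority reflects the idea that the knowledge base supplies the reliable backbone while K2 contributes the latent correlations; the rejected edge S2 → F would have reversed a known causal direction.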

Establish a BN Fault Diagnosis Model Based on RS-BN Algorithm
The ZPW-2000K track circuit is complex and uncertain, and its data are noisy, fuzzy and random. Model construction is complicated and involves many nodes, which affects the efficiency and accuracy of modeling. The RS algorithm is used to eliminate redundant attributes, obtain the core attributes, mine the simplest diagnostic rules, reduce the diagnostic scale and algorithmic complexity, and improve diagnostic efficiency. The attribute reduction process of the RS algorithm is shown in Fig. 7.

Figure 7 The process of RS algorithm attribute reduction

Decision table columns: Fault mode, S1, S2, S3, S4, …, S15, S16, S17, S18 (rows r1, …)

(2) Establish a difference matrix. Define the difference matrix M(S) = [mij]n×n, where the value of mij is given by Eq. (5).

$$m_{ij} = \begin{cases} \{a \in C : a(x_i) \neq a(x_j)\}, & D(x_i) \neq D(x_j) \\ \varnothing, & D(x_i) = D(x_j) \end{cases} \tag{5}$$

where C is the set of condition attributes, D is the decision attribute, a(x) is the value of attribute a on instance x, and xi, xj are fault instances in the decision table.
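(3) RS attribute reduction. Find the single-attribute elements in the difference matrix (these form the core attributes), remove them, and keep the remaining element combinations. Combining the remaining elements with the single-attribute elements gives simplified attribute combinations. The mutual information formula is used to calculate dependencies between elements; for example, the dependency between attribute sets P and Q is calculated as shown in Eq. (7).

$$I(P; Q) = H(Q) - H(Q \mid P) \tag{7}$$

where U/P = {X1, …, Xn} and U/Q = {Y1, …, Ym} are the partitions of the universe U induced by P and Q, H(Q) = −Σj p(Yj) log p(Yj) and H(Q | P) = −Σi p(Xi) Σj p(Yj | Xi) log p(Yj | Xi).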
Calculate the dependencies of the attribute combinations and take the combination with the minimum dependency value as the best attribute group. The optimal decision diagnosis rules are obtained from the optimal attribute group, and an optimal diagnostic decision table is established. The condition attribute reduction of Tab. 4 is {S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S14, S15, S17}, so the dimension of the fault feature points is reduced to 14, which reduces the complexity of the model.
(4) Establish a BN diagnostic model based on the RS-BN algorithm. According to the optimal decision diagnosis rules and the information fusion BN model, a BN fault diagnosis model based on the RS-BN algorithm is established, as shown in Fig. 8.
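Steps (2) and (3) can be sketched in Python on a small hypothetical decision table with three condition attributes rather than the paper's eighteen. discernibility_matrix follows the usual decision-relative definition (for each pair of instances with different fault modes, the condition attributes on which they differ), core collects the single-attribute elements, and mutual_information implements I(P; Q) = H(Q) − H(Q | P) with base-2 logarithms; taking this as the exact form of Eq. (7) is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical decision table: condition attributes S1..S3 and decision "fault".
table = [
    {"S1": 1, "S2": 0, "S3": 1, "fault": "A"},
    {"S1": 1, "S2": 1, "S3": 1, "fault": "B"},
    {"S1": 0, "S2": 0, "S3": 1, "fault": "C"},
    {"S1": 0, "S2": 0, "S3": 0, "fault": "C"},
]
conditions = ["S1", "S2", "S3"]


def discernibility_matrix(rows, conds, decision="fault"):
    """m_ij = condition attributes that differ, for pairs with different decisions."""
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if rows[i][decision] != rows[j][decision]:
            matrix[(i, j)] = frozenset(a for a in conds if rows[i][a] != rows[j][a])
    return matrix


def core(matrix):
    """Attributes that appear as single-attribute elements of the matrix."""
    return {next(iter(m)) for m in matrix.values() if len(m) == 1}


def entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def mutual_information(rows, P, Q):
    """I(P; Q) = H(Q) - H(Q | P) for attribute sets P and Q."""
    p_vals = [tuple(r[a] for a in P) for r in rows]
    q_vals = [tuple(r[a] for a in Q) for r in rows]
    h_q_given_p = 0.0
    for pv, count in Counter(p_vals).items():
        block = [q for p, q in zip(p_vals, q_vals) if p == pv]
        h_q_given_p += count / len(rows) * entropy(block)
    return entropy(q_vals) - h_q_given_p


m = discernibility_matrix(table, conditions)
print(sorted(core(m)))                                     # ['S1', 'S2']
print(mutual_information(table, ["S1", "S2"], ["fault"]))  # 1.5
print(mutual_information(table, ["S1"], ["fault"]))        # 1.0
```

In this toy table the core {S1, S2} already retains all 1.5 bits of information about the fault mode, while S1 alone retains only 1.0 bit, so S3 can be dropped without losing diagnostic ability.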

Determine the Parameter Model of BN
The accuracy of parameter learning depends on the accuracy of the constructed model structure. After the optimal BN model is established, the prior probability of each fault node must be determined and a conditional probability table (CPT) established for each fault node. In this paper, the MLE algorithm [13] is used to learn the BN parameters. The information fusion BN model in Fig. 6 is defined as BN1 and the RS-reduced BN model in Fig. 8 as BN2, and the prior probabilities of the fault nodes are learned with the GeNIe 2.0 software. The results are shown in Fig. 9.
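Under MLE, each CPT entry is a relative frequency, θijk = Nijk / Nij, where Nijk counts the samples in which a node is in state k while its parents are in configuration j, and Nij = Σk Nijk. The Python sketch below computes this on hypothetical samples of one fault F and one symptom S; it illustrates the estimator and does not reproduce the GeNIe results.

```python
from collections import Counter


def mle_cpt(data, child, parents):
    """MLE of P(child = k | parents = j) as the relative frequency N_ijk / N_ij."""
    keys = [tuple(row[p] for p in parents) for row in data]
    n_ij = Counter(keys)
    n_ijk = Counter(zip(keys, (row[child] for row in data)))
    return {(j, k): n / n_ij[j] for (j, k), n in n_ijk.items()}


# Hypothetical samples: fault F and symptom S (1 = occurred, 0 = did not occur).
data = [
    {"F": 1, "S": 1}, {"F": 1, "S": 1}, {"F": 1, "S": 1}, {"F": 1, "S": 0},
    {"F": 0, "S": 0}, {"F": 0, "S": 0}, {"F": 0, "S": 0}, {"F": 0, "S": 1},
]
print(mle_cpt(data, "F", []))     # root node prior: {((), 1): 0.5, ((), 0): 0.5}
print(mle_cpt(data, "S", ["F"]))  # P(S | F): 0.75 / 0.25 for each parent state
```

Because a node's estimates depend only on its own parent set and the data, fault nodes whose parents are unchanged by the reduction receive identical CPTs from the same samples. Parent configurations that never occur in the data receive no estimate at all, a known weakness of pure MLE on small samples.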
Comparing (a) and (b) in Fig. 9 shows that, with the same number of samples, the reduced BN2 model and the unreduced BN1 model yield the same conditional probabilities for the fault nodes under MLE parameter learning. Model reduction therefore not only simplifies the model but also preserves the same prior probabilities.

INSTANCE VERIFICATION OF FAULT DIAGNOSIS
Instance Verification 1
A fault record is selected from the fault instances as a diagnostic example for the BN model, as shown in Tab. 5. The diagnosis results in (a) and (b) of Fig. 10 show that, given evidence T1 and T2, the fault with the highest probability diagnosed by both the BN1 and BN2 models is R5 (receive level line error). This is consistent with the actual cause of the failure and verifies the accuracy of the models. For BN2, the posterior probability of R5 is 0.9125, higher than the 0.8875 of BN1, while the probabilities of R12 and R13 decrease. This indicates that RS attribute reduction gives BN2 clearer fault knowledge and better fault diagnosis ability than BN1.
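Posterior probabilities such as these come from exact inference. For a very small network, the same quantity can be computed by enumerating all fault states, as the Python sketch below does; junction-tree inference (for example BNT's jtree_inf_engine) gives the same exact posteriors and scales better to larger networks. The faults F1 and F2, the symptoms E1 and E2, and all probabilities are hypothetical and do not reproduce Fig. 10.

```python
from itertools import product

PRIOR = {"F1": 0.1, "F2": 0.05}
# P(symptom = 1 | F1, F2), keyed by the (F1, F2) states; illustrative values only.
CPT = {
    "E1": {(0, 0): 0.01, (1, 0): 0.9, (0, 1): 0.2, (1, 1): 0.95},
    "E2": {(0, 0): 0.02, (1, 0): 0.8, (0, 1): 0.6, (1, 1): 0.9},
}


def posterior(evidence):
    """P(fault = 1 | evidence) for each fault, by enumerating all fault states."""
    weights = {}
    for f1, f2 in product([0, 1], repeat=2):
        w = (PRIOR["F1"] if f1 else 1 - PRIOR["F1"]) * (PRIOR["F2"] if f2 else 1 - PRIOR["F2"])
        for symptom, observed in evidence.items():
            p1 = CPT[symptom][(f1, f2)]
            w *= p1 if observed else 1 - p1
        weights[(f1, f2)] = w
    z = sum(weights.values())  # P(evidence)
    return {
        "F1": sum(w for (f1, _), w in weights.items() if f1) / z,
        "F2": sum(w for (_, f2), w in weights.items() if f2) / z,
    }


post = posterior({"E1": 1, "E2": 1})
print({k: round(v, 3) for k, v in post.items()})  # {'F1': 0.929, 'F2': 0.124}
```

The fault with the largest posterior is reported as the diagnosis; the posteriors need not sum to 1 because the faults are separate binary nodes rather than mutually exclusive states of one node.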

Instance Verification 2
From the 100 verification records of the ZPW-2000K track circuit, faults are extracted according to the failure mode ratio; the extracted data are shown in Tab. 6. Diagnostic inference with the BN1 and BN2 models is performed using the jtree_inf_engine() function of the FullBNT toolbox. The comparison of the diagnostic results is shown in Fig. 11.
The fault diagnosis accuracy rates of the BN1 and BN2 models are compared in Tab. 7.
As can be seen from Fig. 11 and Tab. 7, the average diagnostic accuracy of the BN1 model is 89.33% and that of BN2 is 93.33%. In particular, for M5 there are only 3 and 1 misdiagnosed records, respectively, giving fault diagnosis accuracies of 92.50% and 97.50%, which verifies the effectiveness of both models. Although the diagnostic accuracy for M2 is relatively low, it still exceeds 82%. This is because M2 has only 75 instances, and this smaller number of instances makes the diagnostic model for this module less accurate; as the sample size increases, the diagnostic accuracy is expected to improve further. The diagnostic accuracy of BN2 is higher than that of BN1 because RS attribute reduction eliminates model redundancy and unnecessary attributes, removes interference and noise, and makes the diagnostic rules more accurate and clearer. At the same time, the model structure is simplified, the misdiagnosis rate is reduced and the diagnostic accuracy is improved. The BN2 diagnostic model is more efficient and highly feasible.
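The M5 figures can be cross-checked with simple arithmetic: 3 misdiagnoses at 92.50% accuracy and 1 misdiagnosis at 97.50% accuracy both imply 40 M5 test cases. The small Python helpers below (names are ours) make the check explicit.

```python
def accuracy(n_cases: int, n_misdiagnosed: int) -> float:
    """Diagnostic accuracy in percent."""
    return 100.0 * (n_cases - n_misdiagnosed) / n_cases


def implied_cases(n_misdiagnosed: int, accuracy_pct: float) -> float:
    """Number of test cases implied by a misdiagnosis count and an accuracy in percent."""
    return n_misdiagnosed / (1 - accuracy_pct / 100)


print(round(implied_cases(3, 92.50)), round(implied_cases(1, 97.50)))  # 40 40
print(accuracy(40, 3), accuracy(40, 1))                                # 92.5 97.5
```

Rounding is applied to implied_cases because 0.925 and 0.975 are not exactly representable in binary floating point.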

CONCLUSION
(1) By fully integrating the ZPW-2000K system structure with expert experience, a prior fault diagnosis knowledge base and BN model structure are established. Latent relationships between fault points are mined from fault instances of the ZPW-2000K track circuit using BN structure learning with the classic K2 algorithm.
(2) The BN model built from the prior diagnostic knowledge base and the model learned by the K2 algorithm are fused, combining the advantages of both in a new BN model that further improves diagnostic accuracy.
(3) The advantages of BN and RS are fully combined. RS theory is used to reduce the attributes of the initial decision table, reducing its dimension, eliminating redundant and non-core attributes, simplifying the model, and generating the simplest diagnostic rules to establish the best BN model structure.
(4) Diagnostic analyses of the reduced and unreduced models are carried out using actual faults at a high-speed railway station. The comparison shows that the proposed RS-BN fault diagnosis model is compact in structure, efficient in diagnosis, highly reliable and practically feasible. It provides decision-making support for on-site electrical maintenance personnel to diagnose track circuit faults quickly and effectively, and is of practical significance for the development of fault diagnosis technology for the ZPW-2000K track circuit.