Data Asset Management and Visualization Based on Intelligent Algorithm: Taking Power Equipment Data as an Example

Data asset management is an effective way for enterprises to solve the problems of silent and idle data. Using intelligent algorithms such as neural networks, deep learning and blockchain, and guided by business needs, it extracts, analyzes and visualizes the data accumulated in existing business operations, turning scattered, disordered data into valuable information that supports the company's development and thereby activates data assets. Taking the management data of electric power equipment as an example, this paper proposes a method that fuses multiple intelligent control algorithms. Its modules include heterogeneous data fusion; machine-learning-based feature extraction from equipment asset management data; multi-objective optimization-based intelligent environmental control using energy consumption data; and BIM data visualization based on the fusion of data classification, feature extraction and neural network algorithms (SVM-CART-SAE-DNN). Through intelligent control of equipment management, the algorithm can effectively improve equipment management efficiency and enhance the security and economy of power infrastructure.


INTRODUCTION
As an intangible asset, data is an important strategic resource for enterprises. At present, State Grid Corporation of China builds its information systems mainly on a "chimney" (siloed) architecture, which entails high construction costs, long construction cycles, unshared and low-quality data, and large volumes of data left in system back ends without in-depth mining or application. On the one hand, this data occupies tremendous system resources; on the other hand, it provides no useful decision support for enterprise management, resulting in great waste. Taking the inspection data of power space equipment as an example, this paper introduces an artificial intelligence-based method for extracting, analyzing and displaying data assets, which aggregates and reconstructs the data and brings it under unified supervision.
As a comprehensive infrastructure, the power space equipment system has many internal risk factors and complex risk types, and an accident can have a significant impact on urban public safety. In China, the health management of power space equipment still relies on regular inspections by designated personnel, which can neither predict risks in advance nor prevent them. Data-driven prognostics and health management (PHM), a key method for ensuring equipment safety and reliability, has produced fruitful theoretical results and has been widely applied in practice over the past decades. However, the health management of power space equipment in China is still in its infancy, and problems such as low data dimensionality and immature algorithms urgently need to be solved. Equipment management in power spaces relies only on manual operations such as staff patrol inspection, fault detection and repair. This is adequate for simple project-level and company-level power spaces, but it exposes serious drawbacks in a city-level power space supervision system. First, in a city-level, super-large-scale power space environment, manual inspection requires huge manpower and material resources, and it is difficult for managers to dispatch materials and personnel. Second, inspectors can find problems only after the equipment has failed.
Research on the operational management of electric power space equipment began earlier in the United States. Based on a large body of collected data, Hiromitsu et al. identified the hazards and risk categories in electric power spaces and proposed preventive measures for each hazard and risk [2]. Farhad et al. classified power space disasters into fire, power outage, explosion and other types, and designed countermeasures and preventive methods for each type [3]. Environmental and psychological problems have also been studied, highlighting the importance of safety monitoring and safety management in power spaces. Research methods for the operational management of power space security risk include automatic control, among others [4][5]. Van et al. studied power space security and summarized its existing problems, including economic, development, cost, environmental and regulatory issues, highlighting the lack of successful power space safety management experience [6]. According to Klepikov, human health, psychology, safety and other factors should be considered together to improve the utilization rate of power space equipment [7]. Yoo et al. developed a tunnel risk assessment system based on information technology (IT).
The system was developed in a geographic information system (GIS) environment and uses GIS and AI to analyze the potential risks of the tunnel [8]. Canto-Perello further studied the risks faced by personnel entering the system; the results show that accessibility and maintainability are the key features distinguishing power spaces from other public facilities [9]. Rogers studied air pollutant diffusion standards and the design and development of space ventilation systems [10].
With regard to fires in power spaces, some scholars have studied the fire characteristics of urban power spaces using CFD simulation. For a given fire size, they analyzed in detail how different fire compartment lengths and ventilation velocities affect fire characteristics [11]. On this basis, optimal fire compartment lengths and ventilation velocities for smoke control in such facilities were proposed [12]. Zhang analyzed the types and characteristics of fire risk in power spaces and proposed practical measures to reduce it [13]. Yue systematically studied urban power spaces in terms of seismic response analysis and seismic reliability [14].

RESEARCH METHOD
Existing research on the safety risk of urban power space equipment mostly focuses on the accident risk of single pieces of equipment. It cannot predict the composite, coupled risks of a power space, which requires considering the combined impact of multiple factors. With the development of the Internet of Things (IoT), it has become possible to collect and analyze data from outside the power space. From the perspective of equipment failure rate (EFR) feature extraction and multi-objective intelligent environmental regulation and control, this paper develops an EFR-based intelligent regulation and control system for the power space environment, as detailed in Fig. 1.
Because support vector machines and neural networks are black-box models, they cannot select features directly during learning, whereas decision tree models have some built-in feature selection capability. This paper therefore combines decision trees with support vector machines and neural networks: a decision tree is built, features are ranked by importance according to the height of the tree nodes at which they are used, and the features with higher classification ability are fed into the support vector machine and neural network models for training. In this way, the factors affecting equipment lifetime and their mechanisms of influence are extracted and analyzed. Multi-objective planning (MOP) is a branch of mathematical programming in which several objective functions are optimized simultaneously over a given domain. The decision-maker can specify an expected (satisfaction) value for each objective function, and the solution is selected by comparing the deviations between the actual and expected values. In this paper, we extract the characteristics of the factors affecting EFR from environmental data and adjust the energy consumption data by combining various environmental indicators.
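The selection rule described above (an expected value per objective, then comparison of deviations) is the idea behind goal programming. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it scores a few hypothetical candidate control plans by their weighted relative overshoot of the expected energy consumption and EFR, and picks the plan with the smallest total deviation. The plan names, numbers and equal weights are all invented for illustration.

```python
def weighted_deviation(actual, expected, weights):
    """Weighted relative deviation of each objective above its expected value.

    Both objectives (energy, EFR) are minimised, so only overshoot is penalised."""
    total = 0.0
    for name, target in expected.items():
        overshoot = max(0.0, actual[name] - target)
        total += weights[name] * overshoot / target
    return total


def select_plan(plans, expected, weights):
    """Return the name of the plan whose objectives deviate least from the expected values."""
    return min(plans, key=lambda p: weighted_deviation(plans[p], expected, weights))


# Hypothetical candidate plans: energy in kWh, efr as a failure rate.
plans = {
    "fans_only": {"energy": 100.0, "efr": 0.035},
    "fans_and_dehumidifier": {"energy": 140.0, "efr": 0.018},
    "all_controls": {"energy": 200.0, "efr": 0.012},
}
expected = {"energy": 120.0, "efr": 0.02}  # hypothetical satisfaction values
weights = {"energy": 1.0, "efr": 1.0}      # hypothetical equal priorities
```

With these numbers, `select_plan(plans, expected, weights)` returns `"fans_and_dehumidifier"`: it overshoots only the energy target (by 20/120), while the other plans overshoot by larger relative amounts. Measuring deviation relative to each target keeps objectives with different units comparable.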

PRE-PROCESSING OF POWER SPATIAL DATA
3.1 Data Description
The power space is exposed to complex accident risks, including high temperature, high humidity, water accumulation, smoke and other unfavorable conditions. It is therefore necessary to collect real-time data on vibration, acoustic and optical signals, temperature, humidity, water level, toxic and hazardous gas concentrations, oxygen concentration and smoke particle concentration, together with the inspection and maintenance records of maintenance personnel in the tunnel. The environmental data are shown in Tab. 1. The independent variables are panel time-series data of spatial environmental indicators together with equipment failure information; at this stage, however, equipment failure information in the power space is entered manually and consists mostly of unstructured text and pictures. In this paper, multi-source data fusion based on data pattern matching is applied to preprocess the multi-source data (Fig. 2 and Fig. 3). With the rapid development of IoT, the Internet of Everything has become an inevitable trend. IoT covers smart homes, smart transportation and other domains, enabling people to connect with any person or device anytime, anywhere. However, most heterogeneous data in IoT is stored in silos, which slows progress toward the Internet of Everything. Data pattern matching, which is widely used for data association, can solve this problem.

Multi-Element Data Fusion
3.2.1 SVM Text Classification
As shown in Fig. 2 and Fig. 3, the SVM text classification algorithm is used to preprocess the large and complex body of employee-entered data. Text classification, in which free-text documents must be assigned to predefined categories, is a classic topic in natural language processing. To date, almost all text classification techniques are word-based, and simple statistics over ordered word combinations, such as n-grams, usually perform best [15]. SVM uses a nonlinear transformation to map the original space into a high-dimensional space in which the problem becomes linear, so choosing an appropriate kernel function is crucial [16].
1. Text feature extraction and representation: To simplify feature selection when extracting text features, features are usually assumed to be independent, which trades off computing time against computing quality. The general method is to set feature thresholds on the feature vectors of the words in the text, select the best features as a subset of text features, and then build a feature model. We use the TF-IDF formula to calculate the weight of a word:
$$w_{ik} = tf_{ik} \times \log\frac{N}{n_k}$$
where $tf_{ik}$ is the frequency of the feature term $t_k$ in document $d_i$, $N$ is the total number of training documents, and $n_k$ is the number of training documents in which $t_k$ appears. Thus, the more documents a word appears in, the lower its discriminative power and the lower its weight.
2. Normalization: Normalization limits the data to a fixed range during processing:
$$x' = \frac{x - \min}{\max - \min}$$
where $x$ is the word frequency of a keyword, and $\min$ and $\max$ are the minimum and maximum frequency of that word over all texts. Normalization helps make text classification more accurate.
3. Text categorization: After text preprocessing, feature extraction, feature representation and normalization, the original text is abstracted into a vectorized sample set. The similarity between the sample set and a trained template file is then calculated; if the sample does not belong to that category, its similarity to the template files of the other categories is calculated until the sample is assigned to the corresponding category. Fig. 4 depicts how the SVM model classifies the text.
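The three steps above can be sketched end to end as follows. This is a minimal, standard-library illustration, not the paper's pipeline: it computes $w_{ik} = tf_{ik}\log(N/n_k)$, applies min-max scaling per term (here to the TF-IDF weights rather than raw frequencies, an assumption), builds one template per category as the mean scaled training vector, and assigns a new text to the most similar template by cosine similarity. An SVM would be trained on the same normalized vectors; the nearest-template rule stands in for step 3 only. The example texts and category names are hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """log(N / n_k) for every term t_k seen in the N training documents."""
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in docs for t in set(tokens))
    return {t: math.log(n_docs / df) for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """w_ik = tf_ik * log(N / n_k); terms unseen in training are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def fit_bounds(vectors, terms):
    """Per-term (min, max) over the training vectors, for min-max scaling."""
    return {t: (min(v.get(t, 0.0) for v in vectors), max(v.get(t, 0.0) for v in vectors))
            for t in terms}


def scale(vec, bounds):
    """x' = (x - min) / (max - min), clipped to [0, 1]; constant terms map to 0."""
    out = {}
    for t, (lo, hi) in bounds.items():
        x = vec.get(t, 0.0)
        out[t] = min(1.0, max(0.0, (x - lo) / (hi - lo))) if hi > lo else 0.0
    return out


def cosine(a, b):
    dot = sum(x * b.get(t, 0.0) for t, x in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_templates(labelled):
    """labelled: list of (text, category). Template = mean scaled vector per category."""
    docs = [text.split() for text, _ in labelled]
    idf = fit_idf(docs)
    vectors = [tfidf(tokens, idf) for tokens in docs]
    bounds = fit_bounds(vectors, idf)
    sums, counts = {}, Counter()
    for (_, label), vec in zip(labelled, vectors):
        acc = sums.setdefault(label, {})
        for t, x in scale(vec, bounds).items():
            acc[t] = acc.get(t, 0.0) + x
        counts[label] += 1
    templates = {label: {t: x / counts[label] for t, x in acc.items()}
                 for label, acc in sums.items()}
    return idf, bounds, templates


def classify(text, idf, bounds, templates):
    """Assign the category whose template is most similar to the scaled TF-IDF vector."""
    vec = scale(tfidf(text.split(), idf), bounds)
    return max(templates, key=lambda label: cosine(vec, templates[label]))
```

For example, after training on the hypothetical records `("fan fault vibration", "equipment")`, `("fan bearing vibration", "equipment")`, `("water level high pump", "environment")` and `("humidity high water", "environment")`, the text `"vibration fan noise"` is assigned to `"equipment"`.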

Pattern Matching Based on Cabin and Time
Pattern matching algorithms measure the similarity, on a given feature, of pairs of elements drawn from different data patterns, and output a real value in the interval [0, 1] as the similarity of the two input elements. Based on a survey of the literature, this paper summarizes the classification of basic pattern matching algorithms [17]. As shown in Fig. 5, processing the element pairs of two heterogeneous input data patterns involves three main stages: similarity calculation, similarity synthesis and similarity determination.
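The three stages can be illustrated with a small schema matcher. This is a sketch under stated assumptions, not the matcher from [17]: similarity calculation uses two basic matchers (a name matcher based on Python's `difflib.SequenceMatcher.ratio`, which returns a value in [0, 1], and an exact data-type matcher); synthesis takes a weighted average, which stays in [0, 1]; and determination keeps the best candidate only if it clears a threshold. The weights (0.7, 0.3), the threshold 0.6 and the field names are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_similarity(a, b):
    """1.0 if the declared data types agree, else 0.0."""
    return 1.0 if a == b else 0.0


def match_elements(schema_a, schema_b, weights=(0.7, 0.3), threshold=0.6):
    """schema_x: dict element_name -> data type. Returns a list of (a, b, score)."""
    w_name, w_type = weights
    matches = []
    for a_name, a_type in schema_a.items():
        best = None
        for b_name, b_type in schema_b.items():
            # Stage 1: similarity calculation with several basic matchers.
            s_name = name_similarity(a_name, b_name)
            s_type = type_similarity(a_type, b_type)
            # Stage 2: similarity synthesis (weighted average, still in [0, 1]).
            score = w_name * s_name + w_type * s_type
            if best is None or score > best[2]:
                best = (a_name, b_name, score)
        # Stage 3: similarity determination against a threshold.
        if best is not None and best[2] >= threshold:
            matches.append(best)
    return matches


# Hypothetical cabin/time fields from two heterogeneous sources.
schema_a = {"CABIN_ID": "str", "ALM_TIME": "datetime", "TEMP": "float"}
schema_b = {"cabin_id": "str", "alarm_time": "datetime", "humidity": "float"}
```

Here `CABIN_ID` pairs with `cabin_id` and `ALM_TIME` with `alarm_time`, while `TEMP` finds no partner above the threshold. Once cabin and time fields are aligned this way, records from both sources can be joined on those keys.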

FEATURE EXTRACTION FOR EFR
4.1 CART Model
CART (Classification and Regression Tree) assumes that the decision tree is a binary tree: the internal nodes test features, and the only allowed values are "yes" and "no". The left branch takes the "yes" value and the right branch takes the "no" value, so the tree is equivalent to recursively bisecting each feature [18][19][20].
For a classification problem with $K$ classes, where $p_k$ is the probability that a sample belongs to class $k$, the Gini coefficient of the probability distribution is defined as:
$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$
For a binary classification problem in which the probability of the first class is $p$, the Gini coefficient is:
$$\mathrm{Gini}(p) = 2p(1 - p)$$
In the decision tree problem, feature selection with the Gini coefficient works as follows. If a sample set $D$ is split by feature $A$ into subsets $D_1$ and $D_2$, the Gini coefficient of $D$ under $A$ is
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\mathrm{Gini}(D_2)$$
where $\mathrm{Gini}(D_i)$ is the Gini coefficient of the class distribution in $D_i$, and the feature and split point with the smallest $\mathrm{Gini}(D, A)$ are chosen.
The least-squares regression tree generation algorithm is as follows:
(1) Select the optimal splitting variable $j$ and split point $s$ by solving
$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$
where $R_1(j,s) = \{x \mid x^{(j)} \le s\}$ and $R_2(j,s) = \{x \mid x^{(j)} > s\}$. Traverse the variables $j$; for each fixed $j$, scan the split points $s$, and select the pair $(j, s)$ that minimizes the expression above. Here $R_m$ is a region of the partitioned input space and $c_m$ is the output value corresponding to $R_m$.
(2) Divide the space with the selected pair $(j, s)$ and determine the corresponding output values $\hat{c}_m = \frac{1}{N_m}\sum_{x_i \in R_m(j,s)} y_i$, $m = 1, 2$, where $N_m$ is the number of samples in $R_m$.
(3) Repeat steps (1) and (2) for both subregions until the stopping condition is met. The input space is thus divided into $M$ regions $R_1, R_2, \ldots, R_M$, yielding the decision tree
$$f(x) = \sum_{m=1}^{M} \hat{c}_m I(x \in R_m)$$
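The split criteria above can be checked numerically with a few lines of standard-library Python. The sketch below implements the Gini coefficient, the weighted $\mathrm{Gini}(D, A)$ for a threshold split $x^{(j)} \le s$, the least-squares criterion with $\hat{c}_m$ taken as the region mean, and an exhaustive search over $(j, s)$. It is an illustration of the textbook CART criteria, not the paper's code, and the toy data are invented.

```python
from collections import Counter


def gini(labels):
    """Gini(p) = 1 - sum_k p_k^2 for the class distribution of `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(xs, ys, s):
    """Gini(D, A) = |D1|/|D| Gini(D1) + |D2|/|D| Gini(D2) for the split x <= s."""
    left = [y for x, y in zip(xs, ys) if x <= s]
    right = [y for x, y in zip(xs, ys) if x > s]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def squared_error_split(xs, ys, s):
    """Least-squares criterion; the inner min over c_m is attained at the region mean."""
    total = 0.0
    for region in ([y for x, y in zip(xs, ys) if x <= s],
                   [y for x, y in zip(xs, ys) if x > s]):
        if region:
            c = sum(region) / len(region)
            total += sum((y - c) ** 2 for y in region)
    return total


def best_split(X, y, criterion):
    """Scan every variable j and split point s; return (j, s, score) with the smallest score."""
    best = None
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        for s in sorted(set(column))[:-1]:  # the largest value would leave R2 empty
            score = criterion(column, y, s)
            if best is None or score < best[2]:
                best = (j, s, score)
    return best


# Toy data: two features, four samples.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
y_class = [0, 0, 1, 1]
y_reg = [1.0, 1.2, 3.0, 3.2]
```

On this toy data both criteria choose feature 0 with split point 2.0: the classification split is pure (weighted Gini 0), and the regression split leaves only small residuals around the region means 1.1 and 3.1.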

Influencing Factors Analysis
This stage specifies the tree structure [21][22][23]. The maximum tree depth is set to 3 levels, and the significance level for splitting nodes and merging categories is set to 0.05. The minimum numbers of cases in the parent and child nodes are set to 100 and 50, respectively. Fig. 6 shows the resulting model, which relates the equipment life impact factors to the predictor/category variables. The final tree for equipment life involves five split variables: temperature, humidity, methane, oxygen and hydrogen sulfide. The first best split, at node 0, is on methane content: methane above 2 is likely to cause a sharp increase in EFR. Hydrogen sulfide comes next and also has a large impact on EFR, with about one third of equipment failures related to it. Humidity and temperature also influence EFR, but they are not the main factors; equipment failure is more likely when both humidity and temperature are too high.
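The stopping rules above (maximum depth 3, at least 100 cases in a parent node and 50 in each child) and the idea of ranking features by how close to the root they split can be sketched as follows. This is a hypothetical illustration, not the paper's tool: the 0.05 significance level for splitting and merging (a CHAID-style setting) is not modelled, the Gini criterion stands in for whatever criterion the tool used, and the synthetic methane/humidity data are invented so that the root split is easy to verify.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, features, min_child):
    """Best (feature, threshold, weighted Gini) whose children both hold >= min_child cases."""
    best = None
    for f in features:
        for s in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= s]
            right = [y for r, y in zip(rows, labels) if r[f] > s]
            if len(left) < min_child or len(right) < min_child:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (f, s, score)
    return best


def grow(rows, labels, features, depth=0, max_depth=3, min_parent=100, min_child=50):
    """Grow a binary tree using the depth / parent-size / child-size stopping rules."""
    if depth >= max_depth or len(rows) < min_parent or gini(labels) == 0.0:
        return {"leaf": majority(labels)}
    split = best_split(rows, labels, features, min_child)
    if split is None:
        return {"leaf": majority(labels)}
    f, s, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= s]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > s]
    kwargs = dict(max_depth=max_depth, min_parent=min_parent, min_child=min_child)
    return {
        "feature": f, "threshold": s, "depth": depth,
        "left": grow([r for r, _ in left], [y for _, y in left], features, depth + 1, **kwargs),
        "right": grow([r for r, _ in right], [y for _, y in right], features, depth + 1, **kwargs),
    }


def feature_depths(node, out=None):
    """Shallowest depth at which each feature splits; smaller depth = more important."""
    out = {} if out is None else out
    if "leaf" not in node:
        f, d = node["feature"], node["depth"]
        out[f] = min(out.get(f, d), d)
        feature_depths(node["left"], out)
        feature_depths(node["right"], out)
    return out


# Hypothetical synthetic data: failure (1) whenever methane exceeds 2.
rows = [{"methane": (i % 40) / 10, "humidity": (i * 7) % 100} for i in range(200)]
labels = [1 if r["methane"] > 2 else 0 for r in rows]
```

`grow(rows, labels, ["methane", "humidity"])` splits the root on methane at 2.0 and then stops, because both children are pure; `feature_depths` therefore ranks methane first. Features near the root are the ones this paper's approach would pass on to the SVM and neural network models.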

EQUIPMENT FAULT PREDICTION BASED ON NEURAL NETWORK
5.1 Model Establishment
The intelligent regulation of the power space environment is a highly complex nonlinear system, and so is its design optimization. To simplify the analysis and capture the main factors, the optimization model of the intelligent regulation and control system of the power space environment is built on the following assumptions: (1) Events affect the environmental indicators of the power space instantaneously. The failure rate of power space control equipment $n$ is affected to some extent by the environmental indicators, which can be adjusted by, for example, turning on a fan to reduce humidity. $w_{ij}$ is the duration of initiating loop control $i$. In general, more than one environmental control operation must run at the same time to control a single environmental indicator. To minimize both the total energy consumption $y_1$ and the equipment failure rate $y_2$, we use the 0-1 variable $x_i$ to indicate whether environmental control $i$ is started and $T_i$ to denote how long it runs. Fig. 7 shows the model of the intelligent regulation system of the power space environment.
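To make the two-objective structure concrete, the sketch below enumerates every on/off combination of a few environmental controls, evaluates energy $y_1 = \sum_i x_i P_i T_i$ and a failure rate $y_2$, and keeps the Pareto-optimal (non-dominated) combinations. Everything numeric here is hypothetical: the control names, power ratings, fixed durations $T_i$ (the paper treats $T_i$ as a decision variable; it is fixed here so that enumeration is possible), and the assumed multiplicative model in which each running control reduces the base failure rate by a fixed fraction.

```python
from itertools import product

# Hypothetical controls: name -> (power in kW, duration T_i in h, relative EFR reduction).
CONTROLS = {
    "fan": (2.0, 1.0, 0.30),
    "dehumidifier": (3.0, 1.0, 0.40),
    "gas_extractor": (1.5, 0.5, 0.20),
}
BASE_EFR = 0.05  # hypothetical failure rate with no control running


def evaluate(x):
    """x: dict name -> 0/1. Returns (y1 energy in kWh, y2 failure rate)."""
    y1 = sum(x[n] * p * t for n, (p, t, _) in CONTROLS.items())
    y2 = BASE_EFR
    for n, (_, _, r) in CONTROLS.items():
        if x[n]:
            y2 *= 1.0 - r  # assumed multiplicative effect of each running control
    return y1, y2


def dominates(a, b):
    """Minimisation: a is no worse in every objective and strictly better in one."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))


def pareto_front():
    """All 0-1 decision vectors whose (y1, y2) is not dominated by another vector."""
    names = list(CONTROLS)
    solutions = []
    for bits in product((0, 1), repeat=len(names)):
        x = dict(zip(names, bits))
        solutions.append((x, evaluate(x)))
    return [(x, f) for x, f in solutions if not any(dominates(g, f) for _, g in solutions)]
```

The front always contains the two extremes (all controls off, with zero energy and the base EFR; all controls on, with the lowest EFR), and a decision-maker or a multi-objective optimizer chooses among the trade-offs in between. Exhaustive enumeration only works for a handful of controls; larger instances need heuristics such as the swarm algorithms discussed later.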
The training and prediction process of the stacked autoencoder-based deep neural network (SAE-DNN) consists of three stages. In the first stage, the SAE is pre-trained greedily, layer by layer, in a purely unsupervised manner on training datasets built from different feature combinations. In the second stage, the pre-trained SAE is used to initialize the weights and biases of the DNN. In the final stage, labeled data are used to fine-tune the DNN in a supervised manner for prediction. In this paper, a deep neural network based on the SAE-DNN model is built to identify suspicious anomalies [23]. As shown in Fig. 8, feature vectors are extracted and the trained weights are fed into the DNN, which improves the accuracy of suspicious anomaly detection. Eq. (13) defines the reconstruction error between the input vector X and the reconstruction vector R, which is to be minimized:
$$L(X, R) = \frac{1}{2}\lVert X - R \rVert_2^2 \qquad (13)$$
To make the reconstruction R match the input vector X, the loss function L(X, R) is minimized by fine-tuning the parameters $w_i$ and $b_i$ with gradient descent:
$$w'_i = w_i - \eta \frac{\partial L(X, R)}{\partial w_i}, \qquad b'_i = b_i - \eta \frac{\partial L(X, R)}{\partial b_i}$$
where $w_i$ and $w'_i$ are the original and updated weights of node $i$ in the hidden layers, $b_i$ and $b'_i$ are the original and updated biases of node $i$, and $\eta$ is the learning rate.
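The first (unsupervised) stage can be illustrated with a single autoencoder layer trained by the update rule $w' = w - \eta\,\partial L/\partial w$. The sketch below is a minimal standard-library example, assuming a sigmoid hidden layer, a linear output layer and the squared-error loss $L(X, R) = \frac{1}{2}\lVert X - R\rVert^2$; it omits layer stacking, DNN initialization and supervised fine-tuning, and it is not the network used in the paper. The layer sizes, seed and learning rate are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyAutoencoder:
    """One sigmoid hidden layer, linear output, loss L(X, R) = 0.5 * ||X - R||^2."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def forward(self, x):
        """Encode x into hidden features h, then decode h into the reconstruction r."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        r = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, r

    def loss(self, x):
        _, r = self.forward(x)
        return 0.5 * sum((ri - xi) ** 2 for ri, xi in zip(r, x))

    def step(self, x, eta):
        """One gradient-descent update: w' = w - eta * dL/dw, b' = b - eta * dL/db."""
        h, r = self.forward(x)
        dr = [ri - xi for ri, xi in zip(r, x)]  # dL/dr
        # Back-propagate to the hidden pre-activations before w2 is modified.
        dz = [sum(self.w2[k][j] * dr[k] for k in range(len(dr))) * h[j] * (1.0 - h[j])
              for j in range(len(h))]
        for k in range(len(dr)):
            for j in range(len(h)):
                self.w2[k][j] -= eta * dr[k] * h[j]
            self.b2[k] -= eta * dr[k]
        for j in range(len(h)):
            for i in range(len(x)):
                self.w1[j][i] -= eta * dz[j] * x[i]
            self.b1[j] -= eta * dz[j]
```

After a few hundred passes of `step(x, 0.1)` over a small set of normalized sensor vectors, the mean reconstruction loss falls well below its initial value. In a stacked autoencoder, the trained `w1`/`b1` would then initialize the first DNN layer, and the hidden activations `h` would serve as the input for pre-training the next layer.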

Experimental Results
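The evaluation indicators are the root mean square error (RMSE) and the mean absolute percentage error (MAPE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{i=1}^{T}(y_i - \hat{y}_i)^2}$$
$$\mathrm{MAPE} = \frac{1}{T}\sum_{i=1}^{T}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
where $y_i$ is the true value of sample $i$, $\hat{y}_i$ is the corresponding predicted value, and $T$ is the total number of samples.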
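Both indicators are straightforward to compute. The helper functions below follow the formulas directly, with MAPE returned as a fraction (matching the 0.1004 reported below) rather than a percentage; the function names are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error over T samples."""
    T = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / T)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; assumes no true value is zero."""
    T = len(y_true)
    return sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / T
```

For example, with true values `[100, 200]` and predictions `[110, 180]`, `rmse` is $\sqrt{250} \approx 15.81$ and `mape` is 0.1. RMSE weights large errors more heavily, while MAPE is scale-free, so the two together cover both absolute and relative accuracy.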
Tab. 2 compares the results of the proposed method with those of classical machine learning methods; the proposed method clearly outperforms them. Its final RMSE is 50.69 and its MAPE is 0.1004, indicating better fault diagnosis performance.
Tab. 3 compares predictions that use both environmental and operational data with predictions that use either type of data alone; the results show that combining the two gives better predictions. Tab. 4 shows five sets of test results.

Algorithm Performance Analysis of Solution
Fig. 9 shows the statistical analysis, over 30 independent experiments, of the Pareto front convergence and the average population diversity of the continuous swarm intelligence algorithms selected for the RERFTRD example, together with their running times and numbers of iterations. The statistical analysis of SC and ISC over the 30 independent experiments shows that the quantum-behaved particle swarm optimization algorithms achieve better optimization results than the other algorithms, with the IMOQPSO algorithm achieving the best results. The multi-phase particle swarm optimization algorithm IMOMPPSO also achieves better results than its original algorithm, whereas the BBMOPSO algorithm achieves the worst results.
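The paper does not define SC and ISC. Assuming SC refers to the widely used set coverage metric $C(A, B)$ (the fraction of solutions in front $B$ weakly dominated by at least one solution in front $A$), the sketch below shows how such a comparison between two algorithms' Pareto fronts could be computed and summarized over independent runs. ISC is not modelled, and the example fronts are invented.

```python
import statistics


def weakly_dominates(a, b):
    """Minimisation: a is no worse than b in every objective."""
    return all(x <= y for x, y in zip(a, b))


def set_coverage(A, B):
    """C(A, B) = |{b in B : some a in A weakly dominates b}| / |B|."""
    covered = sum(1 for b in B if any(weakly_dominates(a, b) for a in A))
    return covered / len(B)


def summarize(values):
    """Mean and sample standard deviation of a metric over independent runs."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical two-objective fronts produced by two algorithms in one run.
front_a = [(1.0, 5.0), (3.0, 2.0)]
front_b = [(2.0, 6.0), (4.0, 3.0), (0.0, 9.0)]
```

Here `set_coverage(front_a, front_b)` is 2/3 while `set_coverage(front_b, front_a)` is 0, so front A is better in this run. Because $C(A, B)$ and $C(B, A)$ are not symmetric, both directions are usually reported, with `summarize` applied to each over the 30 runs.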

VISUALIZATION OF DATA ASSETS
The data in this study are the power alarm data recorded by the platform, whose raw data are stored in an Oracle database. Alarm times are recorded to the second. The data attributes include the power alarm event ID (ID), power alarm device ID (DEV_ID), power alarm event name (EVT_NAME), power alarm time (ALM_TIME), power alarm event level (ALM_LEVEL), power alarm content (ALM_CONTENT), operation status (OP_STATE), operation time (OP_TIME), operator (OPERATOR) and the row ID of the record in the Oracle database (ROWID). Examples of raw power alarm data are shown in Fig. 11.

Figure 11 Example of partial raw alarm data
Inspection of the power alarm data shows that the raw data are incomplete and contain missing values. How missing values are handled plays a vital role in data asset management. If records with missing values are simply deleted, estimates of the probability of power risk accidents and of the severity of their adverse effects will be seriously distorted and will differ greatly from reality. This in turn undermines subsequent risk warning and equipment failure prevention, so that risk management as a whole cannot achieve its goal of avoiding or reducing risk. Once a risk develops into an accident, it can cause irreversible losses of human, material and financial resources. It is therefore essential to handle missing values in the raw data reasonably and to fill them in with appropriate, accurate methods.
First, the power data are preprocessed, and three attributes useful for studying equipment fault metrics are selected and separated: the power alarm event name (EVT_NAME), the power pipeline name (LINE_NAME) and the power alarm event level (ALM_LEVEL). The power line name is a column separated out of the power alarm content (ALM_CONTENT). Power alarm events are divided into levels 1, 2 and 3. There are ten power pipeline names: entrance and exit (CR), thermal power in the park (RL), gas in the park (RQ), water, information and electricity in the park (S), equipment room in the park (SB), natural gas on Baikang Road (T), comprehensive Baikang Road (Z), electric power on Baikang Road (D), equipment room on Yankang Road (YSB) and comprehensive Yankang Road (YZ). By location, the power park is divided into four parts: entrance and exit, park, Baikang Road and Yankang Road. The power alarm event names are O2_OVERLIMIT, CH4_OVERLIMIT, PRSN_INTRUSION, HUM_OVERLIMIT and TIMEOUT.
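The attribute selection and line-name separation described above can be sketched as follows. The paper does not give the layout of ALM_CONTENT or its imputation method, so two assumptions are made and labelled in the code: ALM_CONTENT is assumed to start with a line code followed by a hyphen and a cabin number (for example `YSB-01 ...`), and a missing ALM_LEVEL is filled with the most frequent level observed for the same EVT_NAME (simple mode imputation). Both are illustrative choices, not the platform's actual format or the paper's filling method.

```python
import re
from collections import Counter

# The ten pipeline codes listed in the text.
LINE_CODES = {"CR", "RL", "RQ", "S", "SB", "T", "Z", "D", "YSB", "YZ"}
# Hypothetical ALM_CONTENT layout: "<LINE_CODE>-<cabin number> <free text>".
LINE_PATTERN = re.compile(r"^([A-Z]+)-\d+")


def extract_line_name(content):
    """Separate LINE_NAME from ALM_CONTENT; None if no known code is found."""
    m = LINE_PATTERN.match(content or "")
    if m and m.group(1) in LINE_CODES:
        return m.group(1)
    return None


def preprocess(records):
    """Keep EVT_NAME, LINE_NAME and ALM_LEVEL; fill missing levels per event (assumed rule)."""
    rows = [{"EVT_NAME": r.get("EVT_NAME"),
             "LINE_NAME": extract_line_name(r.get("ALM_CONTENT")),
             "ALM_LEVEL": r.get("ALM_LEVEL")} for r in records]
    level_counts = {}
    for row in rows:
        if row["ALM_LEVEL"] is not None:
            level_counts.setdefault(row["EVT_NAME"], Counter())[row["ALM_LEVEL"]] += 1
    for row in rows:
        if row["ALM_LEVEL"] is None and row["EVT_NAME"] in level_counts:
            # Mode imputation: most frequent level observed for this event name.
            row["ALM_LEVEL"] = level_counts[row["EVT_NAME"]].most_common(1)[0][0]
    return rows
```

Rows whose event name has no observed level keep `None`, so they remain visible for a more careful method rather than being silently deleted, in line with the concern raised above about dropping incomplete records.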
An example of partially preprocessed data is shown in Fig. 12.
Figure 12 Example of partially pre-processed data
A typical power failure analysis method consists of four steps: hazard identification, frequency analysis, consequence analysis and risk quantification. After collecting the information needed to determine the failure modes, risk factors and causal relationships of power failures, this study obtains the probabilities of the main events and uses dynamic variable-weight fuzzy Petri nets to carry out risk reasoning. The possible outcomes of each event are initialized, the relevant consequence probabilities are calculated with the expected arrival time method, the risk loss is quantified with the data classification-feature extraction-neural network algorithm, and on this basis risk assessment is conducted, risk decisions are made and control measures are formulated (Fig. 13).
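A common firing rule for weighted fuzzy Petri nets multiplies the weighted sum of the input places' truth degrees by the transition's certainty factor, provided that sum reaches the transition's threshold. The sketch below applies this rule to a hypothetical two-step chain (gas leak and ventilation fault leading to gas accumulation, then explosion risk). It uses fixed weights; the dynamic variable-weight scheme used in the paper is not reproduced, and all place names and numbers are invented.

```python
def fire_transition(marking, inputs, cf, threshold):
    """inputs: list of (place, weight). Returns the output truth degree, or None if not fired."""
    strength = sum(w * marking.get(p, 0.0) for p, w in inputs)
    if strength < threshold:
        return None
    return strength * cf


def reason(marking, transitions):
    """transitions: list of (inputs, output_place, cf, threshold), assumed topologically ordered."""
    marking = dict(marking)
    for inputs, out, cf, threshold in transitions:
        value = fire_transition(marking, inputs, cf, threshold)
        if value is not None:
            # Keep the strongest evidence if several transitions feed the same place.
            marking[out] = max(marking.get(out, 0.0), value)
    return marking


# Hypothetical initial truth degrees and rules.
initial = {"gas_leak": 0.8, "ventilation_fault": 0.6}
rules = [
    ([("gas_leak", 0.6), ("ventilation_fault", 0.4)], "gas_accumulation", 0.9, 0.5),
    ([("gas_accumulation", 1.0)], "explosion_risk", 0.8, 0.4),
]
```

`reason(initial, rules)` gives gas accumulation a truth degree of (0.6·0.8 + 0.4·0.6)·0.9 = 0.648 and explosion risk 0.648·0.8 = 0.5184. In a variable-weight scheme, the weights in `rules` would instead be recomputed from the current input values before each firing.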
Based on the OnceDI system, a publish/subscribe system, OnceDI/PS, was developed to support data distribution from multiple power data sources. The system consists of two parts, the proxy server and the integrated agent; its architecture is shown in the figure. The proxy server uses a hierarchical, configurable and scalable architecture. To meet the scalability requirements of the power publish/subscribe system, the main functions of the proxy server are organized into six layers, from top to bottom: the publish/subscribe interface layer, the data model and semantic conversion layer, the matching layer, the routing and scheduling layer, the overlay network layer and the transport layer. Each layer builds on the functionality provided by the layer below and offers a clearly defined interface to the layer above. The layers are independent of each other, so the implementation of any layer can easily be replaced by a different one.
The publish/subscribe interface layer provides the Pub/Sub middleware APIs for publishing metadata, events, subscriptions and notifications. The data model and semantic conversion layer maintains the metadata model, semantic event model and semantic subscription model, performs semantic conversion of data, manages the system's metadata and subscriptions, and maintains the system's common vocabulary and mapping function library. The matching layer performs event matching with a matching engine whose efficient matching algorithm quickly finds all subscriptions satisfied by an event. The routing and scheduling layer forwards metadata, subscriptions, published data and so on. The overlay network layer is a virtual communication structure that sits logically above the underlying network layer; its main tasks are to maintain the topology of the proxy server network, process status updates of the overlay network, and handle dynamic changes such as overlay nodes joining and leaving. The system implements both acyclic-graph and mesh overlay topologies. The transport layer is a unicast communication service that abstracts the underlying network and provides reliable message delivery between proxy servers. In addition, configuration management, an auxiliary facility of the system, configures the server, including the transmission protocol, overlay network structure, routing strategy, scheduling algorithm and event matching algorithm, according to application and system requirements.
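The job of the matching layer, finding every subscription an event satisfies, can be illustrated with a brute-force content-based matcher. This is a hypothetical sketch, not the OnceDI/PS API: a subscription is modelled as a list of `(attribute, operator, value)` constraints that must all hold, and the subscription IDs and alarm attributes are invented. Production matching engines index subscriptions so they do not have to test each one per event.

```python
import operator

OPS = {"==": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


def matches(event, subscription):
    """True if the event satisfies every (attribute, op, value) constraint."""
    for attr, op, value in subscription:
        if attr not in event or not OPS[op](event[attr], value):
            return False
    return True


def match_event(event, subscriptions):
    """Return the ids of all subscriptions satisfied by the event (brute force)."""
    return [sid for sid, sub in subscriptions.items() if matches(event, sub)]


# Hypothetical subscriptions on alarm attributes.
subscriptions = {
    "ops_center": [("EVT_NAME", "==", "CH4_OVERLIMIT")],
    "yankang_team": [("LINE_NAME", "==", "YZ"), ("ALM_LEVEL", "<=", 2)],
    "level1_only": [("ALM_LEVEL", "==", 1)],
}
```

For the event `{"EVT_NAME": "CH4_OVERLIMIT", "LINE_NAME": "YZ", "ALM_LEVEL": 2}`, `match_event` returns `["ops_center", "yankang_team"]`; the routing and scheduling layer would then forward the notification to those subscribers.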
The proxy server subsystem of OnceDI/PS adopts a layered, modular design. Standardized interfaces between layers confine the impact of a code change to the layer being changed, so that layer can be modified without changing the others, and each layer's implementation can easily be replaced by a semantically equivalent one. This improves the locality, portability and replaceability of the system.
The BIM-based visualization system for power equipment operation and maintenance management is mainly used to display power equipment information visually. As a visual management tool, it supports querying and compiling statistics on all kinds of power equipment data. All units involved in operation and maintenance management can access and query relevant information in real time through the platform, and authorized users can also add, delete and update information in the system. First, BIM software is used to build 3D models of the power equipment, and the basic, maintenance, cost and contract information of each equipment item or pipeline is added to the attributes of the corresponding model element. At the same time, the data from each sensor of the monitoring system, including fire, environmental, safety and equipment information, is entered at the corresponding position, and when a position is in danger, the video feed for that position is retrieved automatically. This information is integrated through BIM to form an information-integrated BIM system. Through the visualization platform, a manager can accurately locate any structural part or equipment item/pipeline and view it from any angle (360 degrees). For equipment and pipelines in the power system, besides their appearance and spatial location, the platform displays the relevant parts in different colors or patterns to indicate their current states (operation and maintenance state, cost control state, contract dispute state, etc.). This intuitive display assists managers in management and decision-making.
The application architecture of the BIM-based power equipment operation and maintenance management visualization system is shown in Fig. 14.

CONCLUSIONS
Equipment health management, a crucial issue in electric power systems, is studied in this paper from a big-data perspective. To achieve optimal control with the lowest EFR and the lowest energy consumption, an EFR-based intelligent control system for the power space environment is designed from the perspectives of multi-source data fusion, EFR feature extraction and multi-objective optimal control. The results help realize multilevel centralized management, continuous monitoring, intelligent warning and collaborative emergency response in urban power spaces. Although this paper integrates multi-source data from spatial management departments, the data dimensionality is still limited: the main variables used in the analysis are only six spatial environmental indicators. If operational and environmental data of the equipment itself can be obtained, more effective and accurate EFR characteristics can be extracted. This paper relies mainly on data from a single power space, and because of the short operating time and the small amount of equipment failure information, the accuracy of failure rate feature extraction cannot be guaranteed; the research system will be improved as more data accumulate over time.