Link Prediction based on Deep Latent Feature Model by Fusion of Network Hierarchy Information

Link prediction aims to infer latent edges from the observed network structure and has become one of the hot topics in complex network research. The latent feature model used in link prediction directly projects the original network into a latent space. However, the traditional latent feature model cannot fully characterize the deep structural information of complex networks, so its prediction ability on sparse networks is limited. To address these problems, we propose a novel link prediction model based on a deep latent feature model realized by Deep Non-negative Matrix Factorization (DNMF). DNMF obtains more comprehensive network structure information through multi-layer factorization. Experiments on ten typical real networks show that the proposed method outperforms state-of-the-art link prediction methods.


INTRODUCTION
LINK prediction for complex networks has been a research hotspot in recent years, as it helps us explore and understand the evolution mechanisms of complex networks. The objective of link prediction is to predict unobserved links from the existing part of a network, or to forecast future links from its current structure.
At present, existing link prediction methods for complex networks can be divided into two categories: similarity-based methods and probabilistic methods. Similarity-based methods assume that a link between two more similar nodes has a higher probability of existing; examples include Common Neighbors (CN) [1], Adamic-Adar (AA) [2] and Cannistraci-Resource-Allocation (CRA) [3][4]. They rely on the network topology and suffer from limited prediction ability. Probabilistic methods [5] assume that the network has a known structure and calculate the connection probability of edges between unobserved node pairs through model building and parameter estimation; examples include the Probabilistic Relationship Model (PRM) [6], the Hierarchical Structure Model (HSM) [7] and the Stochastic Block Model (SBM) [8]. Although probabilistic methods have many advantages in network analysis, they are time-consuming [9].
Besides the above link prediction methods, some novel methods have been proposed. Based on the consistency of the structural features of a network, the Structural Perturbation Method (SPM), which uses perturbed eigenvectors, was proposed and applied to the link prediction problem [10]. The Low-Rank (LR) method, based on robust principal component analysis and the sparsity of the network adjacency matrix, was proposed to predict missing edges [11]. Kernel-based non-negative matrix factorization methods, including the Linear Kernel (LK) and Covariance Kernel (CK), have been used for network reconstruction and link prediction [12].
The relationships between nodes in complex networks depend not only on topological information but also on latent properties and features of the nodes that cannot be observed directly. Therefore, the latent feature model is widely used to predict potential connections in network analysis and link prediction [13][14][15][16]. The latent feature model expresses network nodes by directly projecting them into a latent space; its key idea is to map the features of the original problem into a latent feature space of lower dimension.
If we constrain the elements of the two factor matrices to be non-negative, the corresponding solution can be obtained by Non-negative Matrix Factorization (NMF) [17][18][19][20]. The basic idea of NMF is to decompose a non-negative matrix into two low-rank non-negative matrices. Matrix factorization not only extracts latent features but is also itself a dimensionality reduction method [14,21]. For example, Shin et al. [22] proposed a multi-scale link prediction method based on clustering with low-rank approximation. The latent feature model based on NMF is widely used in link prediction. The results show that it can discover the potential structure of relations between entities, has strong explanatory power for network information, can automatically learn latent features, and has good adaptability and extensibility. Although different NMF-based methods perform well on some networks, they still cannot fully characterize the deep structural information of complex networks.
Most large real-world networks are very sparse: the average degree is much smaller than the number of nodes, and the number of observed edges is much smaller than the maximum possible number of edges. Due to the limited availability of information and the sparsity of networks, it is very difficult for traditional link prediction methods to achieve good performance. Therefore, this paper proposes a novel link prediction model based on Deep Non-negative Matrix Factorization (DNMF) that fuses network hierarchy information. First, a hierarchical network structure learning model is formed by factorizing the coefficients matrix repeatedly. Then, an unsupervised learning strategy that has been used successfully in autoencoder networks [23] is adopted for training: the multi-layer factorization serves as the pre-decomposition result, after which the basis matrices and the coefficients matrix are adjusted in a fine-tuning step. Finally, the similarity matrix is calculated from the fine-tuned basis matrices and coefficients matrix. This model preserves the hierarchical structure information of real networks, captures richer and more comprehensive latent feature information, and improves the accuracy of link prediction.
In summary, the contributions of this paper are: 1) On the basis of non-negative matrix factorization, multi-layer factorization is applied to the latent feature model, so that the hierarchical structure information of a network can be learned. 2) Borrowing the unsupervised learning strategy of autoencoder networks, we adopt a two-stage procedure of pre-training and fine-tuning for link prediction. 3) The similarity matrix is obtained from a group of basis matrices and the coefficients matrix.
The remainder of this paper is organized as follows. First, we state the problem and present the proposed methodology in Section II. We then give the evaluation metrics, experimental data and experimental results in Section III. Finally, we conclude the paper in Section IV.

PROBLEM STATEMENT AND PROPOSED METHODOLOGY

Problem Definition
As we know, a network consists of nodes and edges. Given an undirected network G = (V, E), where V and E are the sets of nodes and edges respectively, let N = |V| and M = |E| denote the numbers of nodes and edges. The network can be expressed by an N × N adjacency matrix A, where Aij = Aji = 1 if there is a connection between node i and node j, and Aij = Aji = 0 otherwise.
To verify the performance of the proposed method for link prediction, the observed links are randomly divided into a training set Etrain and a testing set Etest, where Etrain ∪ Etest = E and Etrain ∩ Etest = Ø. The training set Etrain is used to build the prediction model, while the testing set Etest is used only to verify the accuracy of link prediction. Atrain and Atest denote the adjacency matrices of the training set and the testing set respectively; all their elements are 1 or 0, and Atrain + Atest = A. Let L = |Etest| be the number of edges in the testing set, so the number of training edges is |Etrain| = M − L. All possible edges of the network except those in the training set are regarded as the candidate set, whose size is N(N − 1)/2 − |Etrain|. The prediction model is then learned from the training set Etrain, the probability value of each candidate edge is calculated, and the results are verified on the testing set Etest according to different evaluation metrics.
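As a small illustrative sketch (not the paper's own code), the random division of the observed links into Etrain and Etest can be implemented as follows; the function name and the use of NumPy are our assumptions:

```python
import numpy as np

def split_edges(A, f=0.9, seed=0):
    """Randomly divide the observed links E into Etrain (fraction f) and
    Etest, returning adjacency matrices with A_train + A_test = A."""
    rng = np.random.default_rng(seed)
    i, j = np.triu_indices(A.shape[0], k=1)
    edges = [(int(a), int(b)) for a, b in zip(i, j) if A[a, b] == 1]
    order = rng.permutation(len(edges))
    cut = int(round(f * len(edges)))          # |Etrain| = M - L
    A_train = np.zeros_like(A)
    A_test = np.zeros_like(A)
    for t, idx in enumerate(order):
        a, b = edges[idx]
        target = A_train if t < cut else A_test
        target[a, b] = target[b, a] = 1       # keep the matrix symmetric
    return A_train, A_test
```

The two returned matrices partition the observed edges, so summing them recovers A exactly.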

Link Prediction Based on Deep Latent Feature Model

Non-Negative Matrix Factorization
Non-negative Matrix Factorization (NMF) is a matrix factorization algorithm that makes the latent structure of data more explicit while reducing its dimensionality [24].
Given the adjacency matrix A ∊ R N×N of a network, it can be well approximated by two non-negative matrices W ∊ R N×k and H ∊ R k×N such that

A ≈ WH. (1)

To quantify the quality of the approximation, the cost function based on the square of the Euclidean distance can be written as

O = ‖A − WH‖²F, (2)

where W and H are the basis matrix and the coefficients matrix respectively. According to the iterative update algorithm of [24], the multiplicative rules minimizing the objective function O in Eq. (2) are

W ← W ⊙ (AHᵀ) / (WHHᵀ), (3)
H ← H ⊙ (WᵀA) / (WᵀWH), (4)

where ⊙ and the division are applied element-wise.
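As a sketch (not the paper's code), the classical multiplicative updates minimizing the cost of Eq. (2) take only a few lines of NumPy; the small constant eps, which guards the division against zero denominators, is our addition:

```python
import numpy as np

def nmf(A, k, n_iter=200, eps=1e-10, seed=0):
    """Factorize a non-negative matrix A ~ W @ H with Lee-Seung
    multiplicative updates for the squared Euclidean cost."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    H = rng.random((k, A.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update coefficients matrix
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # update basis matrix
    return W, H
```

Both factors stay non-negative because the updates only rescale non-negative entries by non-negative ratios.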

Deep Non-Negative Matrix Factorization
Based on non-negative matrix factorization, this paper proposes an algorithm named Deep Non-negative Matrix Factorization. By factorizing the coefficients matrix multiple times, the multi-layer structure information of the network is fused; the factorization scheme is shown in Fig. 1.
Deep NMF forms a multi-level network structure learning model through the repeated factorization of the coefficients matrix H. The factorization steps are as follows:
Step 1: Factorize the network adjacency matrix A ≈ W1H1, where W1 ∊ R N×k1 and H1 ∊ R k1×N; R denotes the real number field.
Step 2: Following Step 1, the coefficients matrix H1 can be factorized as H1 ≈ W2H2, where W2 ∊ R k1×k2 and H2 ∊ R k2×N.
Step 3: By analogy, after m factorizations the network adjacency matrix satisfies A ≈ W1W2W3 ⋯ WmHm, where W1, W2, …, Wm and Hm are non-negative, Wm ∊ R km−1×km and Hm ∊ R km×N.
After m factorizations of the coefficients matrix H, A is expressed as the product of m + 1 factors: m basis matrices and one coefficients matrix. Each additional basis matrix is equivalent to adding an extra layer of abstraction that automatically learns the network hierarchy information and explores the latent features more accurately and comprehensively. The loss function of Deep NMF can be expressed as

CDeep_NMF = ½ ‖A − W1W2W3 ⋯ WmHm‖²F, (5)

where Wi ≥ 0 (i = 1, …, m) and Hm ≥ 0. Expanding Eq. (5) with the trace operator gives

CDeep_NMF = ½ Tr(AᵀA − 2AᵀW1W2 ⋯ WmHm + HmᵀWmᵀ ⋯ W2ᵀW1ᵀW1W2 ⋯ WmHm). (6)

The objective function based on non-negative matrix factorization is a non-convex optimization problem, and its prediction results depend on the initial values of the basis matrices W and the coefficients matrix H. Traditional non-negative matrix factorization methods tend to initialize W and H randomly, which makes it easy to fall into a local optimum and may also cause under-fitting. To improve the generalization ability of the proposed method, we draw on the unsupervised learning strategy of the autoencoder network [25] and adopt a two-stage procedure of pre-training and fine-tuning for link prediction.
(1) Pre-training stage
Step 1: Decompose the network adjacency matrix A ≈ W1H1, where W1 ∊ R N×k1 and H1 ∊ R k1×N.
Step 2: Following Step 1, decompose the coefficients matrix H1 ≈ W2H2, where W2 ∊ R k1×k2 and H2 ∊ R k2×N.
Step 3: Continue in this way until all layers have been pre-trained, giving A ≈ W1W2W3 ⋯ WmHm, where W1, W2, …, Wm and Hm are non-negative.
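The pre-training stage can be sketched as a chain of single-layer factorizations, each applied to the coefficients matrix produced by the previous layer. This is an illustrative implementation under our own naming, not the authors' code:

```python
import numpy as np

def nmf_layer(X, k, n_iter=200, eps=1e-10, rng=None):
    """Single-layer NMF, X ~ W @ H, via multiplicative updates."""
    rng = np.random.default_rng(0) if rng is None else rng
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def pretrain(A, layer_dims, seed=0):
    """Pre-training: A ~ W1 H1, then H1 ~ W2 H2, ... for dims [k1, ..., km].
    Returns the basis matrices [W1, ..., Wm] and the last coefficients Hm."""
    rng = np.random.default_rng(seed)
    Ws, X = [], A
    for k in layer_dims:
        W, H = nmf_layer(X, k, rng=rng)
        Ws.append(W)
        X = H                  # the coefficients matrix is factorized next
    return Ws, X
```

With layer_dims = [k1, k2], this yields W1 ∊ R^(N×k1), W2 ∊ R^(k1×k2) and H2 ∊ R^(k2×N), matching the shapes in Steps 1-3 above.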
(2) Fine-tuning stage
Let Ψ = W1W2 ⋯ Wm−1 denote the product of the first m − 1 basis matrices. From Eq. (6), the partial derivatives of CDeep_NMF with respect to Wm and Hm are

∂CDeep_NMF/∂Wm = Ψᵀ(ΨWmHm − A)Hmᵀ, (7)
∂CDeep_NMF/∂Hm = (ΨWm)ᵀ(ΨWmHm − A). (8)

According to the literature [18], Eq. (7) and Eq. (8) lead to the following multiplicative updating rules for Wm and Hm:

Wm ← Wm ⊙ (ΨᵀAHmᵀ) / (ΨᵀΨWmHmHmᵀ), (9)
Hm ← Hm ⊙ ((ΨWm)ᵀA) / ((ΨWm)ᵀΨWmHm), (10)

where ⊙ and the division are applied element-wise. During fine-tuning the analogous rule is applied to every layer in turn, so that all basis matrices and the coefficients matrix are adjusted jointly.
(3) Predicting links using Deep Non-negative Matrix Factorization
Given a network, the proposed algorithm performs link prediction in three steps. First, the number of latent features of the original network adjacency matrix A is obtained by the Colibri method. Second, DNMF is used to find the non-negative factor matrices. Third, the network is reconstructed from these factors to make the final prediction (Algorithm 1).
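A sketch of the fine-tuning stage, under the assumption that the same multiplicative rule is applied to every layer in turn (Psi and Omega denote the matrix products to the left and right of the factor being updated; the helper names are ours):

```python
import numpy as np

def _prod(mats, n):
    """Product of a list of matrices; identity of size n for an empty list."""
    out = np.eye(n)
    for M in mats:
        out = out @ M
    return out

def fine_tune(A, Ws, Hm, n_iter=100, eps=1e-10):
    """Fine-tuning: adjust every basis matrix and the coefficients matrix
    with multiplicative updates for the deep factorization loss."""
    for _ in range(n_iter):
        for i in range(len(Ws)):
            Psi = _prod(Ws[:i], A.shape[0])                   # W1 ... W(i-1)
            Omega = _prod(Ws[i + 1:], Ws[i].shape[1]) @ Hm    # W(i+1) ... Wm Hm
            Ws[i] *= (Psi.T @ A @ Omega.T) / \
                     (Psi.T @ Psi @ Ws[i] @ Omega @ Omega.T + eps)
        Phi = _prod(Ws, A.shape[0])                           # W1 ... Wm
        Hm *= (Phi.T @ A) / (Phi.T @ Phi @ Hm + eps)
    return Ws, Hm
```

Each update rescales a factor by a non-negative ratio, so non-negativity is preserved while the reconstruction error is driven down.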
Algorithm 1: The framework of the proposed algorithm with network hierarchy information
Input: the network adjacency matrix A, the proportion of the training set f and the layer number m.
Output: the similarity matrix of the network A*.
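The final prediction step can be sketched as follows: the similarity matrix A* is the reconstruction W1 ⋯ WmHm, and the candidate (non-training) node pairs are ranked by their scores. The symmetrization step is our assumption for undirected networks:

```python
import numpy as np

def score_links(A_train, Ws, Hm):
    """Reconstruct the similarity matrix A* = W1 ... Wm Hm and rank the
    candidate node pairs (those not in the training set) by score."""
    S = Ws[0]
    for W in Ws[1:]:
        S = S @ W
    S = S @ Hm
    S = (S + S.T) / 2                      # symmetrize for an undirected net
    iu, ju = np.triu_indices(A_train.shape[0], k=1)
    cand = A_train[iu, ju] == 0            # the candidate set
    ranked = sorted(zip(iu[cand], ju[cand], S[iu, ju][cand]),
                    key=lambda t: -t[2])   # highest score first
    return S, ranked
```

The top of the ranking is then compared against the testing set under the evaluation metrics of the next section.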

EXPERIMENT AND COMPARISON

Evaluation Metrics
In this work, three evaluation metrics are used to compare the performance of the proposed method with that of the baseline methods: AUC, Precision and Prediction-Power (PP). They are defined as follows.
(1) AUC [26]: AUC is a general evaluation metric, the area under the curve of the receiver operating characteristic (ROC) analysis. Taking the top L links as predicted links, a ROC curve is obtained by plotting the true positive rate against the false positive rate for varying L. AUC can thus be interpreted as the probability that a randomly chosen missing link receives a higher score than a randomly chosen non-existent link in the ranking of all non-observed links. In the algorithmic implementation, if among n independent comparisons there are n' times in which the score of the missing link is higher than that of the non-existent link and n'' times in which the two scores are equal, then AUC is defined as

AUC = (n' + 0.5n'') / n. (15)

If all scores are generated from an independent and identical distribution, AUC is approximately 0.5.
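The n-comparison AUC estimate described above can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def auc_score(missing_scores, nonexistent_scores, n=10000, seed=0):
    """AUC via n independent comparisons between a random missing link and
    a random non-existent link: a win counts 1, a tie counts 0.5."""
    rng = np.random.default_rng(seed)
    s1 = rng.choice(np.asarray(missing_scores), size=n)
    s2 = rng.choice(np.asarray(nonexistent_scores), size=n)
    return (np.sum(s1 > s2) + 0.5 * np.sum(s1 == s2)) / n
```

When the two score populations are identically distributed, the estimate fluctuates around 0.5, as the text notes.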
(2) Precision [27]: Given the ranking of the non-observed links, precision is defined as the ratio of relevant items selected to the total number of items selected:

Precision = Lr / L, (16)

where L is the number of predicted links and Lr is the number of correctly predicted links. Clearly, higher precision means higher accuracy.
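A minimal sketch of the precision computation over the top-L ranked pairs (names are ours):

```python
def precision_at_L(ranked_pairs, test_edges, L):
    """Precision = Lr / L: the fraction of the top-L predicted links
    that actually belong to the testing set."""
    Lr = sum(1 for pair in ranked_pairs[:L] if pair in test_edges)
    return Lr / L
```

Passing the ranking produced by the prediction step and the held-out edge set gives the values reported in the tables.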
(3) Prediction-Power (PP) [3]: To characterize the difference between a prediction algorithm and random prediction, literature [3] proposed the Predictive Power measure, which evaluates the overall predictive effect of link prediction methods; a higher value denotes a better prediction effect. PP is defined as

PP = 10 × log10(Precision / PrecisionRandom), (17)

where PrecisionRandom is the precision of random prediction, i.e. of ranking the candidate edges randomly. The average random prediction precision is approximately L / (N(N − 1)/2 − (M − L)), where N and M are the numbers of nodes and edges of the network.
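A sketch of Eq. (17), assuming the random baseline is the base rate of missing links in the candidate set, L / (N(N − 1)/2 − (M − L)):

```python
import math

def prediction_power(precision, N, M, L):
    """PP = 10 * log10(Precision / Precision_random), where the random
    baseline (assumed here) is the fraction of candidate pairs that are
    actually missing links."""
    p_random = L / (N * (N - 1) / 2 - (M - L))
    return 10 * math.log10(precision / p_random)
```

A method no better than random scores PP = 0, and each factor of 10 in precision over the baseline adds 10 to PP.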

Baseline Methods for Comparison
To verify the performance of the proposed method DNMF, we compare it with fourteen state-of-the-art link prediction methods.

Experiment Data
To verify the performance of the proposed method, we consider the following 10 real-world networks: Jazz, a network of jazz bands [33]; NS, a network of co-authorship between scientists working on network theory [34]; PB, a network of hyper-links between political weblogs [35]; Power, the topology of the US power grid [36]; Router, a network of Internet routers [37]; SmaGri, a citation network on network theory and experiment [38]; USAir, a network of US airlines [38]; Yeast, a protein-protein interaction network of yeast [39]; Karate, a social network of the members of a karate club [40]; and School, a friendship network in a high school [41]. Tab. 2 provides the topological features of the ten real-world networks, where V and E are the sets of nodes and edges respectively; LD and ⟨K⟩ are the link density and average degree; APL and C are the average shortest path length and the average closeness over all node pairs; CC and r are the clustering coefficient and the degree-degree correlation coefficient respectively; LCPcorr is the correlation coefficient between LCP (Local Community Paradigm) and CN [3]; and H denotes the degree heterogeneity.

Experimental Results
To test the performance of the proposed method, we compare it with fourteen well-known methods on 10 real networks. The observed links are randomly divided into a training set and a test set; the training set is used to build the prediction model, while the test set is used only to verify the accuracy of link prediction. Tabs. 3 to 5 show the performance on the ten real-world networks based on AUC, Precision and PP respectively. The largest value in each column is shown in bold face.
We compare our method (DNMF) with the other methods on the 10 network data sets, and the AUC values are averaged over 100 runs. In our experiments we set α = 0.0001 for LP, α = 0.01 for Katz, m = 2 for DNMF, η = 0.1 for SPM and γ = 0.15 for LR. For each data set, the observed links are randomly divided into a training set (90%) and a test set (10%).
As shown in Tab. 3, DNMF outperforms traditional NMF. Furthermore, DNMF achieves the best AUC values on several real networks, including PB, SmaGri, Yeast and School; on the other networks its AUC values are very close to the highest ones.
As shown in Tab. 4, DNMF achieves better precision values than traditional NMF overall. DNMF has the best precision on several networks, including PB, Power, Router, USAir and Yeast, and the second best on others such as Jazz and Karate. Under the precision metric, the traditional methods do not perform well on sparse networks such as Router, PB and Yeast, while DNMF performs much better. This indicates that DNMF is superior to traditional NMF and the other classical methods, especially on sparse networks such as Router, PB, Yeast and Power.
Tab. 5 compares the prediction accuracy measured by PP on the ten real-world networks. The mean PP of each method across all networks, shown in the last column, indicates average performance; methods are listed in increasing order of mean PP. As seen from Tab. 5, DNMF has the best overall performance and SPM the second best. DNMF is also better than CK and LK overall, which indicates that it can extract a more useful and richer organization of the features hidden in the original network.
To test our proposed method more thoroughly, we analyze the experimental results on six networks with the fraction of the training set varying from 0.3 to 0.9. Figs. 2 to 4 show the results on the six networks based on AUC, Precision and PP respectively; the results are averaged over 100 runs. The six networks are Yeast, Jazz, PB, SmaGri, USAir and School. The red line with asterisks represents the performance of the proposed DNMF.
In Fig. 2, the AUC of DNMF is consistently higher than that of the other methods on the Yeast, School and SmaGri networks, indicating that our method performs stably and works better even when the training set is very small.
In Fig. 3, on the Yeast, PB and USAir networks, the precision of DNMF is higher than that of the other methods when the ratio of the training set increases from 0.7 to 0.9, showing that DNMF obtains a more obvious improvement than the other methods. Across the three evaluation metrics, the proposed method is either the best or very close to the best even as the size of the training set varies. Overall, DNMF is superior to the traditional latent feature model based on non-negative matrix factorization. This suggests that our method not only inherits the advantages of traditional NMF, but also takes full advantage of the hierarchical latent structure information of networks through multi-layer learning. In general, our proposed method is clearly competitive with the baseline methods on the ten networks.

Parameter Analysis
To analyze the effect of the layer number parameter m on the proposed algorithm DNMF, we show the precision of DNMF as m varies from 1 to 4 on six networks: Yeast, Jazz, PB, SmaGri, USAir and School. As depicted in Fig. 5, we vary the fraction of the training set from 0.3 to 0.9 and take the widely used Precision metric as evidence. The performance is clearly best when m = 2, so we set m = 2 in most experiments.

CONCLUSIONS
Most real networks are sparse, and the traditional single-layer latent feature model cannot fully characterize the structural organization of complex networks. To resolve this problem, a novel algorithm called Deep Non-negative Matrix Factorization (DNMF) is proposed for link prediction on the basis of non-negative matrix factorization and the hierarchy information of latent features. Three evaluation metrics, AUC, Precision and Predictive Power (PP), are used to verify the performance of the proposed method. The experimental results on 10 real networks show that DNMF is feasible, effective and competitive.
As an extension of non-negative matrix factorization, our proposed DNMF method for link prediction not only inherits the merits of the traditional latent feature model, but also reconstructs the network through multi-layer factorization and extracts more useful and richer feature information hidden in the original network. To reduce the training time and improve the generalization ability of the method, the unsupervised learning strategy of the deep autoencoder network is applied in DNMF, giving the two stages of pre-training and fine-tuning.
Some limitations of the proposed method remain for future study. How to set the layer number adaptively for different networks and how to optimize the time complexity of the algorithm are our next work; parallel computation can also be used to reduce the computation time.