LINK PREDICTION USING DISCRETE-TIME QUANTUM WALK

Original scientific paper
Link prediction is one of the key issues in complex networks and currently attracts much research interest. Many link prediction methods have been proposed so far. The classical random walk, as an effective tool, has been widely used to study link prediction problems. The quantum walk is the quantum analogue of the classical random walk. Numerous research results show that quantum algorithms using quantum walks outperform their classical counterparts in many applications, such as graph matching and searching. However, there have been few studies of link prediction based on quantum walks, especially discrete-time quantum walks. This paper proposes a new link prediction method based on the discrete-time quantum walk. Experimental results show that the prediction accuracy of our method is better than that of typical methods. The time complexity of our method running on classical computers is slightly higher than that of methods based on the classical random walk, but our method can be greatly accelerated by executing it on quantum computers.


Introduction
Link prediction is one of the fundamental issues in complex networks. The goal of link prediction is to predict the status of links for unobserved pairs of nodes based on a partially observed network, or to predict the network state at the next time step based on a sequence of fully observed networks at various time steps [1]. Link prediction is useful in a broad range of applications [2,3], such as recommender systems, protein-protein interaction networks, scientific collaboration networks, the detection of unknown links in terrorism networks, the prediction of friendship formation and the prediction of web hyperlinks. A large number of methods have been proposed to predict links in numerous fields [4]. A good summary is given by Lv and Zhou [1]. The simplest framework for link prediction is the similarity-based algorithm. It defines a measure of proximity or similarity between two nodes in the network, based on the observation that more similar nodes have a higher probability of being connected. Widely applied methods for measuring similarity between nodes are based on random walks on the network. In Average Commute Time (ACT) [5], the similarity between two nodes is defined as the reciprocal of the average number of steps required by a random walker starting from one node to reach the other node. Random Walk with Restart (RWR) [5] is a direct application of PageRank [6]. It considers a random walker that starts from one node, iteratively moves to a random neighbour with probability $c$ and returns to the starting node with probability $1 - c$. SimRank [7] measures how soon two random walkers, each starting from one of the two nodes, are expected to meet at a certain node. Although these prediction methods provide good accuracy, their time complexities are too high to be practicable for large-scale networks. Liu and Lv [8] put forward two novel indices, Local Random Walk (LRW) and Superposed Random Walk (SRW). It is shown that the prediction accuracy is improved and the computational complexity is reduced, especially for sparse networks with a lower clustering coefficient [8].
Quantum walk, which was first proposed by Aharonov et al. [9], is the quantum analogue of the classical random walk. In both, a particle (or walker) moves through a network as time elapses. In the classical random walk, the motion of the particle is diffusive. The quantum walk, by contrast, allows for quantum superposition of the particle states and exploits the interference of the terms in these superpositions, so the motion of the particle is more akin to wave propagation [10]. Such differences have motivated a search for quantum algorithms based on quantum walk. Numerous research results show that quantum algorithms outperform their classical counterparts in many applications such as search, element distinctness and combinatorial optimization. For example, Sánchez-Burillo [11] defines a quantum navigation method that provides a unique ranking of the elements of a network and resolves the degeneracy found in the PageRank ranking. However, few research works address link prediction based on quantum walk.
In this paper, two new node similarity indices based on quantum walk are defined. As with the classical random walk, there are two basic types of quantum walk: the continuous-time quantum walk (CTQW) [12] and the discrete-time quantum walk (DTQW) [9]. Here we consider a particular version of the discrete-time quantum walk known as the scattering quantum walk (SQW) [13]. Experimental results show that the prediction accuracy of our method is better than that of typical methods. The time complexity of our method running on classical computers is slightly higher than that of methods based on the classical random walk, but our method can be greatly accelerated by executing it on quantum computers.
SQW is analogous to the propagation of light through an interferometer. In this type of quantum walk, the particle resides on the directed edges and scatters at the nodes at each time step. Each edge $(u, v) \in E(G)$ has two orthogonal states: $|uv\rangle$, corresponding to the particle being on the edge $(u, v)$ and going from node $u$ to node $v$, and $|vu\rangle$, corresponding to the particle being on the edge $(v, u)$ and going from node $v$ to node $u$. A general state of the walk is
$$|\psi\rangle = \sum_{(u,v)} \alpha_{uv}\,|uv\rangle,$$
where $\alpha_{uv} \in \mathbb{C}$ is the quantum amplitude of the state $|uv\rangle$.
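The probability that the walk is in the state $|uv\rangle$ is given by $P(|uv\rangle) = \alpha_{uv}\alpha_{uv}^{*} = |\alpha_{uv}|^{2}$. Unlike the classical random walk, the evolution of SQW is governed by a unitary matrix,
$$|\psi_{n+1}\rangle = U\,|\psi_{n}\rangle,$$
where $U$ is the quantum amplitude matrix, which determines the one-step transition probabilities for transitions between different states. $U$ is a unitary matrix, i.e. $U^{\dagger}U = I$, so the sum of the squared moduli of the amplitudes for all the transitions from a particular state must be unity.
Suppose the walk is in the state $|\psi\rangle = |uv\rangle$. If the particle is transmitted, it will be in one of the adjacent states $|vx\rangle$ ($x \in \Gamma(v)$, $x \neq u$, where $\Gamma(v)$ is the set of neighbours of $v$), and if reflected, in the state $|vu\rangle$. Let the transmission amplitude be $t$ and the reflection amplitude be $r$. Then the transition is of the form
$$|uv\rangle \rightarrow r\,|vu\rangle + t \sum_{x \in \Gamma(v),\, x \neq u} |vx\rangle.$$
Here we use Grover diffusion matrices [14], which assign the quantum amplitudes $t = 2/k(v)$ and $r = 2/k(v) - 1$, where $k(v)$ is the degree of node $v$. With these amplitudes, $r^{2} + (k(v) - 1)\,t^{2} = 1$, so the normalization condition holds. Obviously, a particle can make a transition from one directed edge $|uv\rangle$ to another directed edge $|wx\rangle$ if and only if $v = w$ and $v \neq x$.
Given a quantum walker starting from node $u$, we assume an equal prior probability of finding the walker on any directed edge $|uv\rangle$ ($v \in \Gamma(u)$) at the start of the evolution (at $n = 0$). This leads to an initial state of the form
$$|\psi_{0}\rangle = \frac{1}{\sqrt{k(u)}} \sum_{v \in \Gamma(u)} |uv\rangle.$$
After $n$ steps, the state of the SQW is given by
$$|\psi_{n}\rangle = U^{n}\,|\psi_{0}\rangle.$$
Let $p_{ux}(n)$ denote the probability that a walker starting from node $u$ is found at node $x$ after $n$ steps. In general, $p_{ux}(n)$ does not converge as $n$ grows, because $U$, as a unitary transformation, prevents the walk from reaching a steady state.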
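The scattering step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `build_sqw` and `node_probabilities`, the adjacency-dictionary input, and in particular the rule used to turn edge amplitudes into a node probability (the state $|uv\rangle$ is counted at its tail node $u$) are assumptions made for the example. The graph is assumed undirected, without self-loops, and the start node is assumed to have at least one neighbour.

```python
import numpy as np

def build_sqw(adj):
    """Build the SQW one-step matrix for an undirected graph.

    adj maps each node to a list of its neighbours (symmetric, no self-loops).
    Returns (states, U): states is the list of directed edges (u, v) and
    U[j, i] is the amplitude of the transition from states[i] to states[j].
    """
    states = [(u, v) for u in adj for v in adj[u]]
    index = {s: i for i, s in enumerate(states)}
    U = np.zeros((len(states), len(states)))
    for (u, v), i in index.items():
        k_v = len(adj[v])
        for x in adj[v]:
            # Grover amplitudes: t = 2/k(v) for x != u, r = 2/k(v) - 1 for x == u.
            U[index[(v, x)], i] = 2.0 / k_v - (1.0 if x == u else 0.0)
    return states, U

def node_probabilities(adj, start, n_steps):
    """Evolve from the uniform superposition on the edges leaving `start`."""
    states, U = build_sqw(adj)
    psi = np.array([1.0 if u == start else 0.0 for u, _ in states])
    psi /= np.linalg.norm(psi)      # amplitude 1/sqrt(k(start)) on each |start v>
    for _ in range(n_steps):
        psi = U @ psi               # one scattering step
    probs = {node: 0.0 for node in adj}
    for (u, _), amp in zip(states, psi):
        probs[u] += abs(amp) ** 2   # ASSUMPTION: |uv> is counted at node u
    return probs

# Small example graph (hypothetical): a triangle 0-1-2 with a pendant node 3.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
states, U = build_sqw(adj)
print(len(states), np.allclose(U.T @ U, np.eye(len(states))))
print(node_probabilities(adj, start=0, n_steps=3))
```

The matrix has one row and column per directed edge, so its size is $2M \times 2M$ for a network with $M$ links; a sparse representation would be the natural choice for larger networks.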
We modify $p(n)$ to characterize more efficiently the similarity between nodes $u$ and $x$: the resulting LSQW index $s_{ux}(n)$ is built from $p_{ux}(n)$ and $p_{xu}(n)$ together with the weights $q_{u}$ and $q_{x}$. It is obvious that $s_{ux} = s_{xu}$.
One difficulty with the LSQW index, which it shares with all random-walk-based indices, is its sensitive dependence on parts of the network far from $u$ and $x$, even when $u$ and $x$ are connected by very short paths [1]. A way of counteracting this dependence is to continuously release walkers at the starting point independently; in this way, the target node and the nodes nearby receive more weight, which leads to higher prediction accuracy. Similarly to [8], the corresponding index, the superposed SQW (SSQW) index, is defined as
$$s^{\mathrm{SSQW}}_{ux}(n) = \sum_{\tau = 1}^{n} s^{\mathrm{LSQW}}_{ux}(\tau).$$
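As a small illustration of the superposition idea, the sketch below assumes, by analogy with the SRW index of [8], that the superposed index at step $n$ is the sum of the per-step local scores up to $n$. The function name `superpose` and the toy score array are hypothetical; the per-step LSQW score matrices are taken as given input.

```python
import numpy as np

def superpose(step_scores):
    """Return cumulative sums over the step axis.

    step_scores has shape (n_steps, N, N); entry [tau - 1] holds the local
    score matrix at step tau. out[n - 1] is the sum over tau = 1..n.
    """
    return np.cumsum(np.asarray(step_scores, dtype=float), axis=0)

# Toy per-step score matrices for a two-node example (made-up numbers).
steps = np.array([[[0.0, 1.0], [1.0, 0.0]],
                  [[0.0, 2.0], [2.0, 0.0]],
                  [[0.0, 4.0], [4.0, 0.0]]])
print(superpose(steps)[-1])   # superposed scores after three steps
```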

Experiment
In this section, we first describe our experiments on real data sets, and then present the performance of several baselines and our link prediction methods.

Data sets
We use six typical real-world complex networks collected from different fields: (i) Lesmis: the co-appearance network of characters in the novel Les Miserables; (ii) NS (netscience): a network of co-authorships between scientists who themselves publish on the topic of network science; (iii) C.elegans: the neural network of the nematode worm C.elegans, in which an edge joins two neurons if they are connected by either a synapse or a gap junction; (iv) jazz: a collaboration network of jazz musicians; (v) USAir: the network of the US air transportation system, in which the weight of a link is the frequency of flights between two airports; (vi) email: the e-mail network of University Rovira i Virgili (URV) in Tarragona, Spain. The detailed statistics of these data sets are listed in Tab. 1, where $N$ and $M$ are the total numbers of nodes and links, respectively; $\langle k \rangle$ is the average degree of the network; $\langle d \rangle$ is the average shortest distance between node pairs; and $C$ is the clustering coefficient.

Evaluation metrics
To quantify the prediction accuracy of each method, we adopt the approach used in [15]. In a network $G(V, E)$, the observed links $E$ are divided into a training set $E^{\mathrm{train}}$ and a test set $E^{\mathrm{test}}$, and we use the area under the receiver operating characteristic curve (AUC) [15] to quantify the accuracy of the prediction measures. The AUC is the probability that a randomly chosen missing link (a link in $E^{\mathrm{test}}$) is given a higher score than a randomly chosen nonexistent link (a link not in $E$). If, among $k$ independent comparisons, the missing link has the higher score $k'$ times and the two links have the same score $k''$ times, then
$$\mathrm{AUC} = \frac{k' + 0.5\,k''}{k}.$$
When all the scores are generated from an independent and identical distribution, the AUC should be about 0.5. The degree to which the AUC exceeds 0.5 shows how much better the algorithm performs than pure chance.
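The evaluation procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `split_links` and `auc_score`, the default of 10000 comparisons, the fixed random seeds and the uniform random split are choices made for the example, not details taken from the experiments.

```python
import random

def split_links(edges, p, seed=0):
    """Split the observed links into a training set (fraction p) and a test set."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    cut = int(round(p * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def auc_score(score, test_links, nonexistent_links, k=10000, seed=0):
    """Estimate AUC = (k' + 0.5 k'') / k from k random comparisons.

    score(x, y) is any similarity function computed on the training network.
    """
    rng = random.Random(seed)
    higher = equal = 0
    for _ in range(k):
        s_missing = score(*rng.choice(test_links))
        s_absent = score(*rng.choice(nonexistent_links))
        if s_missing > s_absent:
            higher += 1          # counts towards k'
        elif s_missing == s_absent:
            equal += 1           # counts towards k''
    return (higher + 0.5 * equal) / k

# A constant score carries no information, so every comparison is a tie.
print(auc_score(lambda x, y: 1.0, [(0, 1)], [(2, 3)], k=100))
```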

Baseline predictors
For the sake of comparison, link prediction results for five baseline predictors are presented. Two indices based on classical random walks, the Local Random Walk index and the Superposed Random Walk index, are considered. In addition, the Common Neighbours index, the Adamic-Adar index [17] and the Resource Allocation index [18] performed better in most cases in [16], so these three methods are also used as baseline predictors. A brief introduction to these indices is given as follows.
(1) Common Neighbours (CN): $s^{\mathrm{CN}}_{xy} = |\Gamma(x) \cap \Gamma(y)|$, where $\Gamma(x)$ is the set of neighbours of $x$ in $G$. The CN predictor is based on the observation that two nodes are more similar when they have more common neighbours.
(2) Adamic-Adar (AA): $s^{\mathrm{AA}}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k(z)}$, where $k(z)$ is the degree of node $z$. The AA predictor is based on the observation that a common neighbour with a small degree contributes more to similarity than one with a large degree. This index assigns more weight to less-connected common neighbours.
(3) Resource Allocation Index (RA): $s^{\mathrm{RA}}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k(z)}$. This index is motivated by the resource allocation dynamics on complex networks.
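(4) Local Random Walk (LRW): This index is based on the classical random walk. The transition probability matrix is $P$, with $P_{xy} = 1/k(x)$ if $x$ and $y$ are connected and $P_{xy} = 0$ otherwise. $P_{xy}$ is the probability that a random walker at node $x$ will walk to $y$ in the next step. For a random walker starting at $x$, the probability that it is located at $y$ after $n$ steps is $\pi_{xy}(n)$. Following [8], the LRW similarity index is $s^{\mathrm{LRW}}_{xy}(n) = q_{x}\,\pi_{xy}(n) + q_{y}\,\pi_{yx}(n)$, where $q$ is the initial configuration function.
(5) Superposed Random Walk (SRW): In order to counteract the sensitive dependence of random-walk indices on parts of the network far away from the target nodes, which is at odds with the high clustering or locality of real networks, the Superposed Random Walk index superposes the contributions of independently moving walkers. The SRW similarity index is defined as $s^{\mathrm{SRW}}_{xy}(n) = \sum_{\tau = 1}^{n} s^{\mathrm{LRW}}_{xy}(\tau)$.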
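The five baseline indices can be computed directly from the adjacency matrix, as the following sketch illustrates. The function names are hypothetical, the graph is assumed undirected with no isolated nodes, and the choice $q_{x} = k(x)/2M$ for the initial configuration function is an assumption made for the example (the text above does not fix $q$). Diagonal entries of the returned matrices are not meaningful as link scores.

```python
import numpy as np

def neighbourhood_indices(A):
    """Return the CN, AA and RA score matrices for adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                      # node degrees k(z)
    with np.errstate(divide="ignore"):
        # A common neighbour of two distinct nodes always has degree >= 2,
        # so zeroing the weights of lower-degree nodes is harmless.
        w_aa = np.where(k > 1, 1.0 / np.log(k), 0.0)
        w_ra = np.where(k > 0, 1.0 / k, 0.0)
    cn = A @ A                             # |Gamma(x) ∩ Gamma(y)|
    aa = (A * w_aa) @ A                    # sum over z of 1 / log k(z)
    ra = (A * w_ra) @ A                    # sum over z of 1 / k(z)
    return cn, aa, ra

def random_walk_indices(A, n_steps):
    """Return the LRW scores at step n_steps and the SRW scores up to it."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    P = A / k[:, None]                     # P[x, y] = 1/k(x) if x and y are linked
    q = k / k.sum()                        # ASSUMPTION: q_x = k(x) / 2M
    pi = np.eye(len(A))                    # row x: walker started at x, step 0
    srw = np.zeros_like(A)
    for _ in range(n_steps):
        pi = pi @ P                        # pi[x, y] = pi_xy after one more step
        weighted = q[:, None] * pi         # q_x * pi_xy
        lrw = weighted + weighted.T        # add q_y * pi_yx
        srw += lrw                         # superpose the steps so far
    return lrw, srw

# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to node 0.
A = np.array([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
cn, aa, ra = neighbourhood_indices(A)
lrw, srw = random_walk_indices(A, n_steps=3)
print(cn[1, 3], ra[1, 3], srw[1, 3])
```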

Methods comparison
In our first experiment, we run the seven methods on the six real-world networks with various fractions of edges in the training set, $p$. The results are shown in Fig. 1.
It can be seen that as $p$ increases, all these similarity-based methods achieve better prediction performance. When the training network has low density (for example, $p \leq 0.5$), the methods based on random walk and quantum walk perform better than the other three typical methods. SSQW appears to be the best method on most networks once $p$ grows beyond a certain value, except on NS and email, and even there the optimal results are very close to the corresponding AUC curves of SSQW. Moreover, consistent with the results in [8], LSQW and SSQW are not sensitive to the density of the network.
Like SRW (or LRW), our methods also have an important parameter $n$ (the number of steps). We now discuss how to set this parameter and how the parameter setting affects the link prediction results. Fig. 2 shows the relationship between the number of steps and the value of AUC. For all data sets, when the number of steps is small, e.g. $n \leq 4$, the link prediction performance of SSQW (or LSQW) and SRW (or LRW) is close. As $n$ increases, SSQW (or LSQW) performs better than SRW (or LRW). The two exceptions are the prediction results of LSQW and LRW on NS and the prediction results of SSQW and SRW on email, but the two curves in each pair are rather close to each other. Meanwhile, we observe that the link prediction accuracy of SSQW (or LSQW) decreases more slowly with the increase of steps than that of SRW (or LRW). In particular, the AUC value of SSQW becomes stable for $n \geq 6$. The results indicate that SSQW is not sensitive to the number of steps, while SRW is. This is because of the time-reversal invariance of a quantum walk [13]. Although [8] mentions the positive correlation between the optimal step and the average shortest distance in SRW, care is still needed when selecting the number of steps. In SSQW, we set $n$ to 7.
Although our methods use aspects of quantum mechanics such as interference and superposition, they can be readily implemented on a classical computer. Since the number of states for the SQW is $2M$, the complexity of our methods is $O(M\langle k \rangle^{n})$. As is known, the time complexity of $n$-step SRW (or LRW) is approximately $O(N\langle k \rangle^{n})$. In terms of raw computations required, our method is slightly more expensive than SRW (or LRW) on a classical computer. However, the simulation of the discrete-time quantum walk achieves a quadratic speed-up via Grover's technique [14], which means that the simulation of the quantum walk could be realized with complexity $O(\sqrt{M\langle k \rangle^{n}})$ on a quantum computer. The time complexity of SSQW (or LSQW) on a quantum computer is therefore much lower than that of SRW (or LRW) on a classical computer.
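To make the classical comparison concrete, read the two classical estimates as $O(M\langle k \rangle^{n})$ for the quantum-walk indices and $O(N\langle k \rangle^{n})$ for SRW (or LRW). For an undirected network the average degree satisfies $\langle k \rangle = 2M/N$, so the ratio of the two estimates follows directly; this short derivation is an added illustration, not a result reported in the experiments.

```latex
\frac{M\,\langle k \rangle^{n}}{N\,\langle k \rangle^{n}}
  \;=\; \frac{M}{N}
  \;=\; \frac{\langle k \rangle}{2},
\qquad \text{since} \qquad
\langle k \rangle = \frac{2M}{N}.
```

Under this reading, the extra classical cost is a factor of about half the average degree and does not grow with the number of steps $n$.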

Conclusions
In this paper, a quantum perspective was applied to the link prediction problem in order to investigate whether quantum walk can improve prediction accuracy compared with the classical random walk. We proposed two similarity indices based on SQW: the LSQW index and the SSQW index. Our methods were compared with five well-known methods on six real networks. The results show that SSQW gives overall better predictions than the other methods in terms of AUC scores. However, the time complexity of SSQW (or LSQW) on classical computers is a little higher than that of SRW (or LRW), although SSQW (or LSQW) admits a quadratic speed-up on a quantum computer.
Our work can be further improved in several aspects. First, we will study in depth how our methods perform on networks with different structural properties. Second, the behaviour of CTQW can be markedly different from that of DTQW, so we plan to study link prediction based on the continuous-time quantum walk in future work.