ADAPTIVE SEMI-SUPERVISED AFFINITY PROPAGATION CLUSTERING ALGORITHM BASED ON STRUCTURAL SIMILARITY

In view of the unsatisfactory clustering performance of the affinity propagation (AP) clustering algorithm on data sets with complex structures, an adaptive semi-supervised affinity propagation clustering algorithm based on structural similarity (SAAP-SS) is proposed in this paper. First, a novel structural similarity is proposed by solving a non-linear, low-rank representation problem. Then affinity propagation is performed after the similarity matrix has been adjusted using the known pairwise constraints. Finally, the idea of fireworks explosion is introduced into the algorithm. By adaptively searching the preference space bi-directionally, the algorithm balances its global and local searching abilities in order to find the optimal clustering structure. Experiments on both synthetic and real data sets show that the proposed algorithm outperforms the AP, FEO-SAP and K-means methods.


Introduction
Affinity Propagation (AP), proposed by Frey and Dueck, is a fast and efficient clustering algorithm. Unlike many other clustering algorithms, it does not require the number of clusters to be specified in advance; instead, it considers all data points as potential exemplars and finds the optimal ones through continuous iteration [1]. Therefore, it is widely used in gene sequence analysis [2], text clustering [3], image processing [4,5], facility location [6] and many other fields [7-9]. Many scholars have carried out in-depth studies of AP and put forward improved versions of it. For instance, Fujiwara Y. et al. eliminated unnecessary information exchange during iteration and proposed an AP clustering algorithm with a much better convergence rate and no loss of clustering accuracy [10]. Givoni I. et al. extended AP in a principled way to solve the hierarchical clustering problem and proposed Hierarchical Affinity Propagation, which was successfully applied to actual HIV genetic sequence data [11]. By introducing the idea of manifold learning into AP, Feng Xiaolei et al. proposed a manifold distance-based semi-supervised AP clustering algorithm, which more accurately reflects the potential structure of actual data and has better clustering performance [12]. Wang Xianhui et al. combined AP with K-means clustering and put forward an AP-based cluster ensemble algorithm that effectively improves the accuracy, robustness and stability of K-means clustering [13].
The notion of similarity between observations is at the root of affinity propagation clustering. However, for datasets with complex structures, Euclidean distance is often not sufficient to represent the underlying structure. The original AP performs well for datasets with simple structures, but not for complex ones, because in attempting to minimize the decision function it tends to produce excessive local clusters. To address these problems, we propose a novel adaptive semi-supervised affinity propagation clustering algorithm based on structural similarity (SAAP-SS) in this paper. In detail, this study consists of three critical steps. (1) A novel structure-based similarity is proposed to account for the low-dimensional global structure of the data. We solve a regularized low-rank representation problem of the observed data, and then present methods of constructing kernels for the design of a structured kernel similarity based on the low-rank representation. (2) In order to better reflect the similarity between data points, we use the known labelled data or pairwise constraints to adjust the similarity matrix. (3) During the running of AP, a novel fireworks explosion optimization algorithm is adopted to select the preference parameter. In the early stage of the algorithm, the positions of fireworks and sparks are evaluated with the Silhouette index to search for the optimal preference space and promote the global searching ability of the algorithm. Afterward, according to the clustering structure, the radii of the fireworks explosions are adjusted adaptively to enhance the local searching ability of the algorithm and to determine the optimal clustering structure.

Affinity propagation clustering algorithm
Affinity propagation is a novel and highly efficient algorithm that takes as input measures of similarity between pairs of data points and simultaneously considers all data points as potential exemplars. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. Because of its simplicity, general applicability, and performance, we believe affinity propagation will prove to be of broad value in science and engineering [20]. Fig. 1 shows affinity propagation among a small set of two-dimensional data points. Affinity propagation takes as input a collection of real-valued similarities between data points, where the similarity s(i, k) indicates how well the data point with index k is suited to be the exemplar for data point i. Each similarity is set to a negative Euclidean distance (squared error, as in [1]), as shown in Eq. (1):
$$s(i, k) = -\lVert x_i - x_k \rVert^2. \tag{1}$$
A real number s(k, k) is taken as input for each data point k so that data points with larger values of s(k, k) are more likely to be chosen as exemplars. These values are referred to as the preference parameter; they play an important role in determining the number of exemplars. If all data points are initially equally suitable as exemplars, the preference parameter should be set to a common value p, which can be varied to produce different numbers of clusters. In most cases, this shared value can be the median of the input similarities:
$$p = \operatorname{median}(s(:)). \tag{2}$$
During the iteration, two types of messages are exchanged between data points, and each takes into account a different kind of competition. Messages can be combined at any stage to decide which points are exemplars and, for every other point, which exemplar it belongs to. Fig. 2 illustrates affinity propagation for two-dimensional data points, where negative Euclidean distance was used to measure similarity. Each point is coloured according to the current evidence that it is a cluster centre (exemplar). The darkness of the arrow directed from point i to point k corresponds to the strength of the transmitted message that point i belongs to exemplar point k [1].
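The construction of the input similarities and of the shared preference can be illustrated with a minimal Python sketch. It assumes the squared Euclidean form used in [1] and takes the median over the off-diagonal similarities only; both choices, as well as the function names `similarity_matrix` and `set_median_preference`, are assumptions made for illustration.

```python
from statistics import median


def similarity_matrix(points):
    """Negative squared Euclidean distance between every pair of points."""
    n = len(points)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            s[i][k] = -sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
    return s


def set_median_preference(s):
    """Write the median off-diagonal similarity onto the diagonal s(k, k).

    ASSUMPTION: the diagonal itself is excluded from the median.
    """
    n = len(s)
    off_diagonal = [s[i][k] for i in range(n) for k in range(n) if i != k]
    p = median(off_diagonal)
    for k in range(n):
        s[k][k] = p
    return p


points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0)]
S = similarity_matrix(points)
p = set_median_preference(S)
```

Lowering p below this median would be expected to yield fewer exemplars, and raising it more, which is the behaviour the preference search relies on later.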

Figure 2 How affinity propagation works
The "responsibility" message r(i, k), sent from data point i to candidate exemplar k, reflects the accumulated evidence for how well-suited point k is to serve as the exemplar for point i, taking into account other potential exemplars for point i. The "availability" message a(i, k), sent from candidate exemplar k to data point i, reflects the accumulated evidence for how appropriate it would be for point i to choose point k as its exemplar, taking into account the support from other points that point k should be an exemplar. The message-passing procedure is simulated by the updates of the two message matrices. A decision matrix E is calculated after each update; it represents whether point i chooses point k as its exemplar or not.
When updating the messages, it is important that they are damped to avoid numerical oscillations that arise in some circumstances. Each message is set to λ times its value from the previous iteration plus 1 − λ times its prescribed updated value, where the range of the damping factor λ is between 0 and 1. A default damping factor of 0,5 was adopted, and each iteration of affinity propagation consisted of (i) updating all responsibilities given the availabilities, (ii) updating all availabilities given the responsibilities, and (iii) combining availabilities and responsibilities to monitor the exemplar decisions and terminate the algorithm when these decisions did not change for 10 iterations.
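The damped message-passing loop described above can be sketched as follows. The update rules are the standard ones from Frey and Dueck [1]; the stopping test (a non-empty exemplar set that stays unchanged for `stable_iters` iterations), the final assignment step and all names are assumptions of this sketch, and the similarities are assumed to be finite.

```python
def affinity_propagation(s, damping=0.5, max_iter=200, stable_iters=10):
    """s: square similarity matrix (list of lists), preferences on the diagonal."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    exemplars, previous, stable = [], None, 0
    for _ in range(max_iter):
        # (i) update responsibilities given the availabilities
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(v for k, v in enumerate(scores) if k != best)
            for k in range(n):
                rival = runner_up if k == best else scores[best]
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - rival)
        # (ii) update availabilities given the responsibilities
        for k in range(n):
            support = sum(max(0.0, r[j][k]) for j in range(n) if j != k)
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, r[k][k] + support - max(0.0, r[i][k]))
                a[i][k] = damping * a[i][k] + (1 - damping) * new
        # (iii) combine both messages to monitor the exemplar decisions
        exemplars = [k for k in range(n) if a[k][k] + r[k][k] > 0]
        stable = stable + 1 if exemplars and exemplars == previous else 0
        previous = exemplars
        if stable >= stable_iters:
            break
    labels = []
    for i in range(n):
        if not exemplars:
            labels.append(-1)  # no exemplar emerged
        elif i in exemplars:
            labels.append(i)
        else:
            labels.append(max(exemplars, key=lambda e: s[i][e]))
    return labels
```

Each returned label is the index of the exemplar chosen for that point; exact ties between candidate exemplars are not broken by noise here, so symmetric inputs may need a small perturbation.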
Assuming that the observations are approximately embedded on multiple independent, low-dimensional manifolds, our goal is to discover these manifolds by learning low-rank representations of the data. In the cases where the observations are embedded on linear subspaces, the low-rank representation (LRR) problem can be formulated as:
$$\min_{Z} \lVert X - XZ \rVert_F^2 \quad \text{s.t.} \quad \operatorname{rank}(Z) \le R, \tag{9}$$
where X is the matrix whose columns are the observations, $\lVert \cdot \rVert_F$ is the Frobenius norm, and the solution, Z, is the minimum squared-error linear embedding on an R-dimensional subspace. Relaxing the constraint in Eq. (9), the minimization can be equivalently written as:
$$\min_{Z} \lVert X - XZ \rVert_F^2 + \tau \operatorname{rank}(Z), \tag{10}$$
where $\tau > 0$ is a regularization weight. The optimization of the rank of a matrix is nonconvex and combinatorial. However, the convex relaxation of rank, the nuclear norm $\lVert \cdot \rVert_*$, can be substituted, resulting in the convex optimization:
$$\min_{Z} \lVert X - XZ \rVert_F^2 + \tau \lVert Z \rVert_*. \tag{11}$$
This related problem was originally posed as a subspace segmentation approach by Liu et al. [14], who minimized the $\ell_{2,1}$ embedding error. Naturally, a kernel low-rank representation (KLRR) formulation of the problem is proposed for the cases where data is embedded on nonlinear subspaces:
$$\min_{Z} \lVert \phi(X) - \phi(X) Z \rVert_F^2 + \tau \lVert Z \rVert_*, \tag{12}$$
where $\phi(\cdot)$ is an expanded basis function with an associated kernel function
$$K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j).$$
The form and parameters of the function $\phi(\cdot)$ are an assumption on the structure of the observations. Ideally, $\phi(\cdot)$ is chosen such that all observations are well approximated in the expanded basis space with a linear low-dimensional approximation while still maintaining the relationship between observations. As in all kernel methods, the accuracy of the approximation of manifolds is dependent on the ability of the kernel to fit the data.

Structured kernel similarity design for clustering
Based on the low-rank representation, the methods of constructing structured kernel similarity are presented in this section. The low-rank transformation of the raw data offers possibilities for designing kernels that incorporate the underlying structure of the data.
In order to exploit this structure we consider some specific PSD kernels, which are essentially normalized dot products, i.e.
$$\omega_{ij} = \frac{\lvert z_i^{T} z_j \rvert}{\lVert z_i \rVert_2 \, \lVert z_j \rVert_2}, \tag{13}$$
where $\omega_{ij}$ is the similarity between observations i and j, and $z_i$ and $z_j$ are the i-th and j-th columns of Z, respectively. The value of $\omega_{ij}$ is the magnitude of the cosine of the angle between the vectors $z_i$ and $z_j$. Given the near block-diagonal structure of the KLRR matrix, observations lying on independent subspaces have a very small similarity. One issue with this similarity function is that it is undefined if either $z_i$ or $z_j$ is identically zero. Therefore, we define $\omega_{ij} = 0$ if either $z_i$ or $z_j$ is zero. With this convention, it is possible to demonstrate that this similarity satisfies the properties of PSD, i.e.
Since both $\tilde{K}_1$ and $\tilde{K}_2$ are valid PSD kernels, $\tilde{K}$ is also a valid PSD kernel. The similarity proposed in (13) captures the structure of the observations; however, the scaling information is lost. In order to incorporate the structural information while preserving spatial relationships in the observation space, we propose the PSD kernel in (15). Under this kernel, two observations have a large similarity only if they lie on the same manifold and have a small geometric distance. If $x_i$ and $x_j$ lie on independent manifolds, then according to the structure of the KLRR matrix their representations $z_i$ and $z_j$ are nearly orthogonal, so the cosine of the angle between them, and therefore the similarity, is small. Alternatively, if the observations lie on the same low-dimensional manifold but have a large geometric distance, the exponential term drives the similarity to a small value.
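The behaviour described above can be illustrated with a short Python sketch. The function `cosine_magnitude` follows the description of the similarity in (13), including the zero-vector convention. The function `structured_similarity` is only an illustration: it ASSUMES a Gaussian form for the exponential term with a hypothetical bandwidth `sigma`, since the exact form of the kernel is not restated here.

```python
import math


def cosine_magnitude(z_i, z_j):
    """Magnitude of the cosine of the angle between two columns of Z."""
    norm_i = math.sqrt(sum(v * v for v in z_i))
    norm_j = math.sqrt(sum(v * v for v in z_j))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # convention for identically zero columns
    dot = sum(a * b for a, b in zip(z_i, z_j))
    return abs(dot) / (norm_i * norm_j)


def structured_similarity(x_i, x_j, z_i, z_j, sigma=1.0):
    """Structural term damped by an exponential distance term.

    ASSUMPTION: Gaussian factor exp(-||x_i - x_j||^2 / (2 sigma^2)).
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    distance_term = math.exp(-squared_distance / (2.0 * sigma ** 2))
    return cosine_magnitude(z_i, z_j) * distance_term
```

With orthogonal representations the result is zero whatever the geometric distance, and with aligned representations it decays as the two observations move apart.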
Based on the similarity presented in (15), the distance between two observations is defined in (16). This distance combines both the structural similarity and the Euclidean distance of the data. Two observations are considered to be close if and only if they lie on the same low-dimensional manifold and have a small distance in the observation space.
According to the above definitions, a novel structural similarity between observations for AP is defined as follows:

Semi-supervised affinity propagation clustering
In AP, the similarity matrix is an important input that reflects the similarity between data points, and its definition directly affects the performance of the clustering algorithm. In view of this, the idea of semi-supervision was introduced into this study: pairwise constraints were used to perform a logical extension to the unknown data points and to guide the update of the similarity matrix. There are two kinds of pairwise constraints: must-link, where the two data points must belong to the same cluster, i.e. $M = \{(x_i, x_j)\}$, and cannot-link, where the two data points should not be in the same cluster, i.e. $C = \{(x_i, x_j)\}$ [16]. The detailed rules for updating the matrix are as follows.
(1) For the data point pairs in the prior information that satisfy the must-link constraint, and for the pairs that newly satisfy the must-link constraint after logical extension, perform the similarity update as below.
(2) For the data point pairs in the prior information that satisfy the cannot-link constraint, perform the similarity update as below.
(3) Perform a global adjustment of the unknown data points based on the shortest-path principle, using the results of steps 1 and 2. If a data point k is connected to both data points of a pair (i, j) pending adjustment, and the sum of the similarities between k and the two data points of the pair is greater than the similarity of the pair, update the similarity of the pair to the sum, i.e. $s(i, j) \leftarrow \max\{s(i, j),\; s(i, k) + s(k, j)\}$.
The updated similarity matrix would more accurately reflect the similarity between data points and the clustering result would also be improved.
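The three update rules can be sketched in Python as follows. The concrete values written for constrained pairs are ASSUMPTIONS of this sketch (0 for must-link, the largest possible value of a non-positive similarity, and negative infinity for cannot-link), as are the function name and the use of a union-find structure for the logical extension of must-link pairs.

```python
def adjust_similarities(s, must_link, cannot_link):
    """Return a copy of the similarity matrix adjusted by pairwise constraints."""
    n = len(s)
    s = [row[:] for row in s]

    # Logical extension: must-link is transitive, so merge linked points.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in must_link:
        parent[find(i)] = find(j)

    constrained = set()
    # Rule (1): must-link pairs (ASSUMED value 0.0).
    for i in range(n):
        for j in range(i + 1, n):
            if find(i) == find(j):
                s[i][j] = s[j][i] = 0.0
                constrained.add((i, j))
    # Rule (2): cannot-link pairs (ASSUMED value -inf).
    for i, j in cannot_link:
        s[i][j] = s[j][i] = float("-inf")
        constrained.add((min(i, j), max(i, j)))
    # Rule (3): one shortest-path style pass over the remaining pairs.
    base = [row[:] for row in s]
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in constrained:
                continue
            via = max(
                (base[i][k] + base[k][j] for k in range(n) if k not in (i, j)),
                default=float("-inf"),
            )
            if via > s[i][j]:
                s[i][j] = s[j][i] = via
    return s
```

Because the similarities are non-positive, rule (3) only changes a pair when one leg of the path passes through a must-link pair, which is how the constraint information spreads to unlabelled points. Note that the negative infinity values would need to be replaced by a large finite negative number before running a message-passing implementation that assumes finite inputs.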

Fireworks explosion optimization algorithm

Basic idea
Fireworks explosion optimization algorithm is an adaptive bi-directional search optimization algorithm that was designed based on the idea of fireworks explosion. The algorithm generates a certain number of firework bombs in the search space and executes the operation of explosion to each fireworks bomb; the sparks generated by the explosion then explore the neighbourhood of the original fireworks bomb. The positions of the sparks are evaluated with the validity index, and positions with higher effectiveness are considered close to the optimal solution of the objective function. Then, the optimum position is selected as the origin of the next round of fireworks explosion. Meanwhile, according to the convergence rate, the algorithm adaptively adjusts the radius of fireworks explosion to balance its global searching and local searching abilities. Eventually, through continuous iteration, the sparks of fireworks explosion will concentrate near the optimal solution to the problem. When the termination condition is met, the spark position with the highest effectiveness is the optimal solution.

Algorithm description
The value of the preference parameter p of the AP clustering algorithm lies in the range (−∞, 0], and the corresponding number of clusters lies in [1, m], where m is the number of samples. In order to improve search efficiency and eliminate unnecessary computation, the upper limit of p was set to the median $p_{mid}$ of all input similarities. Meanwhile, to ensure the quality of the initial fireworks, the initial search space of the fireworks explosion optimization algorithm was set to $[p_{min}, p_{mid}]$, where $p_{min}$ is the minimum of the input similarities. A large number of experiments, summarized in Tab. 1, showed this configuration to be feasible. As seen in Tab. 1, when p is set to $p_{mid}$, the number of clusters produced by AP is much larger than the actual number of clusters. On the other hand, when p is set to $p_{min}$, the number of output clusters is smaller than the actual number. In this way, the search efficiency is improved and the quality of the initial fireworks is ensured, so that no clustering structure is omitted.
In short, the value of the preference parameter p has a significant influence on the clustering results. Although there is no direct functional relationship between p and the number of clusters, an obvious correlation can be observed: the number of clusters increases as p increases and decreases as p decreases. Because data sets differ, the orders of magnitude of the similarities between data points may also differ, and this difference directly affects the value of the preference parameter p. Therefore, the order of magnitude of the similarities should be considered when selecting the fireworks explosion radius, so that the validity and rationality of the explosion can be ensured. Meanwhile, based on progressive cognition of the clustering structures, the algorithm adaptively adjusts the scopes of its forward and backward searches during iteration to find the optimal clustering structure.
With the above considerations, the fireworks explosion range er is defined in Eq. (21), where t is the number of iterations, $p_{min}$ and $p_{mid}$ are the minimum and median of the input similarities, respectively, $p_b$ is the optimal position of the last explosion, r is the radius of the initial explosion, and v is the ratio of forward search. The values of r and v are calculated by (22). This definition of the fireworks explosion range allows the range to be reasonably determined according to specific data sets, thereby improving the algorithm's search performance. In addition, by recording the numbers of convergence and divergence during the iteration, the algorithm can adaptively adjust the scopes of its forward and backward searches, and locate the optimal preference space quickly and accurately.
In order to ensure the stability of the algorithm's results, the sparks generated by each fireworks explosion are scattered evenly within the range of the explosion. This setup reduces the storage requirement, accelerates the algorithm's execution, and ensures that plenty of sparks are generated.
For certain numbers of clusters, the corresponding range of p values may be wide. In such cases, multiple iterations are required before the number of clusters changes, and these iterations are often meaningless. This problem can be solved by enlarging the forward searching scope and reducing the backward searching radius. An acceleration factor range, as defined in (24), was introduced to reduce the computation time,
where K denotes the set of candidate solutions (numbers of clusters) obtained in the t-th explosion. A larger range of the candidate solutions indicates a more obvious convergent tendency. With the introduction of the acceleration factor, the optimal range of the preference parameter can be located quickly, thereby saving the time spent on unnecessary computation and preventing the algorithm from stagnating during computation.
Step 2 Perform the low-rank data transformation to find a low-rank matrix, Z.
Step 3 Construct the structured kernel similarity $s_{ij}$.
Step 4 Adjust the similarity matrix by utilizing the known pairwise constraints.
Step 5 Explode the fireworks in the preference space.
Step 6 Run the AP algorithm and use the Silhouette index to evaluate the positions of the sparks.
Step 7 Record the optimal position of the sparks and its clustering result.
Step 8 Select the optimal position of the sparks and return to step 4 until the termination conditions are met.
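The explosion loop of steps 5 to 8 can be sketched as a one-dimensional bidirectional search over the preference value. This is a simplified illustration and not the range definition of the paper: the halving of the radius after an unsuccessful round, the parameter names and the convention that "forward" means towards $p_{mid}$ are all hypothetical. The callable `score` stands for running AP with preference p and returning the Silhouette index of the result.

```python
def fireworks_search(score, p_min, p_mid, n_sparks=5, radius=None,
                     forward_ratio=0.5, shrink=0.5, max_rounds=20, tol=1e-6):
    """Search [p_min, p_mid] for the preference with the highest score."""
    if radius is None:
        radius = (p_mid - p_min) / 2.0  # ASSUMED initial radius
    best_p = (p_min + p_mid) / 2.0
    best_score = score(best_p)
    for _ in range(max_rounds):
        # Explosion range around the current best position, clipped to the
        # search space; forward_ratio splits the radius into two directions.
        low = max(p_min, best_p - (1.0 - forward_ratio) * radius)
        high = min(p_mid, best_p + forward_ratio * radius)
        # Sparks are scattered evenly within the explosion range.
        sparks = [low + (high - low) * j / (n_sparks - 1) for j in range(n_sparks)]
        improved = False
        for p in sparks:
            value = score(p)
            if value > best_score:
                best_p, best_score, improved = p, value, True
        if not improved:
            radius *= shrink  # hypothetical rule: refine the local search
        if radius < tol:
            break
    return best_p, best_score
```

A wide radius in the early rounds favours global exploration, while shrinking it after unsuccessful rounds concentrates the sparks near the best position found so far.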

Experimental results
In this section, we present a set of clustering experiments on four synthetic datasets and six UCI datasets. All experiments were performed with MATLAB 2012b on a computer with an Intel(R) Pentium 2,9 GHz processor, 4 GB RAM, a 500 GB hard drive, and the Microsoft Windows 7 Professional operating system.

Experimental data

Synthetic datasets
In this part, we considered four synthetic datasets with complex, non-spherical cluster shapes, as shown in Fig. 1. These datasets represent some difficult clustering instances because they contain clusters of arbitrary shape and varying densities.

UCI datasets
To verify the feasibility and efficiency of the proposed approach, we performed experiments on 6 UCI datasets, including Iris, Wine, Glass, Ecoli, Seeds and Haberman. The basic information of those UCI datasets is summarized in Tab. 2.

Validity index
In the experiments, we set the number of clusters equal to the true number of clusters for all the clustering algorithms. We use the following two popular validity indices to evaluate the performance of all the clustering algorithms.

Silhouette index
Assume that a data set with n samples is divided into k clusters $C_i$ (i = 1, 2, …, k). Let a(t) be the average dissimilarity of sample t in $C_j$ to all other samples in $C_j$, and let $d(t, C_i)$ be the average dissimilarity of sample t in $C_j$ to all samples in another cluster $C_i$; then $b(t) = \min\{d(t, C_i)\}$, i = 1, 2, …, k, i ≠ j. The formula to calculate the Silhouette index Sil of sample t is:
$$Sil(t) = \frac{b(t) - a(t)}{\max\{a(t), b(t)\}}.$$
The average Sil value of all the samples in a cluster reflects the clustering quality, where the largest average Sil value represents the best clustering quality and the optimal number of clusters. Once the Sil values of the clustering solutions obtained with different numbers of clusters have been calculated, the optimal clustering solution is the one with the largest Sil.
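The Silhouette computation can be written compactly in Python. The sketch below returns the average Sil over all samples from a precomputed dissimilarity matrix; assigning the value 0 to samples in singleton clusters, and to every sample when there is only one cluster, is a common convention that is assumed here rather than taken from the text.

```python
def silhouette(dist, labels):
    """Average Silhouette index for a dissimilarity matrix and cluster labels."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    values = []
    for t, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1 or len(clusters) < 2:
            values.append(0.0)  # ASSUMED convention for degenerate cases
            continue
        # a(t): average dissimilarity to the other samples of the same cluster
        a_t = sum(dist[t][u] for u in own if u != t) / (len(own) - 1)
        # b(t): smallest average dissimilarity to any other cluster
        b_t = min(
            sum(dist[t][u] for u in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a_t, b_t)
        values.append((b_t - a_t) / denominator if denominator > 0 else 0.0)
    return sum(values) / len(values)


xs = [0.0, 1.0, 10.0, 11.0]
D = [[abs(p - q) for q in xs] for p in xs]
good = silhouette(D, [0, 0, 1, 1])
bad = silhouette(D, [0, 1, 0, 1])
```

In this toy example the natural grouping scores close to 1, while the interleaved grouping scores below 0, which is the ordering the preference search exploits.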

F-measure index
The F-measure measures the accuracy of a clustering result. It considers both the precision P and the recall R of the algorithm: P is the ratio of the number of correct results to the number of all returned results, and R is the ratio of the number of correct results to the number of results that should have been returned. P, R and the F-measure (F) are defined as follows, where N in Eq. (30) is the number of data points. A larger value of the F-measure index indicates that the algorithm is more accurate.
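As an illustration, the following Python sketch computes a clustering F-measure. It ASSUMES the commonly used class-weighted definition, in which each true class is matched with the cluster that gives it the highest F value and the results are weighted by class size over N; the function name is hypothetical.

```python
from collections import Counter


def f_measure(true_labels, predicted_labels):
    """Class-weighted clustering F-measure (assumed common definition)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(predicted_labels)
    overlap = Counter(zip(true_labels, predicted_labels))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for cluster, n_cluster in cluster_sizes.items():
            n_common = overlap.get((cls, cluster), 0)
            if n_common == 0:
                continue
            precision = n_common / n_cluster  # correct / all returned
            recall = n_common / n_cls         # correct / should be returned
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_cls / n * best
    return total
```

The measure does not depend on how the clusters are numbered, so a perfect partition scores 1 even when its labels differ from the class labels.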

Comparison and analysis of the results
We compared the performance of the proposed algorithm with AP, FEO-SAP (fireworks explosion optimization-based semi-supervised affinity propagation, an improved approach that does not use structural similarity) and the K-means algorithm. The prior information, accounting for 10 % of the entire data information, was randomly generated from the datasets.
The performances of the compared clustering algorithms on the four synthetic datasets are shown in Figs. 4 to 7, where the best performance for each dataset is highlighted. Panels a, b, c and d show the clustering results of AP, FEO-SAP, SAAP-SS and K-means, respectively. AP performs as well as K-means for clusters with spherical or ellipsoidal structure. However, both algorithms fail to discover non-spherical clusters, as shown clearly in Figs. 4 to 7.
The performance of AP is improved by using the known labelled data or pairwise constraints to adjust the similarity between data points. Moreover, to a certain extent, FEO-SAP is capable of identifying the underlying clustering structure. However, the improvement is limited.
The proposed SAAP-SS algorithm outperforms AP, FEO-SAP and K-means on all the synthetic datasets. It is able to find the underlying structure of the data and recognize clusters of arbitrary shape. The experiments on the UCI datasets were performed in three steps. First, an experiment on the AP, SAP (semi-supervised affinity propagation) and SAAP-SS algorithms was performed to test whether the best number of clusters can be identified automatically. Then, the quality and accuracy of the proposed SAAP-SS were compared with those of AP, FEO-SAP and K-means. Finally, we performed an experiment on the six UCI datasets to analyse the relation between accuracy and the number of constraints and to compare the performance of the four methods above. The values of the parameter p were set to the medians of the input similarities. As seen in Tab. 3, the numbers of clusters found by the proposed SAAP-SS algorithm are in complete accordance with the real numbers in all the UCI datasets, while FEO-SAP and the original AP failed to match the actual numbers. The results for the F-measure index and the Silhouette index show that SAAP-SS has clustering performance superior to that of FEO-SAP, K-means and the original AP. This can be attributed to the fact that the novel structural similarity describes local neighbourhoods more accurately by explicitly incorporating the low-dimensional manifold structure of the data. By searching the preference space bi-directionally, the proposed algorithm can adaptively find the optimal clustering structure of the datasets and improve the clustering performance. Although the FEO-SAP algorithm has prior information to guide the updating of the similarity matrix, and this does improve the clustering quality and accuracy to a certain extent, it can only perform local adjustment of the similarity matrix because the amount of prior information is limited.
Therefore, the FEO-SAP algorithm is not able to comprehensively reflect the similarities among data points and to discover the global clustering structure of data.
Figs. 9, 10 and 11 show the clustering performance of the different clustering methods with different numbers of constraints on the six UCI datasets. For all the datasets, the clustering performance of the FEO-SAP, K-means and SAAP-SS algorithms gradually improves as the number of constraints increases. Moreover, in all cases, the overall performance of the SAAP-SS algorithm is much better than that of the FEO-SAP and K-means algorithms.

Robustness analysis of the SAAP-SS
According to Eq. (6) and Eq. (7), the damping factor has a significant influence on the robustness of the algorithm. We therefore examine the robustness of the algorithm under different values of the damping factor λ. The experimental results are shown in Figs. 12 to 14. We randomly generated 200 points for the experiment and set λ to 0,9, 0,7 and 0,5, respectively. The results show that the greater the parameter λ is, the more robust the clustering results are. Although a slight numerical oscillation occurs in the early iterations when λ is 0,5, the algorithm soon converges. In conclusion, the proposed algorithm is highly robust.

Conclusions
To address the inability of the affinity propagation clustering algorithm to produce ideal clustering results when dealing with complex datasets, a novel adaptive semi-supervised affinity propagation clustering algorithm based on structural similarity (SAAP-SS) was proposed in this paper. We first solved a regularized low-rank representation problem of the observed data by deriving a computationally efficient closed-form solution that allows for handling large sets of observations. Then, we presented the methods of constructing kernels for designing a structured kernel similarity based on the low-rank representation. Moreover, we used the known labelled data or pairwise constraints to adjust the similarity matrix in order to better reflect the similarity between data points. In addition, the proposed algorithm seeks the optimal clustering structure automatically by adjusting the forward and backward searching scopes and balancing the global and local searching abilities. The experimental results demonstrated that the clustering performance of the proposed SAAP-SS algorithm is superior to that of the original AP, FEO-SAP and K-