A Wolf Pack Optimization Theory Based Improved Density Peaks Clustering Approach

1910-1916


INTRODUCTION
The clustering result of the traditional Density Peaks Clustering (DPC) algorithm depends on the choice of the cut-off distance. DPC was proposed in 2014 by Rodriguez and Laio [1]. The algorithm defines two quantities to characterize each data point [2]. The first is the local density ρ of the data point, which uses the parameter d_c to measure how many other data points lie close to it. The second is the parameter δ of the data point, defined as the distance between the data point and the nearest data point of higher density. DPC assumes that cluster centers have a comparatively high density and lie relatively far from other high-density points. A decision graph is then built from these two quantities to locate the cluster centers rapidly [3]. This simple design lets the algorithm find the number of clusters quickly, regardless of cluster shape, and mark outliers automatically. Since its publication in the journal Science, DPC has been well acknowledged in various fields.
Text clustering is one of the core problems in text mining and information retrieval. In a good clustering, the similarity of data within a cluster is high and the similarity between clusters is low. Clustering algorithms can be divided into four categories: partitional, hierarchical, density-based and intelligent clustering algorithms. This paper mainly discusses the Density Peaks Clustering algorithm published by Alex Rodriguez and Alessandro Laio in Science in June 2014 [4]. Although the article has been questioned by many readers since it came out, the basic idea of the algorithm is novel, simple and clear, and is worth studying. The core of the algorithm is the way it characterizes cluster centers. This paper introduces the principle of the algorithm and discusses some of its details.
Swarm intelligence algorithms such as particle swarm optimization (PSO) and ant colony optimization (ACO), which imitate the collective behavior of animals such as birds and ants, have been used for many years to solve a large number of practical problems [6][7][8][9][10]. The Wolf Pack Algorithm (WPA) adopts a bottom-up design based on individual artificial wolves and a cooperative search structure based on a division of responsibilities [5]. Each individual decides its behavior according to its responsibilities [11]. In wolf scouting, the pack does not search for prey in full force; instead it sends a few elite wolves to hunt within the range where the prey may be. The surrounding wolves then spontaneously run toward these wolves and move closer to the prey [12,13].
WPA is a swarm intelligence algorithm put forward in 2013. It has been applied to the 0-1 knapsack problem, the multidimensional knapsack problem, and the optimal operation of hydropower station reservoirs [14]. In this paper, WPA is used to optimize the cut-off distance of the Density Peaks Clustering algorithm, and the optimal value is found adaptively [15]. The aim is to obtain better clustering results, improve clustering performance and make clustering more robust. The new method, which combines a clustering algorithm with a swarm intelligence optimization algorithm, can process financial data such as stock data and improve the clustering accuracy, which supports the value of fusing the two algorithms. Part III introduces the Density Peaks Clustering algorithm based on the Wolf Pack Algorithm. Part IV presents the simulation experiments and confirms the advantages of the new algorithm.

DENSITY PEAKS CLUSTERING ALGORITHM
DPC has been well acknowledged in various fields. Nevertheless, it still has shortcomings. It cannot effectively process data points located in the low-density regions of a dataset, and it mistakenly assigns outliers and intermediate points to clusters. The manual selection of cluster centers reduces the objectivity and accuracy with which the algorithm recovers the real clusters. It also performs poorly on data with complex structure, such as complex manifolds, varying densities and differing scales [16].
DPC rests on two key assumptions. First, a cluster center has a higher density than its neighboring points. Second, a cluster center is relatively far from any point of higher density [17,18]. The original DPC algorithm measures the distance between data points by the Euclidean distance. For two data points x_i and x_j with M coordinates, the distance of Eq. (1) is $d_{ij} = \sqrt{\sum_{k=1}^{M} (x_{ik} - x_{jk})^2}$ [19,20].
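The Euclidean distance matrix used by DPC can be sketched in a few lines. This is a minimal illustration in plain Python; the function names `euclidean` and `distance_matrix` are chosen here for illustration and do not come from the paper.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(points):
    """Symmetric matrix of pairwise Euclidean distances d_ij."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = euclidean(points[i], points[j])
    return d
```

The matrix is filled once for i < j and mirrored, since d_ij = d_ji.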
The algorithm proceeds as follows: the cluster centers are determined from these two features [21], and every other point is assigned to the cluster of its nearest neighbor with a higher density than itself. DPC uses the local density ρ_i of point i and the distance δ_i between point i and the nearest point j whose local density is greater than that of i; the definitions are given in Eq. (2) and (3).
The local density is $\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c)$, where d_ij is the distance between points i and j and d_c is the cut-off distance. χ(x) is an indicator function, with $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise, so ρ_i counts the points that lie within d_c of point i. The distance for the point with the highest local density is shown in Eq. (5): for that point, δ_i is the maximum distance between x_i and any other point in the dataset, $\delta_i = \max_j d_{ij}$. Otherwise, δ_i is the distance between the point and the nearest point of higher local density, $\delta_i = \min_{j: \rho_j > \rho_i} d_{ij}$ [22].
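The two quantities can be computed directly from a distance matrix. The following is a minimal sketch assuming the cut-off kernel described above; the function names are illustrative, and ties in density are handled in the simplest way (every point with the maximum density receives the maximum distance).

```python
def local_density(d, d_c):
    """Cut-off kernel: rho_i counts the points closer to i than d_c."""
    n = len(d)
    return [sum(1 for j in range(n) if j != i and d[i][j] < d_c)
            for i in range(n)]

def delta_distance(d, rho):
    """delta_i: distance to the nearest point of strictly higher density.

    A point with no denser point (the density peak) gets the largest
    distance to any other point instead.
    """
    n = len(d)
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return delta
```

For four points on a line at 0, 1, 2 and 10 with d_c = 1.5, the point at 1 has the highest density and the largest δ, while the isolated point at 10 has zero density but a large δ, which is the signature of an outlier in the decision graph.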
As shown in Eq. (2), the local density ρ_i of a data point is influenced by the parameter d_c. The results of DPC are less affected by d_c when the dataset is large; on small datasets they can be severely affected. The choice of d_c therefore affects both the local densities and the clustering result. When the dataset is small, DPC uses a Gaussian kernel function to estimate the density, as shown in Eq. (5): $\rho_i = \sum_{j \neq i} \exp(-(d_{ij}/d_c)^2)$.
The local density ρ_i and distance δ_i of each data point are computed from the formulas above, and the corresponding decision graph is then generated to select the cluster centers. Cluster centers typically have both a large local density ρ_i and a large distance δ_i, so they can be picked out in the decision graph [23][24][25]. The procedure is as follows.
Step 1: Calculate the cut-off distance d_c. Calculate the distances d_ij between the data points of X ∈ R^{N×M} according to Eq. (1); sort the distances in ascending order; select an appropriate d_c from the sorted distances.
Step 2: Select the cluster centers. Calculate ρ_i according to Eq. (2) and Eq. (3); calculate δ_i according to Eq. (4); sort the candidate points by density in descending order; mark the candidate points with a high ρ value and a relatively high δ value as cluster centers.
Step 3: Assign the non-center points to clusters. Following the δ_i relation, attach each non-center point in turn to the cluster of its nearest higher-density point; then detect whether halo points are present in each cluster.
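The three steps can be sketched end to end as follows. This is an illustration under stated assumptions rather than the authors' implementation: d_c is passed in by the caller instead of being read off the sorted distances, the Gaussian kernel is used for the density, cluster centers are taken as the `n_centers` points with the largest product γ = ρ·δ (a common heuristic that replaces manual selection on the decision graph), and halo detection is omitted. It requires Python 3.8 or later for `math.dist`.

```python
import math

def dpc(points, d_c, n_centers):
    """Minimal DPC sketch; returns (labels, rho, delta)."""
    n = len(points)
    # Step 1: pairwise Euclidean distances (d_c is supplied by the caller).
    d = [[math.dist(p, q) for q in points] for p in points]
    # Step 2a: Gaussian-kernel local density.
    rho = [sum(math.exp(-(d[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Step 2b: delta_i = distance to the nearest point ranked denser than i.
    order = sorted(range(n), key=lambda i: -rho[i])   # densest first
    delta = [0.0] * n
    parent = [-1] * n                                  # nearest denser point
    delta[order[0]] = max(d[order[0]])                 # density peak
    for rank in range(1, n):
        i = order[rank]
        parent[i] = min(order[:rank], key=lambda j: d[i][j])
        delta[i] = d[i][parent[i]]
    # Step 2c (assumption): centers are the largest gamma = rho * delta.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_centers]
    # Step 3: visit points from dense to sparse; inherit the parent's label.
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

Because points are visited in order of decreasing density, the nearest denser neighbor of every non-center point already carries a label when that point is reached, so the assignment needs a single pass.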

DENSITY CLUSTERING ALGORITHM BASED ON WOLF PACK ALGORITHM

Wolf Pack Algorithm
Harsh living conditions and a long evolutionary history have given wolves a sophisticated social organization and refined cooperative hunting methods. Wolves live in packs with a clear social division of labor, and each role contributes to the survival and development of the pack. In the initial population, the artificial wolf with the best initial objective function value is taken as the lead wolf. During the iterations, the objective function value of the best wolf after each iteration is compared with that of the lead wolf of the previous generation. If it is better, the position of the lead wolf is updated. The lead wolf does not perform the three intelligent behaviors; it simply passes to the next iteration until it is replaced by a stronger artificial wolf.
Scout wolf i probes h directions around its current position: for the p-th direction (p = 1, 2, ..., h), its position in the d-th dimension is computed after one scouting step in that direction.
During the rush, if wolf j perceives a prey odor concentration X_j > X_lead, then X_lead = X_j: wolf j becomes the lead wolf and initiates the calling behavior. If X_j < X_lead, wolf j continues to rush until the distance between it and the lead wolf is less than d_near, at which point it turns to the siege. If the range of the d-th variable to be optimized is [min_d, max_d], the decision distance d_near can be estimated by Eq. (8).
Once the rushing wolves are close to the prey, they join the scout wolves in a close siege to capture it. The lead wolf is the wolf nearest the prey, so its position is treated as the position of the prey. In particular, for the k-th generation of wolves, given the position of the prey in the d-th dimension, the siege behavior of the wolves can be described by Eq. (9).
If the range of the d-th variable to be optimized is [min_d, max_d], the three intelligent behaviors use the scouting step step_a, the rushing step step_b and the siege step step_c in the d-th dimension.
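The behaviors described above can be written out with the update rules commonly given for WPA. The block below is a reconstruction under that assumption and is not necessarily the exact form of Eq. (8) and Eq. (9) in this paper. The symbols λ (a random number in [−1, 1]), ω (distance decision factor), S (step factor), D (number of dimensions), and g_d^k and G_d^k (position of the lead wolf, treated as the prey) are assumptions taken from that formulation.

```latex
% Scouting: scout wolf i probes direction p (p = 1, ..., h)
x_{id}^{p} = x_{id} + \sin\!\left(2\pi \frac{p}{h}\right) \cdot step_a^{d}

% Calling: wolf i rushes toward the lead wolf position g_d^k
x_{id}^{k+1} = x_{id}^{k} + step_b^{d} \cdot
    \frac{g_d^{k} - x_{id}^{k}}{\left| g_d^{k} - x_{id}^{k} \right|}

% Decision distance for switching from rushing to siege
d_{near} = \frac{1}{D \cdot \omega} \sum_{d=1}^{D}
    \left| \mathrm{max}_d - \mathrm{min}_d \right|

% Siege around the prey position G_d^k
x_{id}^{k+1} = x_{id}^{k} + \lambda \cdot step_c^{d} \cdot
    \left| G_d^{k} - x_{id}^{k} \right|

% Relation between the three step lengths
step_a^{d} = \frac{step_b^{d}}{2} = 2 \cdot step_c^{d}
    = \frac{\left| \mathrm{max}_d - \mathrm{min}_d \right|}{S}
```

Under this relation the rushing step is the largest and the siege step the smallest, which matches the description of wolves rushing with a large step and then searching finely near the prey.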
Taking the maximization of an objective function as an example, the Wolf Pack Algorithm proceeds as follows.
Step 1: The positions of the wolves are randomly initialized in the solution space, and the artificial wolves are ranked by objective function value, the best one becoming the lead wolf.
Step 2: The scout wolves wander to search for prey. If a scout finds a position whose objective function value is better than that of the lead wolf, the position of the lead wolf is updated and the new lead wolf issues a call. Otherwise, the scout keeps wandering until it reaches the maximum number of scouting moves.
Step 3: The wolves that hear the call of the lead wolf rush toward it with a large step length. If a rushing wolf reaches a position whose objective function value is better than that of the lead wolf, the position of the lead wolf is updated. Otherwise, the wolf keeps rushing until it reaches the siege range.
Step 4: The wolves that are close to the lead wolf cooperate with the scout wolves to besiege the prey (the position of the lead wolf is regarded as that of the prey). If the objective function value of any artificial wolf becomes better than that of the lead wolf, the position of the lead wolf is updated; this continues until the prey is captured.
Step 5: The artificial wolves with the smallest objective function values are eliminated, and new artificial wolves are generated randomly in the solution space to renew the pack.
Step 6: Finally, check whether the objective function value of the lead wolf meets the required accuracy or whether the maximum number of iterations has been reached. If neither holds, continue iterating from the scouting behavior. Otherwise, output the position of the lead wolf and its objective function value, which is the optimal solution found.
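The six steps can be sketched as a small maximizer. This is a simplified illustration, not the reference algorithm: the three behaviors are applied wolf by wolf rather than pack-wide in phases, rushing is capped at `t_max` moves so the loop always terminates, and the parameter names (`s` for the step factor, `omega` for the distance decision factor, `beta` for the renewal factor) together with their default values are assumptions.

```python
import math
import random

def wolf_pack_maximize(f, bounds, n_wolves=20, n_iter=50, h=4, t_max=10,
                       s=50.0, omega=5.0, beta=4, seed=0):
    """Simplified Wolf Pack Algorithm maximizing f over box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    step_a = [(hi - lo) / s for lo, hi in bounds]   # scouting step
    step_b = [2 * a for a in step_a]                # rushing step
    step_c = [a / 2 for a in step_a]                # siege step
    d_near = sum(hi - lo for lo, hi in bounds) / (dim * omega)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def new_wolf():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Step 1: random pack; the best wolf is the lead wolf.
    wolves = [new_wolf() for _ in range(n_wolves)]
    lead = max(wolves, key=f)
    for _ in range(n_iter):
        for i in range(n_wolves):
            w = wolves[i]
            # Step 2, scouting: probe h directions, keep improving moves.
            for _ in range(t_max):
                probes = [clip([x + math.sin(2 * math.pi * p / h) * a
                                for x, a in zip(w, step_a)])
                          for p in range(1, h + 1)]
                best = max(probes, key=f)
                if f(best) <= f(w):
                    break
                w = best
                if f(w) > f(lead):
                    lead = w
                    break
            # Step 3, rushing: move toward the lead wolf with the large step.
            for _ in range(t_max):
                if sum(abs(g - x) for x, g in zip(w, lead)) <= d_near:
                    break
                w = clip([x + b * (1 if g > x else -1 if g < x else 0)
                          for x, b, g in zip(w, step_b, lead)])
                if f(w) > f(lead):
                    lead = w
                    break
            # Step 4, siege: small random move relative to the lead wolf.
            lam = rng.uniform(-1.0, 1.0)
            cand = clip([x + lam * c * abs(g - x)
                         for x, c, g in zip(w, step_c, lead)])
            if f(cand) > f(w):
                w = cand
            if f(w) > f(lead):
                lead = w
            wolves[i] = w
        # Step 5, renewal: replace the weakest wolves with random ones.
        wolves.sort(key=f, reverse=True)
        r = rng.randint(max(1, n_wolves // (2 * beta)),
                        max(1, n_wolves // beta))
        wolves[-r:] = [new_wolf() for _ in range(r)]
    # Step 6: the iteration budget is the only stopping rule in this sketch.
    return lead, f(lead)
```

For example, `wolf_pack_maximize(lambda x: -(x[0] - 3.0) ** 2, [(-10.0, 10.0)])` returns a position close to 3; the achievable precision is bounded by the scouting step, here (hi − lo)/s = 0.4.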

Density Peaks Clustering Algorithm Based on Wolf Pack Algorithm
The Wolf Pack Algorithm is used to optimize the choice of the cut-off distance d_c of the Density Peaks Clustering algorithm, because the clustering result of DPC is limited by this choice. WPA searches over candidate values of d_c, scores each one with a clustering validity index, and returns the best value; combining the two algorithms in this way yields better clustering results.
Step 1: The positions of the wolves are randomly initialized in the solution space, and the artificial wolves are ranked by objective function value, the best one becoming the lead wolf.
Step 2: If a scout wolf finds a position whose objective function value is greater than that of the lead wolf, the position of the lead wolf is updated. Otherwise, the scout keeps wandering until it reaches the maximum number of scouting moves.
Step 3: Run the DPC algorithm and use the internal index Sil to evaluate each wolf.
Step 4: Record the best wolf position; if the iterations are complete, output the clustering results, otherwise return to Step 2.
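The coupling between the two algorithms can be sketched as a one-dimensional search over d_c whose fitness is the mean silhouette of the resulting DPC partition. This is an illustration under several assumptions, not the authors' implementation: a compact DPC with Gaussian density and γ = ρ·δ center selection is inlined so the block is self-contained, only the scouting behavior and a one-wolf renewal are used, and the names `dpc_labels`, `mean_silhouette` and `wpa_dpc`, the number of clusters `k` and the search range `[lo, hi]` are hypothetical inputs chosen by the caller.

```python
import math
import random

def dpc_labels(points, d_c, k):
    """Compact DPC: Gaussian density, k centers with the largest rho*delta."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(math.exp(-(d[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [-1] * n
    delta[order[0]] = max(d[order[0]])
    for r in range(1, n):
        i = order[r]
        parent[i] = min(order[:r], key=lambda j: d[i][j])
        delta[i] = d[i][parent[i]]
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:k]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels

def mean_silhouette(points, labels):
    """Mean Sil(t) = (b - a) / max(a, b); undefined cases count as 0."""
    total = 0.0
    for i, p in enumerate(points):
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        means = {c: sums[c] / counts[c] for c in sums}
        a = means.pop(labels[i], None)
        if a is None or not means:
            continue  # singleton cluster, or only one cluster overall
        b = min(means.values())
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def wpa_dpc(points, k, lo, hi, n_wolves=6, n_iter=10, seed=0):
    """Search d_c in [lo, hi] for the value whose DPC result has the best Sil."""
    rng = random.Random(seed)
    step = (hi - lo) / 50.0                      # scouting step (assumed)

    def fitness(d_c):
        return mean_silhouette(points, dpc_labels(points, d_c, k))

    # Step 1: random initial wolves; the best one is the lead wolf.
    wolves = [rng.uniform(lo, hi) for _ in range(n_wolves)]
    lead = max(wolves, key=fitness)
    lead_fit = fitness(lead)
    for _ in range(n_iter):
        for i in range(n_wolves):
            # Step 2: scout one step each way, keep the better position.
            for cand in (min(hi, wolves[i] + step), max(lo, wolves[i] - step)):
                if fitness(cand) > fitness(wolves[i]):
                    wolves[i] = cand
            # Steps 3-4: score the wolf by Sil and record the best one.
            fit = fitness(wolves[i])
            if fit > lead_fit:
                lead, lead_fit = wolves[i], fit
        # Renewal (assumed): replace the weakest wolf with a random one.
        worst = min(range(n_wolves), key=lambda i: fitness(wolves[i]))
        wolves[worst] = rng.uniform(lo, hi)
    return lead, lead_fit, dpc_labels(points, lead, k)
```

The fitness is recomputed on every call for brevity; on real datasets the distance matrix and the fitness values would normally be cached, since each evaluation costs O(n²).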

SIMULATION EXPERIMENT AND ANALYSIS

Evaluation Index
The experiments were run on an Intel(R) Pentium 2.9 GHz processor with 4.00 GB of memory and a 500 GB hard drive, under Windows 7, and were programmed in MATLAB 2019a.
When the true number of clusters and the true class labels of a dataset are known, the partition P produced by the clustering algorithm can be compared with the known reference partition M. This approach is called external evaluation: it assesses a clustering result against a structure known in advance. For any two samples m and n, there are four possible relationships:
(1) m and n belong to the same cluster in P and to the same class in M.
(2) m and n belong to the same cluster in P but to different classes in M.
(3) m and n belong to different clusters in P but to the same class in M.
(4) m and n belong to different clusters in P and to different classes in M.
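The four relationships can be counted over all sample pairs, which is the basis of pair-counting external indexes. The following is a minimal sketch; the function name `pair_counts` and the abbreviations in the return value are illustrative.

```python
from itertools import combinations

def pair_counts(pred, truth):
    """Count sample pairs by agreement between two partitions.

    Returns (ss, sd, ds, dd):
      ss: same cluster in pred, same class in truth        (case 1)
      sd: same cluster in pred, different class in truth   (case 2)
      ds: different cluster in pred, same class in truth   (case 3)
      dd: different in both                                (case 4)
    """
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            ss += 1
        elif same_pred:
            sd += 1
        elif same_true:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd
```

Cases (1) and (4) are agreements between the two partitions, while cases (2) and (3) are disagreements.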

Silhouette Indicators
Suppose a dataset D is partitioned into n clusters C_i (i = 1, 2, …, n). For a sample t in cluster C_i, a(t) is the mean dissimilarity between t and all other samples in C_i. d(t, C_j) is the mean dissimilarity, or distance, between t and all samples of another cluster C_j. b(t) = min{d(t, C_j)}, where j = 1, 2, …, n and j ≠ i. The silhouette index of sample t is given in Eq. (11).
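Assuming the standard definition of the silhouette index, the quantities above can be written as follows; d(t, u) denotes the dissimilarity between two samples, a symbol introduced here for illustration.

```latex
a(t) = \frac{1}{|C_i| - 1} \sum_{u \in C_i,\; u \neq t} d(t, u),
\qquad
b(t) = \min_{j \neq i} d(t, C_j),
\qquad
Sil(t) = \frac{b(t) - a(t)}{\max\{a(t),\, b(t)\}}
```

For instance, a sample with a(t) = 1 and b(t) = 4 has Sil(t) = (4 − 1)/4 = 0.75, whereas a sample with a(t) = 4 and b(t) = 1 has Sil(t) = −0.75.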
The value of Sil(t) reflects how good the clustering result is. It takes a value between −1 and 1, and the closer the value is to 1, the better the clustering result.

F-measure Indicators
The F-measure indicator is an external indicator that combines the recall R(i, j) and the precision P(i, j). Given a true class P_j and a cluster C_i, the precision P(i, j) and the recall R(i, j) are defined as
$P(i, j) = |P_j \cap C_i| / |C_i|$
$R(i, j) = |P_j \cap C_i| / |P_j|$
The F-measure is given in Eq. (14):
$F(i, j) = \dfrac{2 \cdot P(i, j) \cdot R(i, j)}{P(i, j) + R(i, j)}$
The magnitude of the F-measure reflects the precision of the clustering result. It takes a value between 0 and 1, and the closer the value is to 1, the higher the clustering precision.
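The per-pair F-measure can be turned into a single score for a whole clustering. The sketch below assumes one common aggregation, which is not stated in the text above: each true class is matched with the cluster that gives it the highest F(i, j), and the class scores are averaged with weights proportional to class size.

```python
from collections import Counter

def f_measure(truth, pred):
    """Size-weighted best-match F-measure between classes and clusters."""
    n = len(truth)
    class_sizes = Counter(truth)          # |P_j|
    cluster_sizes = Counter(pred)         # |C_i|
    overlap = Counter(zip(truth, pred))   # |P_j intersect C_i|
    total = 0.0
    for j, n_j in class_sizes.items():
        best = 0.0
        for i, n_i in cluster_sizes.items():
            n_ij = overlap[(j, i)]
            if n_ij:
                p = n_ij / n_i            # precision P(i, j)
                r = n_ij / n_j            # recall R(i, j)
                best = max(best, 2 * p * r / (p + r))
        total += (n_j / n) * best
    return total
```

The score does not depend on how clusters are numbered: a perfect clustering with permuted labels scores 1, while putting two equally sized classes into one cluster gives precision 0.5 and recall 1 for each class, hence a score of 2/3.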
To verify the effectiveness and feasibility of WPA-DPC, the simulation experiments use the Iris, Flame, Spiral, Seeds and Aggregation datasets from the UCI repository. The datasets used in this paper are shown in Tab. 3:

Figure 1 Comparison of Silhouette indicators
The WPA-DPC and DPC algorithms are compared on two aspects, in order to assess the strengths and weaknesses of WPA-DPC and the effectiveness of both clustering algorithms. DPC and WPA-DPC are each run on the five datasets. The accuracy of the two algorithms is then measured by the F-measure (FM), and the quality of the clustering results by the silhouette index (Sil). These indicators are used to analyze the experimental results, which are shown in Fig. 1 and Fig. 2.

Cluster Performance and Experimental Results Analysis
The FM index indicates that the performance of WPA-DPC on the five test datasets is generally equal to or better than that of the traditional DPC algorithm, with an improved accuracy rate. The silhouette index shows that the Sil value of WPA-DPC on the Iris, Aggregation, Wine and Flame datasets equals that of the traditional DPC algorithm; on Iris and Flame in particular, the Sil values of the improved and original algorithms are roughly the same. For spherical data, a higher Sil index indicates a better clustering. For complex structures such as the spiral, the Sil index is negatively correlated with clustering quality, so the result there is also consistent with WPA-DPC clustering better. Overall, the silhouette index of WPA-DPC is slightly higher than that of DPC, which indicates better clustering results. In conclusion, the Density Peaks Clustering algorithm based on the Wolf Pack Algorithm can effectively establish the similarity relations between data points. It finds the optimal parameter value, the cluster structure is more evident, and the clustering result is more accurate. The figure shows that DPC divides the data into three clusters whose points are scattered, whereas the clusters produced by WPA-DPC are distinct and concentrated. WPA-DPC thus gives a clearer clustering result.
For spherical data, the higher the Sil index, the better the result; for the Spiral dataset, the Sil index is negatively correlated with the quality of clustering, which again indicates that the clustering results are better. The Density Peaks Clustering algorithm based on the Wolf Pack Algorithm can effectively establish the similarity relations between data points and find the optimal parameter value. The clustering trend is more obvious, and the clustering results are more accurate. The silhouette index of WPA-DPC is about equal to that of the DPC algorithm, and the experimental results of this algorithm are better than those of previous ones. Fig. 8 shows the clustering result for the Aggregation dataset, whose real number of classes is seven. The clustering in the figure shows that the improved algorithm recovers the real number of classes: the number of clusters is accurate and the clustering efficiency is good.
It is clear that the WPA-DPC algorithm performs better than the DPC algorithm. The clustering algorithm based on the Wolf Pack Algorithm (WPA-DPC) therefore brings a clear improvement. To verify the general applicability of WPA-DPC, we also test the algorithm on a high-dimensional dataset. The simulation results show that the method is robust and effective on high-dimensional data as well.
The three comparison diagrams above show that the WPA-DPC algorithm is superior to the original DPC algorithm. The improved algorithm can automatically search for the optimal cut-off distance and achieve the best clustering result.

CONCLUSION
We proposed a Density Peaks Clustering algorithm based on the Wolf Pack Algorithm (WPA-DPC). The algorithm applies the optimization idea of the Wolf Pack Algorithm: the similarity matrix is established and the parameter d_c of the DPC algorithm is updated. The results indicate that the number of clusters found by the improved algorithm is closer to the real number of clusters, and that the performance of the method and the clustering evaluation indexes are significantly improved.
At the same time, the WPA-DPC algorithm removes the need to select the parameters of the DPC algorithm manually. However, the clustering effect is still poor on complex datasets. How to establish a link between different datasets and their parameters is therefore a problem that needs to be addressed. Alternatively, different parameters could be optimized separately and the optimization results integrated so that the Density Peaks Clustering algorithm achieves a better clustering effect. Whether these ideas can improve the quality of clustering remains to be studied. Our next step is to improve the DPC algorithm so that it adapts to datasets with complex structure.

Figure 2 Comparison of F-measure indicators

Figure 4 Flame dataset clustering result graph of WPA-DPC

Figure 9 Aggregation dataset clustering result of DPC

Figure 11 Seeds dataset clustering result graph of DPC

Table 1 The DPC algorithm
Input: distance matrix X ∈ R^{N×M}, cut-off distance d_c
Output: class labels y ∈ R^{N×M}

Table 2 Density Peaks Clustering based on Wolf Pack Algorithm
Input: cut-off distance d_c value, maximum iteration t, population size of wolves, location of wolves
Output: cluster number, evaluation index Sil, Fm values

Table 4 Comparison of evaluation indexes of clustering performance