Parallel Acceleration and Improvement of Gravitational Field Optimization Algorithm

The Gravitational Field Algorithm, a modern optimization algorithm, mainly simulates celestial mechanics and is derived from the Solar Nebular Disk Model (SNDM). It simulates the process of planetary formation to search for the optimal solution. Although this optimization algorithm has more advantages than other optimization algorithms in multi-peak optimization problems, it still has the shortcoming of long computation time when dealing with large-scale datasets or solving complex problems. Therefore, it is necessary to improve the efficiency of the Gravitational Field Algorithm (GFA). In this paper, an optimization method based on multi-population parallel is proposed to accelerate the Gravitational Field Algorithm. With the help of the parallel mechanism in MATLAB, the algorithm execution speed will be improved by using the parallel computing mode of multi-core CPU. In addition, this paper also improves the absorption operation strategy. By comparing the experimental results of eight classical unconstrained optimization problems, it is shown that the computational efficiency of this method is improved compared with the original Gravitational Field Algorithm, and the algorithm accuracy has also been slightly improved.


INTRODUCTION
Nowadays, optimization problems are quite common in many areas of the real world. The applications of optimization theories and methods have greatly promoted the development of agriculture, transportation, engineering, biotechnology and other research fields [1]. The modern optimization algorithms are widely applied, such as logistics scheduling problem [2], production scheduling optimization problem [3] and the predictive gene control network in bioinformation [4].
Modern optimization algorithms [5] emerged as the times require, which are heuristic algorithms [6][7] rising in the early 1980s. They are based on objective natural phenomena and can solve practical optimization problems by imitating the phenomena or behaviour of nature. For example, Genetic Algorithm (GA) was proposed by John Henry Holland in 1975 [8], which simulated the selection process of natural evolution. Dorigo proposed an ant colony optimization algorithm in 1992 based on the behaviour of ants discovering food and communicating with the members in colony [9]. The Particle Swarm Optimization (PSO) algorithm was proposed by Eberhart and Kennedy in 1995 [10], which simulated a simplified social pattern and behaviour of birds. These optimization algorithms are constantly being improved to cope with a wide variety of applications. However, they also all have their own inherent defects. For example, GA is limited by local optimal solution and genetic bleaching [11], SA algorithm usually has much long running time [12] and PSO algorithm has the problem of low accuracy and is easy to fall into local optimum. To deal with these shortcomings, the Gravitational Field Algorithm as a new simple and effective heuristic search algorithm was proposed by Zheng in 2012 [13]. Compared with other optimization algorithms [13][14][15], GFA can not only solve the multi-peak problems efficiently, but also converge to global optimum solution soon.
Although GFA is a simple and effective algorithm, it still has a long running time when initial dust population or scale of given problems is large. In order to improve the efficiency of GFA, the parallelization of this algorithm is studied.
Based on the original GFA, Parallel Gravitational Field Algorithm (PGFA) is proposed in this paper. PGFA adopts the strategy that divides initial dust population into multiple subgroups. And these multiple subgroups are allocated to multi-core CPU processors for parallel computation, based on the parallel mechanism in MATLAB. The subgroups in each processor are dealt with independently, and the results in each processor will be summarized to obtain the final optimal solution when all tasks in processors have been completed. Moreover, this paper improves the absorption operation, based on the original GFA. This improvement of absorption operation optimizes the search space for iteration, which can avoid excessive redundant search operations. And it also helps accelerate the iteration process in algorithm.
To verify the performance of PGFA, we test PGFA and original GFA on several common benchmark functions to compare the efficiency and accuracy. The experimental results on eight complex unconstrained problems indicate that PGFA has better accuracy and efficiency than original GFA.

ORIGINAL GRAVITATIONAL FIELDALGORITHM
The original Gravitation Field Algorithm is a heuristic search algorithm proposed by Zheng et al. [8]. The idea of this optimization algorithm is derived from the theory of planetary formation which is accepted by most astronomers: the Solar Nebular Disk Model (SNDM) [16].
The main idea of SNDM model is that dust and nebulae originally drift in the universe, and the gravitational field unites them together, eventually forming the planets in solar system. The GFA simulates these natural phenomena to solve optimization problems. All individuals, regarded as the dust population in GFA, represent all the feasible solutions. The mass of dust is calculated according to the objective function, and it will serve as a criterion for evaluating the quality of dust. Each dust attracts each other by gravitational field and eventually gathers at one point. That final point's location represents the optimal solution.
In the original GFA [17], the dusts are initialized randomly, and the flow chart of GFA is shown in Fig. 1. The details of GFA are summarized as follows. represents the feasible solution of a certain problem. d i can be a scalar or vector. For the j-th dimension space of the i-th dust particle (feasible solution) d i , the feasible range is set as jbegin jend X X ,     .  Second, the whole solution space is divided into several subspaces randomly, every subspace is called "group". The dusts with the largest mass value in each group are called center dusts, and the other dusts are called surrounding dusts.  Third, moving dusts. In GFA, the movement is unidirectional, that is, center dust will not move, and surrounding environmental dust will move toward center dust under the gravity of center dust. The pace of movement is determined by Eq. (1).
where dis i represents the Euclidean distance between the surrounding dust i and its center [13] dust. And w represents the pace of moving distance [17]. The w = 0.0618 in [13].  Fourth, absorbing dusts. When the distance between surrounding dusts and their center is less than the threshold set, surrounding dust will be absorbed by center dust. The absorption strategy adopted in [19] is to delete these surrounding dusts directly.  Fifth, the rotation operation. The aim of Rotation Operator in [14] is to push surrounding dust away from center dust. The direction of departure is not original forward direction, but the random distances in every dimension. To ensure the surrounding dusts will not be pushed too far away, it needs to set a maximum distance that can be pushed out. That is defined by Eq. (2) in [17].
 Sixth, checking whether the termination conditions are met. If one of the algorithm termination conditions is satisfied, the algorithm will be terminated. Otherwise, go to the third step and continue to search.
 According to the detailed description above, the pseudocode of GFA is given in Algorithm 1. It is reported that original GFA is very effective for solving the approximate optimal solution of unconstrained optimization problems [13,17]. However, regardless of efficiency or accuracy, it may be difficult to obtain a satisfactory solution for the high-dimensional optimization problems [17]. So it is significant and essential to develop and study the parallel model of GFA.

PARALLEL GFA BASED ON PARALLEL MECHANISM IN MATLAB 3.1 Study on the Parallelism of GFA
The parallelism of evolutionary algorithms, based on population model, such as GA and SA, are mainly due to the independence of individual behaviour. Two ideas are mostly adopted in parallel implementation of these evolutionary algorithms. One decomposes the whole algorithm into tasks and executes multiple tasks in parallel. For example, reference [18] proposed a parallel scheme that applied this idea for SA. The other one, such as reference [19] uses the divide-and-conquer model to divide the whole population into several sub-groups and optimize them separately.
GFA is similar to other evolutionary algorithms, such as PSO and GA, as the fact that they are all based on the concept of colony, so GFA can also be paralleled [20]. According to the characteristic of GFA, there are two aspects that have the possibility of being paralleled [21]: a) The calculation and update of position and mass for individuals in GFA can be paralleled since the behaviours of all dusts are independent, except the center. b) The process of obtaining and updating the center dusts also can be paralleled. After calculating the position and mass of an individual, the individual will immediately compare with the current center dust mass and update center dust.
GFA has the operation of decomposition solution space, and each iteration of the dust particles is independent. Since the characteristic of GFA do coincide with above parallel pattern, this paper adopts divide-andconquer [22] method to realize the Parallel Gravitational Field Algorithm (PGFA). This paper divides the whole dust population into several subgroups, and searches the optimal solution independently in parallel. Because of the above process, the search time of PGFA will be reduced greatly. In this paper, PGFA is designed and implemented by the parallel mechanism of MATLAB.

Introduction of Parallel Computing Platform in MATLAB
As a powerful enterprise-level mathematical software, MATLAB provides us with an efficient and convenient parallel computing architecture. It mainly relies on two tools: Parallel Computing Toolbox (PCT) and Matlab Distributed Computing Server (MDCS) [23]. The two tools support parallel computing of multiple cluster nodes. Based on the two parallel tools, we can concentrate on the parallel design of GFA and ignore some details in parallel implementation, to improve the algorithm efficiency and avoid some unnecessary work.
The MATLAB parallel computing platform is composed of cluster nodes and networks from the view of physical distribution. And it is composed of workers, scheduler and Matlab client in view of the logic. Fig. 2 shows a typical architecture of parallel computing in MATLAB. The main function of scheduler (job manager) aims to manage Matlab workers and compute resources for Matlab job. Workers are the basic logical units to perform parallel computing tasks in MATLAB, which correspond to physical units, responding to the assigned tasks and returning results [24]. This parallel architecture can be not only deployed on one computer, but also deployed on multiple computers by network. In this paper, we use the parallel mechanism in MATLAB to realize PGFA in view of the following points: (1) The platform is established easily. Parallel tools can be installed in one-click without complex environment configuration.
(2) The parallel mode is flexible and diverse. MATLAB provides us with a variety of built-in parallel structures. These parallel structures are provided through system built-in or system functions [24].
(3) The resources of hardware environment are accessible. It is convenient and accessible to use parallel tools with limited hardware resources.
Compared with other parallel platforms, such as CUDA [25] and FPGA [26], parallel platform used in this paper may not have the best performance, but it is quick and easy to design and implement PGFA.

PGFA Design Based on Parallel Architecture in MATLAB
Based on the analysis of GFA and parallelism architecture in MATLAB, this paper adopts the masterslave model to design PGFA.
Although GFA decomposes solution space, the iterative calculation of each dust particle is still executed serially. Therefore, we divide the initial dust population into several subgroups, and these subgroups are performed in parallel respectively, to improve the efficiency of GFA. The flow chart of PGFA is shown in Fig. 3.
The main steps of PGFA are summarized as follows:  Initiating dust population. Dust particles are initialized randomly, which is the same as the original GFA.  Being grouped randomly. This is indistinct from the process in original GFA. In PGFA, subgroups are stored separately and fed into different executors for subsequent iteration.  Parallel computation in subgroups. The parallel computation in subgroups consists of dust moving, absorbing and rotating, and then returns an optimal solution of current subgroup. When the current subgroup converges, the optimal solution is recorded. When the computation in a subgroup is completed, dust particles in current group will jump out of the intra-group loop without waiting for dust particles in other workers.  Summarizing the results. The dust particles of each subgroup will be integrated. The optimal results of all subgroups are also collected, and the optimal solution of the whole algorithm is picked among those results.  Grouping all dust particles again, a new round of iterations will be performed to avoid the algorithm falling into local optimum and ensure the accuracy of the solution. When the termination condition is satisfied, the algorithm ends.
According to the detailed description above, the pseudocode of PGFA is given in Algorithm 2.

Implementation of PGFA Based on the Parallel Architecture of MATLAB
The parallelism of GFA is essentially due to the independence of individual behavior in GFA. In this paper, we design and implement the PGFA based on the parallel topology of MATLAB with single cluster nodes.
In the topology structure of single cluster nodes, we start the parallel computing pool by "matlabpool", and use MATLAB client-worker mode to complete the PGFA. PGFA code starts up at the client and tasks are distributed to workers in parallel.

Divide Into Subgroups
According to the flow chart for PGFA, dust population should be initialized at client firstly. Then dust particles are grouped and distributed to multiple workers for parallel computation. In general, the number of workers is consistent with the number of CPU cores.
The number of subgroups is mainly depended on the number of workers opened by parallel computing pool. As shown in Fig. 4. the number of subgroups should not be greater than the number of workers in parallel computing pool. Without considering data communication, the maximum degree of parallelism will be achieved when the number of dust groups is consistent with the number of workers. Yet, a large amount of data communication will affect the efficiency of workers in this parallel structure.
During the parallel process, dust particles in each subgroup are updated independently, including the operation of movement, rotation and absorption, and then the optimal solution in subgroup is recorded. At the end of parallel computation of multiple subgroups, the optimal solution of each group is aggregated and the global optimal is selected by comparing with others. At the same time, all dust particles in each subgroup are aggregated too. Finally, the algorithm judges whether the number of iterations has been reached. If not, a new round of algorithm iteration will be carried out, and the newly acquired total dust population will be regrouped as the Fig. 3 shows, to avoid the local optimum.

The Process of Parallel Computation
To ensure the efficiency of PGFA, the process can be designed and implemented in parallel when two conditions are satisfied. One is the computations of one process are intensive. The other one is the loose coupling of data interaction. The computation in subgroups should be independent to reduce the time of communication. So, the initial dust data is set as sliced variable. In this way, it not only meets the requirements of parallel structure for variables, but also saves data transmission time.

Figure 4 Grouping to multiple subgroups
This shared sliced variable needs to be initialized outside the parallel part, because the variables inside the parallel part are temporary variables, whose values cannot be stored once the parallel process has been finished. The shared sliced variable stores the temporary optimal solution obtained by subgroups for each iteration. Then, the global optimal solution of the algorithm can be obtained by comparing all the optimal solutions stored in the shared sliced variable. The sliced variable as a medium pays an important role in data transmission between serial and parallel programs.

Improvement of Absorption Operation
In addition to improving the efficiency of GFA execution by means of multi-population parallel computation, this paper also makes some improvements on absorption operation.
In the original GFA, center dust will absorb the surrounding dusts which are particularly close to it. The absorption strategy of original algorithm is to delete surrounding dust directly. In the moving operation of original GFA, all the dusts move towards their center dust or generate a new center dust near the original center dust. But with the dust population converging to optimal solution, the surrounding dust particles which are far away from center dust are difficult to become the new center dust. Therefore, the surrounding dust particles which are too far away from center dust will not be a new center dust directly. In this paper, the improved absorption operation not only absorbs the dust particles close to center dust, but also absorbs the remote surrounding dust. There is a threshold as hyper-parameter to control the absorption area. The remote surrounding dust particles exceeding the threshold will be deleted directly. The process of improved absorption operation is illustrated in Fig. 5.
Additionally, the threshold cannot be too small. If the center dust is far away from the theoretical optimal solution and the threshold is small, the dust particles close to theoretical optimal solution will be absorbed and a pseudooptimal solution will be obtained.

EXPERIMENTS AND RESULTS
This part will introduce the details of experiments, including description of experimental platforms, setting of parameters, benchmark functions and experimental results. Finally, this part will give a simple analysis of experimental results.

Experimental Platforms
PGFA is implemented mainly based on the parallel toolbox in MATLAB. All the experiments were carried out on a computer with 4-core CPU and 8 GB memory. The computing platform in this paper for all the experiments is shown in Tab. 1.

Benchmark Function
To test the performance of PGFA in this paper, we select the following eight classical benchmark functions and compare the accuracy of PGFA with the original GFA.
x cos

Parameter Setting
In order to evaluate the experimental results and avoid the influence of environmental randomness on experimental results, 30 independent tests were run for each test function.
Based on the experimental computing platform showed in Tab. 1, four cores are accessible for parallel computing, so the dust particles are divided into 4 subgroups in this paper. The parameters and their values for the algorithms are given in Tab. 3. In this paper, the dimension of search space is set as 20, 50 and 100 respectively, and three groups of comparative experiments are tested on eight test functions. For each dimension, the initial number of dust particles is set as 5000. In order to compare the algorithm efficiency on different problems, we set the same maximum number of iterations as 4000.
Due to the limitation of computational resources, in order to avoid too long running time, different search spaces are set with different dimensions for PGFA and original GFA. Different test functions have different search spaces in three dimensions as shown in Tab. 4. Table 4 The search spaces of 8 benchmark functions

Benchmark Function
Range of variables 20 dims 50 dims 100 dims Sphere

Results and Discussion
The experimental results in different dimensions are shown in Tab. 5. It shows the average value of optimal solution and the average running time on 30 independent trials obtained by original GFA and PGFA in 20/50/100 dimensions.
In Tab. 5, "Mass" represents the optimal solution obtained by PGFA and original GFA, and "Time" denotes the running time of the algorithms. All results are the average value on 30 independent tests. The comparison of the results indicates that the efficiency of PGFA is generally better than original GFA. But with the increase of dimensions, the accuracy of the two algorithms is decreased. In order to show the experimental results better, the following Fig. 6 Fig. 7 are given according to the data from Tab. 5.
In order to normalize the data and improve the effect of data visualization. Fig. 6 Fig. 6, it can be seen that PGFA generally has smaller error. So we can conclude that PGFA has better accuracy than GFA, regardless of the dimensions. Fig. 7 obviously shows that the running time of the PGFA proposed in this paper is much less than that of the original GFA, whether in 20, 50 or 100 -dimensional search space. Vertical axis indicates second.
By comparing the running time of two algorithms, it can be concluded that the speed-up ratio of PGFA compared with original GFA is between 2 and 3, which shows the efficiency of PGFA under the multi-core environment. We also adopt cores usage one by one in 50 dimensions. The specific experimental data are listed in the Appendix (Tab. 1A). The speed-up ratio of each core in Tab. 6 is the average value of the speed-up ratio obtained by PGFA in 8 test functions. S p represents the speed-up ratio.  It can be seen from the table results that as the number of cores increases, the acceleration effect is better. However, the best speed-up ratio is lower than the number of cores (4 in this case). Failing to reach the ideal speedup ratio, because PGFA runs the programs serially at the stages of initialization and result summary.
Moreover, each core needs to exchange information with the client, which costs extra time. Therefore, it is difficult to get an ideal acceleration ratio that is equal to the number of cores. At the same time, this problem also suggests that the proposed PGFA still has potential improvement space: on the one hand, we can increase the proportion of parallel codes, on the other hand, we can reduce the data transfer between client and worker as much as possible. In the future work, we can implement GFA in a distributed way of multi-node cluster to achieve better efficiency.
In fact, this paper also tests the optimization results under different dusts numbers (N = 2500, N = 5000, N = 7500). The data is listed in the Appendix (Tab. 2A, Tab. 3A). It shows that PGFA always has less running time than GFA. As the number of dusts increases, both algorithms improve the accuracy of solution, and PGFA has better performance. Once again, it is proved that PGFA has better efficiency than GFA.

CONCLUSION
In view of the low efficiency of GFA in serial mode and the long-time of solving large-scale problems, this paper proposes a multi-population parallel GFA in multicore environment based on the parallel architecture in MATLAB. Multiple subgroups correspond to the core one by one. Meanwhile, the absorption strategy has been improved, based on the original GFA. To justify the performance of proposed PGFA, eight test functions in different dimensions are tested. The experimental results show that PGFA can make full use of the computing power of universal multi-core CPU and improve the efficiency compared with original GFA. At the same time, the parallel strategy and architecture proposed in this paper have good universality and can be adopted to improve the efficiency of similar optimization algorithms.