Parameter Solving of Probability Integral Method Based on Improved Genetic Algorithm

: The probability integral method (PIM) is the main method for mining subsidence prediction in China. Parameter errors and model errors are the main sources of error in the application of the probability integral method. There are many surface subsidence problems caused by coal mining. In order to improve the accuracy and operating efficiency of the genetic algorithm (GA) in calculating the parameters of the PIM, this paper proposes an improved genetic algorithm (IGA) by adding the dynamic crossover and mutation rate to the traditional GA. Made improvements to the shortcomings of random crossover and mutation rate of all individuals in the population in the original algorithm.Through simulation experiments, it is confirmed that the IGA improves the calculation efficiency and accuracy of the traditional GA under the same conditions.The IGA has higher accuracy, reliability, resistance to gross interference and resistance to missing observation points. This method is obviously superior to direct inversion and conventional optimization inversion algorithms, and effectively avoids the dependence on the initial value of the probabilistic integral method parameter.


INTRODUCTION
As domestic demand for coal remains high, it tends to be more and more important to minimize or even prevent the impact of mining subsidence on the production and living of mining areas [1][2][3]. To achieve this goal, it is necessary to make a reasonable prediction of the surface subsidence, which will occur during and after the mining process. It has important practical value in guiding disaster control, planning and forecasting in advance by disaster warning. This is the reason why we study the law of surface deformation caused by coal mining.
To predict the ground movement and deformation caused by mining precisely, an accurate prediction model is the most important thing. Liu Baocan and Liao Guohua proposed the PIM based on the theory of random media which is proposed by Litwiniszyn. The PIM is the most commonly used prediction model in China [4][5][6]. The reason why it can be widely accepted is because it has lots of advantages such as fewer parameters, simple model, accurate result and so on.
After the model is selected, how to accurately calculate the model parameters is another important thing [7]. The PIM is a mathematical model with a high level of nonlinearity. The expected parameters in the model mainly include the subsidence factor q, the horizontal movement factor b, the tangent of major influence angle tanβ, the propagation angle of extraction θ, the deviation of inflection point in four directions. Researches on the parameter calculation of PIM can be divided into four stages, which are linear approximation method, experimental design method, optimization algorithm and intelligent optimization algorithm.
The linear approximation method is to linearize the nonlinear function. The most representative method is the least squares algorithm. Although this method can calculate the parameters accurately, it has disadvantages such as initial value dependence and higher requirements of the arrangement of the observation station [8].
Orthogonal test design method is a method of conducting design tests through orthogonal tables and drawing conclusions by analyzing test data. This method can select a representative test from many tests. However, there are many influencing factors in the probabilistic integral model, which leads to a large amount of experimental work that needs to be designed, which reduces the efficiency of the method and is difficult to be implemented by a computer. Therefore, this method is not generalized [9].
The most widely used optimization algorithm is the model vector method. This method has a high resolution accuracy, but is too dependent on the initial value of the parameter [10][11][12].
Intelligent optimization algorithms are a class of algorithms that have emerged in recent years. These algorithms are based on the evolutionary rules or laws in nature, such as GA, particle swarm optimization, simulated annealing algorithms and so on [13][14][15][16][17][18][19].
Due to the principle and nature of the intelligent optimization algorithm and its improved algorithm, its parameter solving results have a certain randomness. Also, different algorithms have different convergence accuracy and convergence efficiency. Therefore, it is very important to find an appropriate method to solve the problems of unstable algorithm convergence results and high requirements on the initial value of parameters of recently used parameter solving methods.

PROBABILITY INTEGRATION METHOD 2.1 Establishment of the Surface Coordinate System
The principle of the PIM is the stochastic medium theory, which believes that a large number of granular media in the rock layer can move relatively with randomness. The subsidence of surface points is regarded as a weighted superposition of the influence of all the particles mined underground. The subsidence value of a surface point is directly related to the location of the point in the sinking basin. Therefore, it is important to establish a suitable surface coordinate system at first.
As is shown in Fig. 1, the Ao1B plane represents an underground coal seam and its depth is H. Due to the cantilever effect of the coal wall roof, the inflection point offset is defined in the PIM-the true shape of the surface subsidence basin is more in line with the prediction value calculated on the basis of ideal coal seam which is shown in Fig. 1.The CO2D plane in Fig. 1 represents the ideal coal seam, point O2 is the origin of the underground coordinate system, and the surface point O which corresponds to point O2 is set as the origin of the ground coordinate system.

Prediction of Sinking Value at Any Point
According to the random medium model, in the twodimensional case, assuming that there is a mining unit at point o2, the subsidence value of the surface point A(x, y) in the x-axis direction (i.e., the direction of strike) can be calculated by Eq. (1).
where r is the major influence radius of the mining unit, which is mainly related to the mining depth.
Assuming that the maximum sinking value is W 0 = mꞏqꞏcosα, the formula for calculating the subsidence value of the point with the abscissa of x on the main section is: The formula for calculating the subsidence value of the point with the ordinate of y on the main section is Due to the independence of the probability, the probability of causing subsidence at point A should be the product of the probabilities in different directions. The subsidence value of the surface point A(x,y) can be calculated by Eq. (2).
where W 0 represents the max value of the surface subsidence, given by 0 cos W m q     ; m is the thickness of the coal seam; q is a parameter of the PIM named subsidence factor; α is the angle of the coal seam; l is the strike length of the ideal coal seam, given by 3 3 4 l D S S    , where D 3 is the length of the real coal seam, S 3 and S 4 are two parameters of the PIM named the deviation of inflection point on the left border and the deviation of inflection point on the right border; L is the inclination length of the ideal coal seam, given by where D 1 is the inclination length of the real coal seam; S 1 and S 2 are two parameters of the PIM named the deviation of inflection point in the direction of downhill and the deviation of inflection point in the direction of uphill; θ is a parameter of the PIM named propagation angle of extraction; r is a parameter of the PIM named the major influence radius, it is also replaced by another parameter tanβ named the tangent of major influence angle in actual application, given by tan r H /   .

Parameters to be Solved
It can be seen from the formula that the PIM to predict the surface subsidence value mainly includes 8 parameters: (1) The subsidence factor q Suppose the mining thickness is m. Since the roof rock fills the mined-out area in the form of falling and shattering, the subsidence of the roof will be less than m. Assuming that the sinking value is M q  , q is called the subsidence factor, and its value is generally less than 1. In addition, if the influence of coal seam dip angle α is considered, the subsidence value of the roof shall be cos M q    . q represents the ratio of the maximum surface subsidence value W 0 to the projection length of coal seam thickness in lead hammer direction cos M The value range of q is generally: 0.4 ~ 1.0.
(2)The tangent of major influence angle tanβ. tanβ is the ratio of the mining depth to the major influence radius. The major influence radius is the distance from the inflection point of the main section subsidence curve to the maximum subsidence point or the distance from this inflection point to the boundary point of the moving basin under full mining conditions. The value range of tanβ is generally: 1.2 ~ 2.6.
(3)The propagation angle of extraction θ and the deviation of inflection point S. When the mined coal seam is inclined, the inflection point of the actual subsidence curve is not located directly above the coal wall, but is offset in the downhill direction. The included angle is the propagation angle of mining influence. The offset distance between the inflection point and the surface point corresponding to the coal wall is called the deviation of inflection point. The value range of θ is generally: where α represents the dip angle of the coal seam. The value range of S is generally: where H represents the depth of the coal seam.

Building an Error Model
To evaluate the accuracy of a set of parameters, an error model needs to be established. The larger the error, the weaker the adaptability of the group of parameters. In practical applications, we can usually choose to establish an error function based on the sinking value or the horizontal movement value or the combination of sinking and horizontal movement. According to the theorem of least-squares, the squared difference between the estimated subsidence value and the measured subsidence value is used as an indicator to establish the error function model. In order to make the individual with a large function value corresponding to a good fitness, take its reciprocal as the final result as is shown in Eq. (6).
where W Inversed is predicted subsidence value and W Measured is measured subsidence value.

THE IGA AND SIMULATION EXPERIMENT 3.1 GA
The GA was first proposed by Professor Holland of the United States in 1975. It is a population optimization algorithm proposed by simulating Darwin's "evolution theory" and Mendel's "genetic theory". It is a probabilistic optimization method, which can automatically obtain and guide the optimized search space, adjust the search direction adaptively. It does not require certain rules, has low restrictions, and has strong scalability. The processing object of the GA is a population with potential genetic information. And the genetic information is recorded in the genes of each individual in the population. Each individual in the population is actually a chromosome with some information of a solution. According to the principle of "survival of the fittest", generational evolution produces better and better approximate solutions. Like the phenomenon of natural evolution, individuals who are more adaptable to the environment will be produced in the new generation and those who are not adapted to the environment will be eliminated by selection so that the optimal individual will be produced. Then an approximate optimal solution to the problem can be obtained by decoding the individual eventually.
By simulating the genetic inheritance of organisms, the main biological principles of the GA are three basic genetic operators: selection, crossover and mutation. Through a large number of selection and genetic operations within the population, the solution of the problem can gradually approach the optimal.
The following will specifically describe the calculation process of the GA: (1) Set relevant parameters and generate the initial population: set the evolution counter as t = 0, the maximum evolution number as T, the mutation rate as pa and the crossover rate as pb. Generate each individual in the initial population using a method that is randomly generated within a reasonable range and record the initial generation as P(0) and the t-th generation as P(t). Declare an individual Ibest and its fitness best to record the best individual in the current iteration process.
(2) Calculate the fitness of each individuals of current generation: Calculate the fitness of each individual with Eq. (6). Record the best fitness individual in the contemporary population as Ibest(t), and its fitness value as best(t). If best > best(t), then best = best(t), Ibest = Ibest(t). Otherwise, the optimal solution remains unchanged.
(3) Judge whether the termination condition is reached: if the number of iterations t = T or the fitness of the optimal individual best has reached the requirement, the program can stop. The solution result is the Ibest. Otherwise continue to the next step.
(4) Selection operation: Apply the selection operator to current generation. The operation of selecting highquality individuals and eliminating inferior individuals from current generation is called selection. This step embodies the natural law of "survival of the fittest", which is to determine the pros and cons of an individual according to the calculated fitness of the individual. Individuals with good fitness are regarded as high-quality individuals and will be directly retained for the next generation. Otherwise, inferior individuals will be eliminated or retained after the following crossover and mutation operations. The most commonly used selection operators are: fitness proportional method, random traversal sampling method and local selection method.
(5) Crossover operation: apply crossover operator to current generation. This step is the soul of the GA. Its specific operation method is to allow neighbouring individuals to randomly exchange genes. Therefore, this approach can strengthen the exchange of information between individuals, thereby achieving the effect of accelerating the speed of convergence while improving the accuracy of calculation.
(6) Mutation operation: apply mutation operators to current generation. The same as the crossover operator, the mutation operator can also improve the convergence speed and calculation accuracy. But there is no information exchange between individuals during variation. Instead, it randomly selects a group of genes in inferior individuals for mutation, that is, replaces one or some of the parameters with randomly generated numbers within a reasonable range.
After the t-th generation population has undergone a series of operations of selection, crossover, and mutation, the (t + 1)-th generation can be generated, which is denoted as P(t + 1).Then go back to step (2) to calculate the fitness value of individuals in the next-generation population.
The calculation steps of the original GA are shown in Fig. 2. Technical Gazette 28, 2(2021), 515-522 From the above process, it can be seen that the crossover probability pa and the mutation probability pb of the GA are fixed values, which indicates that the optimal solution will cross over or mutate with the same probability. If the crossover or mutation rate is too large, it will lead to too more solutions that need to be updated, resulting in difficulty in converging to the optimal solution. Conversely, if the crossover and mutation rate is too small, it will lead to less solutions to be updated, which will make it difficult to find the optimal value efficiently.

IGA
Based on the defects of the GA mentioned above, variable crossover rate and mutation rate are proposed in this paper. The probability is large enough to ensure the diversity of the algorithm in the early time. Then in the later stages of the iteration, the probability needs to be adjusted to an appropriate value. Assuming that the crossover and mutation probability are the same throughout the whole iteration process, the maximum crossover and mutation probability is P max and the minimum mutation probability is P min . Then the formula for dynamically adjusting the probability P is as shown in Eq. (7). max min ( 1) 1 where P max is the maximum evolution probability, P min is the minimum evolution probability, N is the maximum genetic generation, and n is the current genetic generation.

Simulation Calculation
Simulation calculation is an effective method to verify the correctness of the parameter solving algorithm. So a simulation workface is designed to verify the reliability of the IGA in this paper.
The Take the subsidence value calculated by the PIM with the initial parameters' value as the true value. Then make them more like real observation data by adding a certain proportion of random error, accidental error, and gross error. The observation data of this simulation work face is shown in Tab. 1. Set the number of individuals in one generation as 200, the number of evolutionary generations as 200, the crossover rate, the mutation rate as 0.75, the max mutation rate P max = 0.9 and the minimum mutation rate P min = 0.5.
Calculate 50 times with the GA and IGA and take the average value as the respective result, the calculation results are shown in Tab. 2.   Figure 4 Comparison chart of the measured subsidence value and fitted subsidence value of the strike main section of the simulated working face After calculation, the error in the fitting of the GA is 34.8%, and the error in the fitting of the IGA is 28.4%. The data in the table shows that the IGA can be used to solve the parameters to obtain reliable results. Compared with the parameter results solved by the traditional GA, the calculation result of the IGA is more stable, and the result is closer to the true value. The comparison between the simulated real value of the simulation work surface and the estimated sinking value of the parameter value solved by the IGA is shown in Fig. 4 and Fig. 5.

Parameter Inversion
From the current situation of the parameter inversion of the probabilistic integral method, it is known that the currently feasible main parameter inversion methods include: methods based on experimental design, optimization algorithm and intelligent optimization algorithm. Because the experimental design method requires a lot of manual intervention and the algorithm is not very popular, it is not compared in this paper. This paper compares the difference among modular vector-the most representative of optimization algorithms, GA-the most representative of intelligent optimization algorithms, and IGA-proposed in this paper. The parameter inversion of the PIM is in combination with the measured data from the surface mobile monitoring station at 14141 face of coal mine in mining area. Considering that the modular vector method is related to the initial value of the iteration, a total of five inversion calculations based on different initial values has been carried out for the project instance data. All parameters can be changed freely during the calculation process without adding artificial interference. Considering that the calculation results of the GA and IGA have a certain floatability, this paper uses the method of changing the number of iterations to check the stability of the method. Tab. 3 shows the comparison of the inversion results of the subsidence factor q. Other parameters have similar conclusions, which are not listed here.The comparison between the measured value and the value fitted by GA and IGA is shown in Fig. 7 and Fig. 8.  It can be seen from Tab. 3 that the modular vector inversion results have greater volatility, the inversion results are strongly dependent on the initial values, and the subsidence factor inversed by the GA is relatively stable. The IGA has the best stability. The three inversion results are mutually independent. The maximum difference of parameters inversed by IGA is 0.09, and its stability is significantly better than the other two algorithms. Similarly, the sum of squared residuals of the IGA inversion results is also relatively stable, but it is not completely superior to the modular vector method. There are model errors in the measured data. At the same time, the GA and IGA pre-determine the parameter range so that it cannot lose its physical meaning. Random changes are the main reasons for this phenomenon. Due to the unreasonable initial value setting, iterative divergence appears in the modulus loss method, resulting in poor accuracy of the parameter inversion results. The GA solves the shortcomings of the initial value dependence of the mold loss method to a certain extent. The IGA method has improved stability and calculation accuracy compared to the GA. From the perspective of nature, it is considered that this method better solves the parameter inversion problem of the PIM.

CONCLUSION
The PIM is an important mathematical model for the estimated subsidence value in the field of mining subsidence in China. The accuracy of the parameters in the model determines the accuracy of the predicted subsidence value. So the prediction parameters inversion plays a vital role in preventing disasters in the mining area.
The GA has many advantages in solving the parameters of the PIM, but there are still many deficiencies. Aiming at the disadvantages of unstable calculation results and precision to improve, this paper proposes the concepts of the IGA. Its feasibility has been verified by simulation data experiments and a real working face in Huainan.