Automated Test Case Generation Based on Competitive Swarm Optimizer with Schema and Node Branch Archive

Abstract: Software testing plays an important role in the software development life cycle, and automated test case generation (ATCG) has attracted wide attention because of its low cost and high degree of automation. When search-based algorithms are used to solve automated test case generation for path coverage (ATCG-PC), minimizing the number of redundant test cases under the premise of 100% path coverage has always been a challenge. Drawing on both improvements to the search ability of the search-based algorithm itself and prior knowledge from the field of ATCG-PC, we propose a competitive swarm optimizer with schema and node branch archive (SNBAr-CSO) to solve complex test case generation problems in which nodes contain multiple variables. Built on the competitive swarm optimizer, the algorithm uses the prior knowledge of schema to quickly find all variables that affect the direction of a node branch, and uses the node branch archive to record the relationship between node branch direction and variable values. Experimental results on 12 practical programs from iFogSim and CoreNLP show that, compared with other recently proposed algorithms, SNBAr-CSO greatly reduces the number of redundant test cases under the premise of 100% path coverage.


INTRODUCTION
As an important part of the software life cycle, software testing aims to find possible defects in software products [1]. Functional testing and structural testing are the most common software testing activities. Functional testing focuses on whether the program meets its requirements, while structural testing focuses on revealing the logical defects of the program [2]. Automated test case generation (ATCG) is a white-box technique and therefore a kind of structural testing. Compared with the traditional manual method, it can save a lot of human resources and find more logical defects contained in software products [3].
The main goal of ATCG is to generate a set of test cases that meet specific coverage criteria. At present, the commonly used coverage criteria include branch coverage, statement coverage and path coverage. Among these, path coverage is considered the most stringent criterion because it requires that each possible path of the program under test be executed at least once [4]. In particular, a test case set that satisfies path coverage also covers all branches and statements [5]. Therefore, test case generation for path coverage (ATCG-PC) has become a hot spot in the field of ATCG.
The common solution methods for ATCG-PC include the random method [6], the symbolic execution method [7], the program instrumentation method [8] and search-based algorithms [9]. The random method randomly generates a large number of test cases within the definition domain of the problem variables. Although this method is simple to implement and can quickly generate many test cases, the resulting test case set is very large and its path coverage is low. The symbolic execution method replaces the corresponding variables in the path with symbolic expressions as program input, and generates test cases by solving the relevant constraints. Because the computational complexity of this method grows exponentially with the constraint expressions in the program under test, it cannot solve practical problems well. The program instrumentation method obtains test values by inserting a set of test statements into each judgment statement and loop statement in the program. Because the source code is instrumented directly, this method is accurate and targeted, but the instrumentation workload is large and it pollutes the source code. The search-based algorithm transforms ATCG-PC into a combinatorial optimization problem, and guides individuals to cover all paths through the experience learned in the search process. Because search-based algorithms place few constraints on the problem to be solved and do not rely on gradient information, search-based testing has become a new trend for solving ATCG-PC.
Researchers have proposed a variety of methods to solve ATCG-PC with search-based algorithms. They can be roughly classified into two categories.
The first category focuses on improving the global or local search ability of ATCG-PC algorithms. For example, Wang et al. [10] and Suresh [11] used a genetic algorithm (GA) to solve ATCG-PC. On the basis of GA, Yao et al. [12] proposed a multi-population genetic algorithm with individual sharing, so as to generate test cases satisfying path coverage more efficiently. The above algorithms focus on improving the search-based algorithms themselves. In addition, the design of the fitness function can help search-based algorithms find more paths. Lin [13] proposed a GA whose fitness function is based on Hamming distance to generate test cases covering the target path. Huang et al. [14] used an adaptive fitness function to improve the fitness of test cases with uncovered nodes. Sahoo et al. [15] proposed a value combination branch distance function based on the path coverage criterion, and combined it with the particle swarm optimization (PSO) algorithm to automatically generate test cases.
The second category uses knowledge specific to ATCG-PC to help reduce the search space. For example, Huang et al. [16] proposed a differential evolution with relational matrix (RP-DE). Through the relational matrix, the relationship between the dimensions of test case variables and nodes can be found, thus avoiding many redundant search processes. Dai [4] proposed a node branch archive (NBAr) based on the relationship matrix, which further reduced the search space. Liu et al. [5] designed a manifold inspired search algorithm (MISA), which reduced the search domain by equivalent mapping subspace. Gong et al. [17] proposed a target path grouping strategy, in which all target paths are divided into multiple groups and the paths belonging to the same group are highly similar to each other, so that the targets in the same group can be covered quickly.
To sum up, we note that in addition to adopting more novel search-based algorithms or designing better fitness functions, knowledge specific to the field of ATCG-PC can further improve the efficiency of an algorithm in solving ATCG-PC. Inspired by this idea, we find that for a complex ATCG-PC problem with multiple variables in a node, knowing in advance all the variables that affect the node direction yields competitive performance on this kind of problem. Therefore, this paper combines the above two improvement strategies and proposes a competitive swarm optimizer based on schema and node branch archive (SNBAr-CSO). The competitive swarm optimizer (CSO) [18] is a competition-based search algorithm with good search ability. Schema is a kind of prior knowledge designed for complex ATCG-PC problems with multiple variables in a node. The node branch archive reduces the search space considerably by recording the relationship between nodes and variables.
The main contributions of our work are as follows:
- Aiming at the complex ATCG-PC problem with multiple variables in nodes, this paper analyzes the problem in depth and puts forward the prior knowledge of schema, which helps to quickly find all variables affecting nodes.
- A strategy combining schema and node branch archive (SNBAr) is proposed. Schema and node branch archive can efficiently find the relationship between test case variable dimensions and nodes. The combination of these two strategies can eliminate a lot of unnecessary search space.
- A competitive swarm optimizer with schema and node branch archive (SNBAr-CSO) is proposed. SNBAr-CSO is compared with other state-of-the-art algorithms, and the experimental results show that our algorithm performs better than the latest solutions for ATCG-PC.
The rest of this paper is organized as follows. Section 2 introduces the preliminary knowledge of search-based algorithms for solving ATCG-PC. Section 3 introduces CSO and NBAr and analyzes the advantages of NBAr and how it can be improved. Section 4 introduces the details and overall process of SNBAr-CSO. In Section 5, the comparison between SNBAr-CSO and other algorithms and the separation experiments under different strategies are discussed. A brief summary of this paper is presented in Section 6.

BACKGROUND
This section introduces the preliminary knowledge of search-based algorithms for solving ATCG-PC, including basic concepts, mathematical model, fitness function and several common search-based algorithms.

Basic Concepts of ATCG-PC
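According to the definitions in [19], the basic concepts for solving ATCG-PC are as follows.
Definition 1 (control flow graph): The control flow graph of a program can be represented as a directed graph $G = (N, E, s, e)$, where $N$ represents the node set, each element $\langle n_1, n_2 \rangle$ of the edge set $E = \{\langle n_1, n_2 \rangle \mid n_1, n_2 \in N\}$ represents an edge from node $n_1$ to node $n_2$, and $s$ and $e$ are the start and end nodes of the program, respectively.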
Definition 2 (node): A node represents a basic block in the program under test, and the statements in the basic block are executed sequentially. A conditional node is a node that contains predicates, and each of its output edges is called a branch.
Definition 3 (test case): A test case is a vector $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ with $n$ dimensions, in which the dimensions are composed of variables in the program under test.
Definition 4 (path): A path $p_i = \langle s, n_1, \ldots, n_k, e \rangle$ is a sequence containing at least the start node and the end node, which represents the actual execution of a test case in the program under test.
Definition 5 (path coverage): If the path traversed by test case $X_i$ in the program under test is $p_j$, then $X_i$ covers the path $p_j$.
According to the above definitions, a path can be represented by a sequence of nodes. Determining which conditional nodes the path passes through, and the branch direction taken on each of them, uniquely determines the path. To represent a specific path more concisely, we encode it using conditional nodes only: a non-null character indicates that the test case passes through the node, a null character indicates that the test case skips the node, and the specific branch direction is represented by one of the characters 0-9. A real variable in the program under test is encoded directly by its value. A string variable is represented by the ASCII codes of its characters. As shown in Fig. 1, the program includes 4 variables (a, b, c, d, where a, b and c are real variables and d is a string variable) and 8 nodes (s and e are the start and end nodes, ②, ④ and ⑥ are ordinary nodes, and ①, ③ and ⑤ are branch nodes). We use '0' and '1' to represent the "No" and "Yes" branches of a branch node, respectively. For the path <s, 1, 2, 3, 4, 5, 6, e>, the conditional nodes are ①, ③ and ⑤, and the branch directions on these conditional nodes are all "Yes". Therefore, its path code is "111". If the test case $X_i$ is to cover this path, its variable values should be a = 1, b = 4, c = 10, d = "mod", and the corresponding variable code is $X_i$ = (1, 4, 10, 109, 111, 100).
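The encoding rules above can be illustrated with a short Python sketch. The function names and the '-' placeholder used for a skipped node are assumptions made for illustration; only the ASCII rule for strings and the 0-9 branch characters come from the text.

```python
def encode_test_case(values):
    """Flatten a test case into a numeric vector.

    Real variables are kept as-is; each string variable is expanded
    into the ASCII codes of its characters.
    """
    encoded = []
    for value in values:
        if isinstance(value, str):
            encoded.extend(ord(ch) for ch in value)
        else:
            encoded.append(value)
    return encoded


def encode_path(branch_directions):
    """Encode a path using its conditional nodes only.

    branch_directions holds one entry per conditional node: a digit 0-9
    for the branch taken, or None when the test case skips the node.
    The '-' placeholder for a skipped node is an assumption of this sketch.
    """
    return "".join("-" if d is None else str(d) for d in branch_directions)


# The example from Fig. 1: a = 1, b = 4, c = 10, d = "mod", all branches "Yes".
print(encode_test_case([1, 4, 10, "mod"]))  # [1, 4, 10, 109, 111, 100]
print(encode_path([1, 1, 1]))               # 111
```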

Mathematical Model of ATCG-PC
In order to better describe the ATCG-PC problem, researchers have proposed a variety of mathematical models. Fraser et al. [20] proposed a test suite model to minimize redundant test cases while meeting the coverage criteria. Huang et al. [14] designed an adaptive evaluation model that favors test cases containing uncovered nodes. Because the models in [14] and [20] cover all target paths at the same time, they are not helpful when a specific target path must be covered. In addition, ATCG-PC has also been modelled as a multi-objective problem. Mala et al. [21] established a mathematical model with two objectives, which considers both maximizing the number of covered paths and minimizing the number of test cases. In the multi-objective model, each test case must be evaluated against all remaining uncovered paths, which consumes a large number of function evaluations.
The mathematical model in this paper is based on [16]. The model considers both the target path coverage problem and programs that include paths that cannot be covered. Its goal is to maximize the path coverage under the condition of the maximum acceptable number of test cases Mcn. The model is described as follows:

$$\max\; c = \frac{l}{L} \times 100\% \qquad (1)$$

$$l = \sum_{j=1}^{L} \min\Bigl(1, \sum_{i=1}^{T} \theta_{i,j}\Bigr) \qquad (2)$$

$$\theta_{i,j} = \begin{cases} 1, & \text{if test case } X_i \text{ covers path } p_j \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

$$\text{s.t.}\quad T \le Mcn \qquad (4)$$

Eq. (1) gives the path coverage rate $c$, where $l$ is the number of paths covered by the generated test cases and $L$ is the total number of paths in the program under test. Eq. (2) and Eq. (3) define the calculation of $l$, where $\sum_{i=1}^{T} \theta_{i,j}$ counts how many of the generated test cases cover the path $p_j$; when multiple test cases cover the same path, the value of $l$ increases by only one. Eq. (4) bounds the number of generated test cases $T$ by the maximum acceptable number Mcn.
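As a minimal sketch of this model, the following Python function computes the coverage rate from the list of paths that the generated test cases cover, counting each path at most once and rejecting test suites larger than Mcn. The function and parameter names are illustrative assumptions, and the rate is returned as a fraction rather than a percentage.

```python
def path_coverage(covered_by_test, all_paths, mcn):
    """Coverage rate c = l / L for a list of generated test cases.

    covered_by_test[i] is the path code covered by the i-th test case;
    all_paths lists the path codes of the program under test;
    mcn is the maximum acceptable number of test cases.
    """
    if len(covered_by_test) > mcn:
        raise ValueError("number of test cases exceeds Mcn")
    # A path contributes at most 1 to l, however many test cases cover it.
    l = sum(
        min(1, sum(1 for p in covered_by_test if p == path))
        for path in all_paths
    )
    return l / len(all_paths)


# Three test cases, two of which hit the same path: 2 of 4 paths are covered.
print(path_coverage(["00", "00", "11"], ["00", "01", "10", "11"], mcn=10))  # 0.5
```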

Fitness Function
When search-based algorithms are used to solve ATCG-PC, the fitness function is used to evaluate the quality of the generated test cases. By comparing fitness values, suitable individuals are selected as the offspring in each iteration, so as to effectively guide the algorithm to a test case set covering all paths. Based on the selected mathematical model, this paper uses the branch distance method [14], which is widely used in ATCG-PC, as the fitness function. The calculation method is as follows:
where $m$ represents the number of nodes contained in the program under test, $f_{p_j}$ represents the fitness at the j-th node, $BD_j(X_i)$ reflects the deviation between the actual branch predicate and the required branch predicate of the test case $X_i$ on the j-th node, and ε is a constant that avoids division by zero. Tab. 1 lists the commonly used branch predicates and their corresponding branch distance functions.
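To make the idea of a branch distance concrete, the sketch below uses a formulation that is common in the search-based testing literature: the distance is zero when the predicate already holds and otherwise grows with how far the operands are from satisfying it. The penalty constant K, the function names, and the reciprocal form of the per-node fitness are all assumptions of this sketch; they are not taken from Tab. 1 or from the fitness equations of the paper.

```python
K = 1.0  # assumed penalty added whenever the predicate is not yet satisfied


def bd_eq(a, b):
    """Distance to making `a == b` true."""
    return 0.0 if a == b else abs(a - b) + K


def bd_ne(a, b):
    """Distance to making `a != b` true."""
    return 0.0 if a != b else K


def bd_lt(a, b):
    """Distance to making `a < b` true."""
    return 0.0 if a < b else (a - b) + K


def bd_le(a, b):
    """Distance to making `a <= b` true."""
    return 0.0 if a <= b else (a - b) + K


def node_fitness(branch_distance, eps=0.01):
    """Assumed reciprocal form: larger is better, eps avoids division by zero."""
    return 1.0 / (branch_distance + eps)


# A test case with a = 4 is three units (plus K) away from satisfying a == 1.
print(bd_eq(4, 1))                                   # 4.0
print(node_fitness(0.0) > node_fitness(bd_eq(4, 1)))  # True
```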
Search-based algorithms are a mature class of global optimization methods. Because of their high robustness and wide applicability, they are widely used to solve ATCG-PC problems. Differential evolution (DE) generates new candidate solutions through differential and crossover operations on individuals in the parent population. The immune genetic algorithm (IGA) combines the advantages of immune theory and the basic genetic algorithm. It not only retains the search characteristics of GA, but also makes use of the multiple mechanisms of the immune algorithm to find the optimal solution of a multi-objective function. The artificial bee colony (ABC) algorithm is composed of three basic elements: food sources, employed bees and unemployed bees. It finds the global optimum through the local optimization behaviour of individual artificial bees. PSO guides the population evolution by recording the global optimal solution and the optimal solution of each individual. Unlike PSO, CSO introduces a pairwise competition mechanism, in which the particles that lose the competition update their positions by learning from the winners, so as to achieve a good balance between exploration and exploitation. The operation process of CSO is described in detail in Section 3.

COMPETITIVE SWARM OPTIMIZER AND NODE BRANCH ARCHIVE
In this section, we will describe the operation process of CSO algorithm and node branch archive in detail.

Competitive Swarm Optimizer
The competitive swarm optimizer (CSO) [18] is a swarm optimization algorithm for large-scale optimization. Based on the particle swarm optimization algorithm, it introduces a pairwise competition mechanism in which the losing particles update their positions by learning from the winners. CSO is used in a large number of practical problems because of its simple structure and its good balance between exploration and exploitation.
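In the CSO algorithm, the position and velocity of each particle in generation $t$ are expressed as:

$$X_i(t) = \bigl(x_{i1}(t), x_{i2}(t), \ldots, x_{in}(t)\bigr)$$

$$V_i(t) = \bigl(v_{i1}(t), v_{i2}(t), \ldots, v_{in}(t)\bigr), \quad i = 1, 2, \ldots, pop$$

where $i$ is the index of an individual in the population, $n$ is the dimension of each individual, and $pop$ is the population size.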

Initialization
During the initialization of CSO, each individual in the population is assigned a random position within the definition domain, and the initial velocity of every particle is 0. The specific operation is shown in Eq. (8):

$$x_{ij} = lb_j + rand[0, 1] \times (ub_j - lb_j), \qquad v_{ij} = 0 \qquad (8)$$

where rand[0, 1] represents a random number in the range of 0 to 1, and $ub_j$ and $lb_j$ represent the upper and lower limits of the variable in the j-th dimension.
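The initialization step can be sketched in a few lines of Python: positions are drawn uniformly between the lower and upper limits of each dimension and all velocities start at zero. The function name and the use of plain lists are illustrative assumptions.

```python
import random


def init_population(pop, lb, ub, rng=random):
    """Create `pop` particles with random positions and zero velocities.

    lb[j] and ub[j] are the lower and upper limits of dimension j.
    rng is any object with a random() method returning a value in [0, 1).
    """
    n = len(lb)
    positions = [
        [lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(n)]
        for _ in range(pop)
    ]
    velocities = [[0.0] * n for _ in range(pop)]
    return positions, velocities


positions, velocities = init_population(4, lb=[0, -5], ub=[10, 5])
print(len(positions), len(positions[0]))  # 4 2
```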

Learning Operator
In order to make the algorithm converge to the global optimal solution, CSO introduces a pairwise competition mechanism in the population iteration process. The individuals in the swarm are divided into pairs indexed by $k = 1, 2, \ldots, pop/2$, and the two individuals in the same pair compete. The loser of each competition updates its velocity and position by learning from the winner. The formulas are as follows:

$$V_{l,k}(t+1) = R_1 V_{l,k}(t) + R_2 \bigl(X_{w,k}(t) - X_{l,k}(t)\bigr) + \varphi R_3 \bigl(\bar{X}_k(t) - X_{l,k}(t)\bigr)$$

$$X_{l,k}(t+1) = X_{l,k}(t) + V_{l,k}(t+1)$$

where $l$ and $w$ denote the individuals that lose and win the competition, respectively, $R_1$, $R_2$ and $R_3$ are random numbers in (0, 1), $\bar{X}_k$ is the average position of the relevant particles, and $\varphi$ is the parameter controlling the influence of $\bar{X}_k$.
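The pairwise competition can be sketched as follows, using the standard CSO update in which the loser keeps a random fraction of its velocity and moves toward both the winner and the mean position of the swarm. This is a minimal sketch under stated assumptions: a larger fitness value is taken to be better, the mean is computed over the whole swarm, a fresh random number is drawn per dimension, and the function name and the default value of phi are illustrative.

```python
import random


def cso_step(positions, velocities, fitness, phi=0.1, rng=random):
    """One generation of pairwise competition; updates the lists in place.

    fitness maps a position to a number, and a larger value wins (assumed).
    With an odd swarm size the unpaired particle is left unchanged.
    """
    pop, n = len(positions), len(positions[0])
    # Mean position of the swarm, computed before any particle moves.
    mean = [sum(x[j] for x in positions) / pop for j in range(n)]
    order = list(range(pop))
    rng.shuffle(order)  # random pairing
    for a, b in zip(order[0::2], order[1::2]):
        if fitness(positions[a]) >= fitness(positions[b]):
            w, l = a, b
        else:
            w, l = b, a
        for j in range(n):
            r1, r2, r3 = rng.random(), rng.random(), rng.random()
            velocities[l][j] = (
                r1 * velocities[l][j]
                + r2 * (positions[w][j] - positions[l][j])
                + phi * r3 * (mean[j] - positions[l][j])
            )
            positions[l][j] += velocities[l][j]  # only the loser moves


swarm = [[0.0], [10.0], [2.0], [8.0]]
speeds = [[0.0] for _ in swarm]
cso_step(swarm, speeds, fitness=lambda x: -abs(x[0] - 10.0))
print([10.0] in swarm)  # True: the best particle always wins and stays put
```

Because every particle takes part in exactly one competition per generation, winners pass to the next generation unchanged and only half of the swarm is updated.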

Node Branch Archive
The node branch archive (NBAr) records the relationship between the dimensions of test case variables and the node branch direction during the search, and uses this relationship to reduce the search space considerably. Next, we introduce the specific process of NBAr and put forward some suggestions for improving its existing problems.
In the process of test case generation, variables with specific values make the test case take a specific direction on a node. As shown in Fig. 1, node a == 1 is related only to variable a, and when the variable value is a = 1 the direction of the node is "Yes". Therefore, if the direct relationship between variable values and nodes can be found, searching irrelevant dimensions can be avoided.
NBAr uses two matrices, A and R, to record the relationship between variables and nodes found in the search process. The matrix R records the correlation between variables and nodes, that is, which variables determine the directions of which nodes, thus avoiding searches over unnecessary dimensions. The matrix A records the relationship between variable values and node directions, that is, what value a variable should take for the node to take a specific direction. If the variable that most affects a node can be found through matrix R, we can then use matrix A to look up the value of that variable that produces the required node direction.
NBAr thus reduces the search space mainly by recording the relationship between test cases and nodes in two matrices. However, the strategy can only find one variable value that affects the direction of a node at a time, so it cannot efficiently solve complex ATCG-PC problems in which the direction of a node is determined by multiple variables. If all the variables that affect the direction of the node, together with their values, can be found at once, the search over irrelevant variables and variable values can be greatly reduced.

OUR APPROACH
In this section, a competitive swarm optimizer with schema and node branch archive (SNBAr-CSO) is proposed. We introduce SNBAr in detail, together with the whole process of applying SNBAr to CSO. Finally, an example is given to illustrate how SNBAr-CSO solves the multivariable ATCG-PC problem. The variables used in SNBAr-CSO are shown in Tab. 2.

Tab. 2 Variables used in SNBAr-CSO
Variable        Description
$S_i$           The schema corresponding to the i-th node
$p_{target}$    The selected target path
$p_{X_i}^{j}$   The j-th character in the path encoding string of $p_{X_i}$
$n$             The number of variable dimensions
$m$             The total number of nodes
$pop$           The individual size of a population
$dim$           The selected optimized dimension

Schema
As shown in Section 3, the node branch archive can reduce the search space considerably by recording the relationship between test case variables and node branch direction. However, NBAr can find at most one variable that affects the direction of a node branch each time, which makes it poorly suited to programs with multiple variables in nodes. Therefore, we propose the prior knowledge of schema. The combination of schema and node branch archive can better solve the complex ATCG-PC problem with multiple variables in nodes.
A schema is the set of all variables that affect the branch direction of the same node. For a program under test, its schema set can be expressed as:

$$S = \{S_1, S_2, \ldots, S_m\}$$

where $m$ represents the number of nodes contained in the program. Each schema corresponds to one branch node in the program under test. For the i-th node, the schema of this node can be expressed as:

$$S_i = \{x_1, x_2, \ldots, x_k\}$$

where $k$ is the number of variables contained in $S_i$. The values of these variables affect the branch direction of the node. We can directly find all the variables that affect the direction of a node through the schema of that node. The schema of the program shown in Fig. 1 is shown in Tab. 3.
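As an illustration of how a schema narrows the search, the sketch below stores one schema per conditional node and derives which dimensions need to be searched and which can be ignored. The program, its predicates and the schema contents are hypothetical and are not the contents of Tab. 3.

```python
# Hypothetical program with variables (a, b, c, d) and three conditional nodes.
VARIABLES = ("a", "b", "c", "d")

# One schema per conditional node: the set of variables in its predicate.
SCHEMAS = {
    0: {"a"},            # e.g. `a == 1`
    1: {"b", "c"},       # e.g. `b + c > 10` (assumed predicate)
    2: {"a", "c", "d"},  # assumed three-variable predicate
}


def dims_for_node(node):
    """Dimensions that can change the branch direction of `node`."""
    return sorted(VARIABLES.index(v) for v in SCHEMAS[node])


def irrelevant_dims(node):
    """Dimensions that need not be searched when only `node` must flip."""
    relevant = set(dims_for_node(node))
    return [j for j in range(len(VARIABLES)) if j not in relevant]


print(dims_for_node(1))    # [1, 2]
print(irrelevant_dims(1))  # [0, 3]
```

With a lookup of this kind, a search that needs to flip one node can restrict itself to the dimensions in that node's schema instead of discovering them one at a time.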

Dynamic Target Path Selection Strategy
When solving the ATCG-PC single path coverage problem, the search-based algorithm covers the uncovered paths one by one through multiple iterations, so the order in which target paths are selected affects the performance of the algorithm. In order to select a more suitable target path, we adopt a dynamic target path selection strategy based on similarity.
For each individual $X_i$ in the population, we choose the most suitable target path $p_{target}$ from the uncovered paths. Compared with the other paths, $p_{target}$ is the one whose path code is closest to that of $p_{X_i}$, the path covered by $X_i$. Algorithm 1 shows the selection process of $p_{target}$. We use $w$ to record the similarity between $p_{X_i}$ and each path. If $p_j$ has been covered, $w_j$ is set to 0. If $p_j$ is not covered, $w_j$ records the number of identical codes in the path codes of $p_{X_i}$ and $p_j$. The higher the similarity, the closer the path is to $p_{X_i}$. When all the paths have been compared, we use roulette-wheel selection based on $w$ to select the closest path as $p_{target}$.
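The selection procedure described above can be sketched as follows. Similarity is taken to be the number of positions at which two path codes carry the same character, and the roulette wheel is implemented with `random.choices`. The function name and the fallback used when every uncovered path has zero similarity are assumptions of this sketch, not details of Algorithm 1.

```python
import random


def select_target_path(current_code, path_codes, covered, rng=random):
    """Pick an uncovered target path by roulette wheel over similarity.

    current_code is the path code of the path the individual covers now,
    path_codes lists all path codes, covered is the set already covered.
    """
    weights = []
    for code in path_codes:
        if code in covered:
            weights.append(0)  # covered paths can never be selected
        else:
            # Number of conditional nodes with the same branch character.
            weights.append(sum(1 for x, y in zip(current_code, code) if x == y))
    if sum(weights) == 0:
        # Assumed fallback: choose uniformly among uncovered paths, if any.
        candidates = [c for c in path_codes if c not in covered]
        return rng.choice(candidates) if candidates else None
    return rng.choices(path_codes, weights=weights, k=1)[0]


# "110" shares two characters with "111", while "000" shares none.
print(select_target_path("111", ["111", "110", "000"], covered={"111"}))  # 110
```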

Search Process of SNBAr
As shown in Algorithm 2, the overall search process of SNBAr can be roughly divided into two parts: (1) search based on schema and matrix A; (2) search based on scatter search strategy.
After $p_{target}$ is determined through Algorithm 1, we compare the path codes of $p_{X_i}$ and $p_{target}$ character by character. As shown in line 5 of Algorithm 2, if the values of $p_{X_i}$ and $p_{target}$ differ on the j-th node, the directions of $p_{X_i}$ and $p_{target}$ on the j-th node are different. As shown in lines 7-11 of Algorithm 2, we check whether matrix A has recorded values of the variables that make the direction of $X_i$ on the j-th node the same as that of $p_{target}$. If such values can be found, the values of the variables contained in $S_j$ are assigned to $X_i$. In this way, we reduce the search space and save a large number of fitness evaluations by using the schema and the relationship between variables and nodes.
When the schema and matrix A cannot be used to reduce the search space, we use the scatter search strategy [26] to search each variable in $S_j$. The scatter search strategy selects the location of each search according to fitness, and gradually reduces the search step to find the individual with the best fitness.
In order to ensure the effectiveness of matrix A, every time a new path is covered in the search process, the matrix A needs to be updated according to Algorithm 3.
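The interplay between the schema and matrix A can be sketched in Python as below. The archive is modelled as a dictionary keyed by (node, direction) that stores the values of the schema variables seen when that direction was taken. This layout, the function names and the example schemas are assumptions made for illustration; they are not the data structures of Algorithms 2 and 3.

```python
def update_archive(archive, x, path_code, schemas):
    """Record, for each node on a newly covered path, the schema values used."""
    for node, direction in enumerate(path_code):
        archive[(node, direction)] = {dim: x[dim] for dim in schemas[node]}


def apply_archive(x, current_code, target_code, schemas, archive):
    """Try to steer x toward the target path using archived values.

    Returns a modified copy of x for the first node whose direction differs,
    or None when the archive has no record, in which case the caller would
    fall back to scatter search over the dimensions in schemas[node].
    """
    for node, (cur, tgt) in enumerate(zip(current_code, target_code)):
        if cur == tgt:
            continue
        recorded = archive.get((node, tgt))
        if recorded is None:
            return None
        child = list(x)
        for dim in schemas[node]:
            child[dim] = recorded[dim]  # copy all schema variables at once
        return child
    return list(x)  # already on the target path


# Hypothetical schemas: the third node depends on the three string dimensions.
schemas = {0: [0], 1: [1, 2], 2: [3, 4, 5]}
archive = {}
update_archive(archive, [1, 4, 10, 109, 111, 100], "111", schemas)
print(apply_archive([1, 4, 10, 103, 110, 123], "110", "111", schemas, archive))
# [1, 4, 10, 109, 111, 100]
```

Copying every variable of the schema in a single step is what distinguishes this sketch from an archive that repairs one variable at a time.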

Figure 2 The framework of SNBAr-CSO (flowchart: input the program to be tested; initialize and evaluate individuals and record the covered paths; while c ≠ 100% and T < Mcn, search with CSO, evaluate individuals and record the covered paths, then (1) use Algorithm 1 to select the target path, (2) search based on schema and matrix A, (3) search by scatter search strategy; output the test suite)

The Framework of SNBAr-CSO
As mentioned in Section 2, search-based algorithms are widely used to solve ATCG-PC problems. In this paper, SNBAr is applied to CSO, and the overall process of SNBAr-CSO is shown in Fig. 2.
Firstly, each individual in the population is initialized randomly within the value range of each variable to form the initial population. Each individual is then evaluated through the fitness function, and the paths covered at this point are recorded.
Once the initial population is generated, the algorithm enters the global and local search stages. In the global search stage, CSO is used to guide the whole population to generate a new offspring population, and the individuals with better fitness in the offspring population are retained. The purpose of the local search stage is to generate test cases covering the target path. In this stage, SNBAr is used to reduce redundant search. When SNBAr fails, the scatter search strategy is used instead.
SNBAr-CSO combines global search with local search. On the basis of the fitness function, CSO is used to make the whole population select the individuals with the largest number of uncovered nodes. At the same time, the local search strategy is used to solve the coverage problem of the target path.
Suppose that after initializing the population and running CSO, the covered path is $p_1$, and the remaining uncovered path set is $P_{remind} = \{p_2, p_3, \ldots, p_8\}$. In this case, the test case to be optimized is $X_i$ = (3, 5, 10, 109, 117, 103). According to Algorithm 1, the target path $p_{target}$ is selected as $p_2$. Because the path $p_1$ covered by $X_i$ differs from $p_{target}$ in its direction on node ⑤, our goal is to make $X_i$ cover $p_2$ by changing the values of the variables that affect the direction of node ⑤. Since matrix A is empty at this time, the scatter search strategy is applied to the variable dimensions contained in $S_3$. If a new offspring in the search process covers $p_2$, matrix A is updated according to Algorithm 3. Since the branch direction of $p_2$ on node ⑤ is "Yes", the updated matrix A contains the values of all variables that make the branch direction of node ⑤ "Yes".
In the next search process, if $X_i$ = (1, 4, 10, 103, 110, 123), the path that $X_i$ covers is $p_7$, and the target path is $p_8$, then the directions of $p_{X_i}$ and $p_{target}$ on node ⑤ are different.
Since matrix A recorded the values of the variables that make the direction of node ⑤ "Yes" in the previous search process, the values of the variables contained in $S_3$ can be assigned to $X_i$ directly from matrix A, so that $X_i$ directly covers $p_8$.

EXPERIMENT AND RESULT ANALYSIS
In this section, we verify the performance of the proposed SNBAr-CSO through comprehensive experiments. Firstly, the basic settings of the experiments are introduced, including the benchmark programs, evaluation indicators, experimental environment and parameter settings. The advantages of SNBAr-CSO are then demonstrated through three experiments.

Experimental Setup
In order to better verify the performance of SNBAr-CSO, we selected 12 benchmark programs from iFogSim [27] and CoreNLP [28]. iFogSim is a simulation development kit, which uses the mode of sensor, process, driver and distributed data flow to simulate application scenarios in a fog computing environment. CoreNLP is a set of natural language analysis tools written in Java and provided by Stanford. These 12 benchmark programs have been used by many researchers, such as Dai [4], Liu [5] and Huang [14], in experimental studies of ATCG-PC. The detailed information of these benchmark programs is shown in Tab. 4.
In the experiment, the following performance indicators were used to compare the performance of different algorithms.
1) Ave.T: The average number of test cases generated in 30 independent executions. The smaller the average value, the higher the efficiency of the algorithm.
2) Ave.c: The average path coverage rate of 30 independent executions. The higher the path coverage, the more paths the algorithm covers.
This paper designed a total of three groups of experiments, as follows:
1) SNBAr-CSO is compared with some recently proposed algorithms. Specifically, these algorithms are GPE-IS [29], MISA [5], NBAr-DE [4] and RP-DE [16].
2) SNBAr is combined with different search-based algorithms, and the resulting variants are compared.
3) The separation experiments of SNBAr-CSO under different strategies are discussed.
The purpose of the first experiment is to compare the performance of SNBAr-CSO with the recently proposed algorithms. The second experiment analyzes whether SNBAr improves the performance of search-based algorithms for solving ATCG-PC problems, and which search-based algorithms perform better in combination with SNBAr. The third experiment examines the performance of SNBAr-CSO when different strategies are chosen. To make the results more convincing, all the algorithms were run in the same experimental environment. The experimental computer was configured with an Intel i7-7700 3.60 GHz CPU, 16 GB of memory and Windows 10. The population size was set to pop = 50, and the maximum number of generated test cases was Mcn = 3.00E+05. Significance testing was based on the Wilcoxon rank-sum test [31] with α = 0.05. Experimental data were averaged over 30 independent runs. The parameter values of the compared algorithms are all the same as in the references [4,5,14,21,25,29,30], and their specific values are shown in Tab. 5.

Comparison with NBAr-DE and Some Other Algorithms
In order to verify the efficiency of the proposed SNBAr-CSO, we compare it with the latest algorithms GPE-IS, MISA, NBAr-DE and RP-DE. The experimental results are shown in Tab. 6. The best results among SNBAr-CSO, GPE-IS, MISA, NBAr-DE and RP-DE are highlighted in bold. The symbols "+/=/-" indicate that our method is superior to, equal to or inferior to GPE-IS, MISA, NBAr-DE and RP-DE, respectively. These marks have the same meaning in the other tables.
As shown in Tab. 6, because benchmark program No. 2 contains an infeasible path, all algorithms cover only 67% of its paths. The results of SNBAr-CSO on benchmark programs No. 1, No. 4, No. 5, No. 6, No. 7, No. 8, No. 9 and No. 12 are better than those of GPE-IS. Only on No. 10 is its test case consumption greater than that of GPE-IS. Compared with MISA, the test case consumption of SNBAr-CSO is similar on benchmark programs No. 1 and No. 5, and the statistical results of SNBAr-CSO on most of the other benchmark programs are better than those of MISA. On benchmark programs No. 9 and No. 12, the test case consumption of SNBAr-CSO is one order of magnitude less than that of MISA.
In the comparison between SNBAr-CSO, NBAr-DE and RP-DE, the number of test cases generated by SNBAr-CSO is significantly smaller than that of NBAr-DE and RP-DE. SNBAr-CSO maintains 100% path coverage on all benchmark programs, while NBAr-DE covers only a few paths on benchmark programs No. 9, No. 11 and No. 12, and RP-DE covers only 18% of the paths on No. 12. This is because these three benchmark programs have variables of multiple dimensions and contain many nodes composed of multiple variables, so the search space is very large. It can be seen that the introduction of schema removes a large part of the search space.
In conclusion, compared with GPE-IS, MISA, NBAr-DE and RP-DE, SNBAr-CSO can cover all paths with less test case consumption. Its advantage lies in combining schema and NBAr to quickly find all variables, and their values, that affect the direction of nodes, which removes a large part of the search space. Compared with MISA and NBAr-DE, SNBAr-CSO is more suitable for solving complex multivariable ATCG-PC problems.
By comparing SNBAr-CSO with the variants that combine other search-based algorithms with SNBAr, we can see from Tab. 7 and Tab. 8 that SNBAr-CSO has the best statistical results on most benchmark programs. In the comparison between SNBAr-CSO and SNBAr-ABC, SNBAr-CSO achieved better results on all programs except No. 2. In the box plots shown in Fig. 3, it can be seen intuitively that SNBAr-CSO consumes fewer test cases than the other algorithms, and the results obtained are more concentrated. This shows that CSO combines with the SNBAr strategy better than the other search-based algorithms. From the above results, we can conclude that SNBAr improves the ability of search-based algorithms to solve ATCG-PC problems by adding a local search phase. Compared with SNBAr-DE, SNBAr-IGA, SNBAr-ABC and SNBAr-PSO, SNBAr-CSO is an effective method for solving ATCG-PC problems.

Comparison of Different Selection Strategies
SNBAr-CSO uses schema, NBAr and scatter search. To examine the influence of each strategy on the algorithm, we carried out a strategy separation experiment; the results are shown in Tab. 9.
In the comparison strategies listed in Tab. 9, scatter search is adopted by default unless stated otherwise. For example, CSO denotes CSO with scatter search only; S-CSO denotes CSO with schema but without scatter search; S-CSO denotes CSO with both schema and scatter search; SNBAr-CSO table denotes CSO with schema and node branch archive but without scatter search; and SNBAr-CSO is the method proposed in this paper. The results in Tab. 9 show that SNBAr-CSO achieves the best results among all variants. Each added strategy improves the efficiency of the algorithm, and there is no conflict between the strategies. The scatter search strategy has a large influence on the algorithm because, compared with the quartile search, it finds the individual with the best fitness in the search space of the target more quickly.
To sum up, different strategies used in SNBAr-CSO can be well combined, and each strategy solves different problems in the process of test case generation.

CONCLUSION
In this paper, we propose a competitive swarm optimizer with schema and node branch archive for the complex ATCG-PC problem with multiple variables in nodes. SNBAr-CSO combines the good exploration and search ability of CSO with schema and node branch archive (NBAr). Schema, as prior knowledge, helps to quickly find all variables that affect a node. The node branch archive records the relationship between nodes and variables. Together, schema and NBAr substantially reduce the search space when covering the target path, and thus reduce the number of redundant test cases generated. To demonstrate the effectiveness of SNBAr-CSO, this paper studies its performance on 12 open source benchmark programs from two commonly used toolkits, iFogSim and CoreNLP. It is compared with several state-of-the-art algorithms, with some search-based algorithms, and with the variants obtained by combining those algorithms with SNBAr. The experimental results support the following conclusions. First, SNBAr-CSO performs better than GPE-IS, MISA, NBAr-DE and RP-DE. Second, SNBAr combines well with different search-based algorithms to solve ATCG-PC problems, and its combination with CSO is better than its combination with DE, IGA, ABC or PSO. Third, there is no conflict between the different strategies in SNBAr-CSO; they integrate well to improve the efficiency of test case generation.
In future work, we will combine SNBAr with better local search strategies to form more effective ATCG-PC solutions, and we will explore further prior knowledge in the field of ATCG-PC to strengthen the solving ability of the algorithm.

Definition 1 (control flow graph): A control flow graph (CFG) is a directed graph G = (N, E, s, e). It concisely represents the path structure of the program under test.
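As an illustration of Definition 1, the following minimal Python sketch stores a CFG as the four components of G and enumerates its loop-free paths. It reads N as the node set, E as the set of directed edges, and s and e as the entry and exit nodes; this reading is an assumption based on the usual convention. The class name, field names and the example graph are hypothetical and do not come from the benchmark programs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlFlowGraph:
    """G = (N, E, s, e); the meaning given to each field is an assumption."""
    nodes: frozenset       # N: node identifiers
    edges: frozenset       # E: directed (source, target) pairs
    entry_node: str        # s: entry node
    exit_node: str         # e: exit node

    def successors(self, node):
        """Return the nodes reachable from `node` by one edge, in sorted order."""
        return sorted(t for (u, t) in self.edges if u == node)

    def paths(self):
        """Enumerate every loop-free path from s to e by depth-first search."""
        found = []
        stack = [(self.entry_node, (self.entry_node,))]
        while stack:
            node, path = stack.pop()
            if node == self.exit_node:
                found.append(path)
                continue
            for nxt in self.successors(node):
                if nxt not in path:  # skip back edges so the search terminates
                    stack.append((nxt, path + (nxt,)))
        return sorted(found)


# Hypothetical program: an if/else followed by an optional second block.
example_cfg = ControlFlowGraph(
    nodes=frozenset({"s", "n1", "n2", "n3", "n4", "e"}),
    edges=frozenset({
        ("s", "n1"), ("s", "n2"),    # first branch: true / false
        ("n1", "n3"), ("n2", "n3"),  # both branches rejoin at n3
        ("n3", "n4"), ("n3", "e"),   # second branch: enter the block or skip it
        ("n4", "e"),
    }),
    entry_node="s",
    exit_node="e",
)

all_paths = example_cfg.paths()
for p in all_paths:
    print(" -> ".join(p))
```

Two binary branches in sequence give four paths in this example, which is the quantity a path coverage criterion would then try to cover with test cases.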

Figure 1
Example of a test program and its control flow graph

Table 2
Notation of basic variables in SNBAr
Variable name    Description
L                The total number of paths in the program under test
X_i              The optimized test case
S_i

Table 3
Schema in example program

Table 4
Benchmark programs in iFogSim and CoreNLP. "Search space" means the size of the decision space of the benchmark program. "Probability of the most difficult path" means the probability of covering the most difficult path with randomly generated test cases.
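To make the two column definitions concrete, the sketch below computes both quantities exactly for a toy program by enumerating its whole input domain. The program, the function name `trace` and the input range are hypothetical and are far smaller than the benchmark programs; uniform random sampling of inputs is assumed.

```python
from collections import Counter
from itertools import product


def trace(a, b):
    """Hypothetical program under test; returns a label for the path taken."""
    if a == b:
        return "equal"
    if a < b:
        return "less"
    return "greater"


domain = range(10)                    # each input variable takes values 0..9
search_space_size = len(domain) ** 2  # two variables, so 10 * 10 inputs

# Count how many inputs drive execution down each path.
counts = Counter(trace(a, b) for a, b in product(domain, repeat=2))

# The most difficult path is the one reached by the fewest inputs.
hardest_path, hits = min(counts.items(), key=lambda kv: kv[1])
hardest_probability = hits / search_space_size

print(search_space_size, hardest_path, hardest_probability)
```

Exhaustive enumeration is only feasible for such a small domain; for large search spaces the same probability would have to be estimated or derived analytically.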

Table 5
Experimental parameter settings

Table 6
Experimental results of SNBAr-CSO and some other algorithms

Comparison with Other Search-Based Algorithms and Their Variants
In this section, we apply SNBAr to different search-based algorithms and verify the effectiveness of the SNBAr strategy by comparing DE with SNBAr-DE, IGA with SNBAr-IGA, ABC with SNBAr-ABC, and PSO with SNBAr-PSO. To show that the combination of SNBAr and CSO gives better results, we also compare SNBAr-CSO with these SNBAr-based variants. The results are shown in Tab. 7, Tab. 8 and Fig. 3. As shown in Tab. 7, DE achieves 100% path coverage only on benchmark program No. 1 and covers only part of the paths on most of the others. This is because DE has no local search phase aimed at covering the target path, and a search-based algorithm alone struggles to find a test case set that satisfies path coverage in a large search space. When SNBAr is combined with DE, SNBAr-DE achieves 100% path coverage on 11 benchmark programs and generates far fewer test cases. Similarly, combining SNBAr with IGA, ABC and PSO also improves the efficiency of test case generation.
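The two quantities compared throughout this section, path coverage and test case consumption, can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used in the experiments: `run_program`, `path_coverage` and the three-path toy program are hypothetical. The toy program contains one infeasible path, so no test set can exceed 67% coverage, which mirrors the situation reported for benchmark program No. 2.

```python
def run_program(x):
    """Hypothetical program under test; returns the id of the path taken."""
    if x > 0:
        if x < 0:         # contradicts x > 0, so path "P1" is infeasible
            return "P1"
        return "P2"
    return "P3"


def path_coverage(test_cases, trace, total_paths):
    """Return (coverage ratio, covered path ids, number of test cases used)."""
    covered = {trace(tc) for tc in test_cases}
    return len(covered) / total_paths, covered, len(test_cases)


TOTAL_PATHS = 3                       # L: total number of paths, feasible or not
generated_tests = [5, -2, 7, 0, 11]   # stand-in for test cases from a search run

coverage, covered, consumption = path_coverage(
    generated_tests, run_program, TOTAL_PATHS
)
print(f"coverage = {coverage:.0%}, test cases consumed = {consumption}")
```

Here five test cases are consumed to cover two paths, so three of them add no new coverage; a search strategy that reaches the same coverage with fewer generated test cases is the better one under the comparison used in the tables.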

Table 8
Experimental results of SNBAr-CSO with ABC, SNBAr-ABC, PSO and SNBAr-PSO

Figure 3
Box plots of the consumption of generated test cases between SNBAr-CSO and other search-based algorithms and their variants

Table 9
Experimental results of different selection strategies