A Binomial Crossover Based Artificial Bee Colony Algorithm for Cryptanalysis of Polyalphabetic Cipher

Cryptography is one of the common approaches to securing private data, and cryptanalysis involves breaking a cipher text without having the key. Cryptanalysis by brute force cannot be accepted as an effective approach; hence, metaheuristic algorithms performing a systematic search can be applied to derive the optimal key. In this study, our aim is to examine the overall suitability of the Artificial Bee Colony algorithm in the cryptanalysis of polyalphabetic ciphers. For this purpose, the basic Artificial Bee Colony algorithm (ABC) is applied to the cryptanalysis of the Vigenere cipher using a number of different key lengths in both English and Turkish. In order to improve the ABC algorithm's convergence speed, a modified binomial crossover based Artificial Bee Colony algorithm (BCABC) is proposed, which introduces a binomial crossover-based phase after the employed bee phase for a more precise search of the global optimal solution. Keys of various sizes and various cipher texts in both English and Turkish are used in the experiments. It is shown that the cryptanalysis keys produced by BCABC are notably competitive and better than those produced by basic ABC for Vigenere cipher analysis.


INTRODUCTION
Secure communication ensures that unauthorised individuals cannot gain access to private data. This security is delivered through cryptography, which generates an encrypted form of a given text with a key and facilitates confidentiality, message availability and integrity. Cryptanalysis deals with retrieving the plain text and/or the key from a cipher text without the permission of the communicating parties or knowledge of the key [1].
Classical ciphers are categorized into substitution ciphers, in which letters are systematically replaced by other letters throughout the message, and transposition ciphers, in which the original text is broken into blocks of a predefined size and the characters of each block are rearranged according to a permutation. Classical ciphers remain valuable and important because the majority of the widely used modern ciphers build on them. In fact, the majority of complex algorithms are created by combining transposition and substitution ciphers [2,3]. Depending on the key, cryptosystems may be divided into two classes: symmetric and asymmetric cryptosystems [2,3]. Symmetric cryptosystems may be divided into two subgroups: monoalphabetic cryptosystems, in which each character is replaced by a fixed substitute letter of another alphabet [2], and polyalphabetic cryptosystems, in which a letter can be replaced by different letters depending on its position in the plain text [3]. For monoalphabetic substitution ciphers, the set of possible keys is the set of all permutations of the 26-letter alphabet, which amounts to more than 403 septillion keys; checking all of them at a rate of a million keys per second would take about 12 trillion years [4]. The Vigenere cipher is a polyalphabetic cipher which encodes the text by replacing each plain-text character with another letter using the alphabet rows of a table, where the row is changed in line with the indices of the code word letters [5]. In message encryption, the plain text and the key are represented as integer sequences and divided into groups depending on the key length. Let $P = (P_1, P_2, P_3, \ldots, P_n)$ be a plain text block, $K = (K_1, K_2, K_3, \ldots, K_n)$ the key, and $C = (C_1, C_2, C_3, \ldots, C_n)$ the cipher text block. Eq. (1) is used for encryption and Eq. (2) is used for decryption.
$$C_i = (P_i + K_i) \bmod 26 \qquad (1)$$
$$P_i = (C_i - K_i) \bmod 26 \qquad (2)$$
In the Vigenere cipher, for a key of size m, the key space has size $26^m$ for the English alphabet [6]. In this case, cryptanalysis by brute force is not practical due to its computational cost, and stochastic optimization methods, which perform a systematic search based on randomization and previous experience, are directed towards identifying the optimal key of a classical cipher. In this study, one of the successful algorithms in numerical optimization, the Artificial Bee Colony algorithm (ABC) [7], was applied to the cryptanalysis of the polyalphabetic Vigenere cipher. The ABC algorithm, which belongs to the class of swarm intelligence algorithms, simulates the foraging behaviour of honey bees. It has been successful on many problems in various research fields [8]. The ABC algorithm is a powerful search algorithm, which makes it a potentially good candidate for finding a cipher key. The ABC algorithm has been applied to the substitution cipher successfully [9,10]. This encourages further work on the application of ABC to Vigenere cipher cryptanalysis, and this is the first time the ABC algorithm is applied to this cryptanalysis problem. In order to enhance the convergence speed, the ABC algorithm was modified by employing a binomial crossover phase between the employed bee and onlooker bee phases. The binomial crossover produces the trial elements randomly by taking elements either from the mutation vector or from the existing elements, as described in Eq. (3).
$$B_{i,j} = \begin{cases} y_{i,j}, & \text{if } R_j \le CR \text{ or } j = k \\ x_{i,j}, & \text{otherwise} \end{cases} \qquad (3)$$
where $B$ refers to the result of mixing the existing element, $x_i$, with that produced by mutation, $y_i$; $R_j$ is a random value drawn for each $j$ in the range (0, 1); $CR$ is the crossover rate, which is selected in the range (0, 1); and $k$ is an index randomly chosen from {1, …, n}, so that the trial vector includes at least one element from the mutation vector. The proposed approach is referred to as the Binomial Crossover based Artificial Bee Colony algorithm (BCABC), which combines the advantage of ABC in the exploration phase with a crossover operator that exploits the information of neighbours. In the first part of this study, the parameter sensitivities of the basic ABC algorithm and the BCABC algorithm are investigated and some values for the control parameters are recommended. Using the best parameter configurations, the results produced by the two algorithms are presented and compared by means of statistical tools.
The paper is organized as follows: in Section 2, related works on cryptanalysis using stochastic algorithms are presented; in Section 3, a brief description of the ABC algorithm is given; and the details of the proposed algorithm are given in Section 4. In Section 5, the experiments are explained and the results are discussed. Finally, the last section is dedicated to the conclusion.
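As an illustration of the Vigenere encryption and decryption rules in Eqs. (1) and (2), the following minimal Python sketch encrypts and decrypts a message with a repeating key. The helper names (`to_ints`, `to_text`, `encrypt`, `decrypt`) and the sample message are assumptions made for this example and are not taken from the original study; the sketch also assumes the 26-letter English alphabet.

```python
def to_ints(text):
    """Map the letters A-Z to integers 0-25, ignoring all other characters."""
    return [ord(ch) - ord("A") for ch in text.upper() if "A" <= ch <= "Z"]


def to_text(values):
    """Map integers 0-25 back to the letters A-Z."""
    return "".join(chr(v + ord("A")) for v in values)


def encrypt(plain, key):
    """Eq. (1): C_i = (P_i + K_i) mod 26, with the key repeated cyclically."""
    return [(p + key[i % len(key)]) % 26 for i, p in enumerate(plain)]


def decrypt(cipher, key):
    """Eq. (2): P_i = (C_i - K_i) mod 26, with the key repeated cyclically."""
    return [(c - key[i % len(key)]) % 26 for i, c in enumerate(cipher)]


# Hypothetical sample message and key, chosen only for illustration.
key = to_ints("LEMON")
cipher = encrypt(to_ints("ATTACKATDAWN"), key)
print(to_text(cipher))
print(to_text(decrypt(cipher, key)))
```

In a cryptanalysis setting, `decrypt` would be called with each candidate key proposed by the search algorithm, and the resulting text would then be scored by the fitness function.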

RELATED WORK
Some nature-inspired stochastic algorithms have been utilised in the cryptanalysis of classical cryptosystems. Spillman et al. [11] implemented a Genetic Algorithm (GA) to attack a monoalphabetic substitution cipher. Furthermore, a GA system known as GENALYST was adopted by Matthews [12] to break the transposition cipher. Clark [13] applied GA, Tabu Search (TS) and Simulated Annealing (SA) to the cryptanalysis of the substitution cipher. Moreover, Clark et al. [14] applied GA to attack a polyalphabetic substitution cipher. Clark and Dawson [15] enhanced the work by integrating parallelism. Clark and Dawson [16] also presented a comparison among SA, GA and TS on the cryptanalysis of simple substitution ciphers. Dimovski and Gligoroski [17] highlighted three optimisation heuristics that could be applied to break the transposition cipher, namely SA, GA and TS. Verma et al. [18] presented the cryptanalysis of a monoalphabetic substitution cipher based on GA and TS and subsequently compared the overall efficiency of these algorithms. An automated approach based on GA, TS and SA for the cryptanalysis of the transposition cipher was developed by Song et al. [19] and Garg [20]. In addition, Omran et al. [21] developed a GA with the aim of attacking the Vigenere cipher. Moreover, Bhateja and Kumar [22] devised an approach for the cryptanalysis of the Vigenere cipher through GAs adopting elitism with a novel fitness function. Boryczka and Dworak [23] considered the way in which evolutionary algorithms, including GAs, may be used to speed up the cryptanalysis of the transposition cipher. Further, Boryczka and Dworak [24] discussed the way in which evolutionary algorithms, including GAs, may be able to optimise a more complex cryptanalysis function. In addition, Uddin and Youssef [25] applied Ant Colony Optimization (ACO) to attack simple substitution ciphers.
Bhateja et al. [26] investigated the suitability of the Cuckoo Search algorithm (CS) in the cryptanalysis of the Vigenere cipher, whilst Luthra and Pal [27] examined the integration of mutation and crossover with the Firefly Algorithm (FA) for the cryptanalysis of the monoalphabetic cipher. Li et al. [28] presented a hybrid of TS and GA with the aim of breaking classical and modern ciphers; their findings emphasise that the hybrid algorithms were most influential on the transposition cipher, with a minimal effect on the Hill cipher, while the Advanced Encryption Standard (AES) cipher showed the lowest level of effect across all three algorithms. Ali and Mahmod [29] introduced a hybrid technique of the Bees Algorithm (BA) and SA for the cryptanalysis of the simple substitution cipher. Din et al. [30] introduced some of the theoretical parts of stochastic algorithms and described cryptanalysis problems such as stream ciphers and block ciphers in detail. In addition, Civicioglu and Besdok [31] statistically analyzed the success of the CS, Particle Swarm Optimization (PSO), Differential Evolution (DE) and ABC algorithms on numerical optimization problems. Most of the techniques in the literature are unable to produce satisfactory results in the analysis of the Vigenere cipher when the key size exceeds 15 characters.

ARTIFICIAL BEE COLONY ALGORITHM
The Artificial Bee Colony algorithm defined by Karaboga [7] simulates the foraging behaviour of honey bees. In a bee colony, there are three different groups of bees, namely employed bees, onlookers and scouts. Each employed bee exploits the food source in her memory and returns to the hive, where her information is shared through communicative dancing whose frequency is related to the quality of the food source. The onlooker bees observe the dances, gain information, and accordingly decide which food source to fly to and exploit. This type of communication allows more onlooker bees to be attracted by high quality sources. Importantly, if the nectar of a source is exhausted, the source is abandoned and its employed bee becomes a scout. The scouts are not given any guidance when seeking out food; rather, they complete their own searches, exploring new areas to find new sources.
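The roles described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: it assumes the commonly used proportional (roulette-wheel) selection for onlookers, positive fitness values, and a per-source trial counter that is compared against a `limit` parameter to decide when a source is abandoned. All function names are hypothetical.

```python
import random


def selection_probabilities(fitness):
    """Probability of an onlooker choosing each source, proportional to quality.

    Assumes all fitness values are positive.
    """
    total = sum(fitness)
    return [f / total for f in fitness]


def choose_source(fitness, rng):
    """Roulette-wheel choice of one food source index for an onlooker bee."""
    return rng.choices(range(len(fitness)), weights=fitness, k=1)[0]


def update_trials(trials, improved):
    """Reset the counter of sources that improved, increment the others."""
    return [0 if ok else t + 1 for t, ok in zip(trials, improved)]


def exhausted(trials, limit):
    """Indices of sources to abandon; their employed bees become scouts."""
    return [i for i, t in enumerate(trials) if t > limit]


rng = random.Random(42)                      # fixed seed for repeatability
fitness = [1.0, 3.0, 6.0]                    # hypothetical source qualities
print(selection_probabilities(fitness))      # better sources get larger shares
print(choose_source(fitness, rng))           # index picked by one onlooker
print(exhausted(update_trials([9, 10, 2], [False, False, True]), limit=10))
```

A scout would then replace each abandoned source with a new randomly generated solution, which is what keeps the search exploring new regions.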

BINOMIAL CROSSOVER BASED COMBINATORIAL ARTIFICIAL BEE COLONY ALGORITHM (BCABC)
In the collective knowledge leading to swarm intelligence, the most critical aspect is social learning. In the ABC algorithm, this is predominantly achieved by the unsupervised interaction among bees in the employed and onlooker bee phases. In the basic ABC algorithm, information is exchanged in each local search by changing one randomly chosen dimension based on the information coming from a random neighbour. Because the information exchange remains limited to one dimension, it may in some cases slow down the algorithm's convergence. In the proposed model, we integrated a binomial crossover phase after the employed bee phase to enhance the local search ability of the basic ABC algorithm. There are some studies in the literature in which a binomial crossover was used. Zaharie [32] analyzed, from both a theoretical and a numerical point of view, the influence of the crossover variant on the behavior of DE. In addition, Weber and Neri [33] compared the binomial crossover used in DE with a variant of the binomial crossover; the results showed that this variant significantly increases the execution speed of DE, especially on higher dimensional problems. Islam et al. [34] proposed a new mutation strategy for the binomial crossover of DE. Moreover, Lin et al. [35] presented a theoretical analysis and a comparative study of two new crossover methods, binomial crossover and exponential crossover.
Moreover, there are some studies in the literature in which the ABC algorithm was integrated with crossover operators [36][37][38][39][40]. Some of them placed the crossover operator in the employed bee, onlooker bee or scout bee phase. Others used the crossover operator together with other improvements in the structure of the original ABC, used it with a mutation operator as a separate phase, or used a real coded crossover operator as a separate phase after the employed bee phase. Unlike previous studies, this study performs binomial crossover as a separate phase between the employed and onlooker bee phases. The binomial crossover is given below.
The pseudo code of binomial crossover:
- Input the crossover rate (CR);
- Select two parents of the same length (x, y);
As shown in Fig. 1, two parents X_i and Y_i are selected, then a uniform random number U is generated in the range (0, 1). If U > CR, the outcome is "1", which denotes a successful experiment, whereas "0" denotes an unsuccessful experiment (U ≤ CR). In the example, the child C takes the elements C, M, U and T from X_i, and I, O, A and L from Y_i.
The main loop of the BCABC algorithm is as follows:
Repeat (for all food sources/solutions)
a. Employed Bee Phase
b. Binomial Crossover Phase
c. Onlooker Bee Phase
d. Scout Bee Phase
Until (the requirements are satisfied).
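The binomial crossover step can be sketched as follows. This minimal example assumes the standard differential-evolution convention, in which an element is taken from the second parent when the random draw does not exceed CR or when the position equals a randomly chosen index k; the function name `binomial_crossover`, the integer key encoding and the sample parents are assumptions for illustration and may differ from the authors' implementation.

```python
import random


def binomial_crossover(x, y, cr, rng):
    """Build a child by mixing parent x with parent y, position by position.

    ASSUMPTION: position j is taken from y if a uniform draw is <= cr or if
    j equals the randomly chosen index k; otherwise it is taken from x.
    """
    if len(x) != len(y):
        raise ValueError("parents must have the same length")
    k = rng.randrange(len(x))  # guarantees at least one element from y
    child = []
    for j in range(len(x)):
        if rng.random() <= cr or j == k:
            child.append(y[j])
        else:
            child.append(x[j])
    return child


rng = random.Random(7)                  # fixed seed for repeatability
x = [2, 14, 12, 15, 20, 19, 4, 17]      # hypothetical candidate key (0-25)
y = [3, 8, 6, 8, 19, 0, 11, 18]         # hypothetical second parent
print(binomial_crossover(x, y, cr=0.2, rng=rng))
```

With a small crossover rate such as 0.2, most positions of the child are inherited from `x`, so several dimensions can still change in one step while the bulk of the current solution is preserved.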

EXPERIMENTS
In order to adapt the algorithms to the cryptanalysis of the Vigenere cipher, the cipher text, the key size and the relative character frequencies are first given to the algorithm. The algorithm generates alternative decryption keys that maximize the cost function, which is defined based on the relative frequencies specific to the language used in the texts. Each food source, which is assumed to be a key, is initialized randomly by Eq. (4), with a uniform random sample from the integers 0 (lower bound) to 25 (upper bound) when the English language is used. In order to establish the viability of a key, each solution is evaluated with a cost function defined using frequency analysis, in which the aim is to compare the frequencies in the decrypted text with those known for the target language of the text (English, Turkish, etc.). The difference between the unigram and bigram frequencies of the decrypted text and the corresponding unigram and bigram frequencies in the language used (English or Turkish) serves as the fitness function. In the present work, the fitness function in Eq. (7) [22,26], which depends on the unigram and bigram statistics of the language used, has been applied.
where K is the key applied, OFM(i) and EFM(i) are the observed and expected frequencies of each unigram, while OFB(i) and EFB(i) are the observed and expected frequencies of each bigram. λ1 and λ2 are the weights assigned to the unigram and bigram statistics, respectively. We adopted the values λ1 = 0.23 and λ2 = 0.77, which are suggested as the best values in [41].
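The frequency comparison behind the fitness function can be illustrated with the following Python sketch. Because Eq. (7) is not reproduced here, the sketch only computes a weighted mismatch between observed and expected unigram and bigram frequencies (lower is better); how this mismatch is turned into a value to be maximised is left open. The function names are hypothetical, and the expected frequencies are estimated from a toy reference string rather than from the language tables used in the study.

```python
from collections import Counter

LAMBDA_1, LAMBDA_2 = 0.23, 0.77  # unigram and bigram weights from the text


def ngram_frequencies(text, n):
    """Relative frequencies of the n-grams occurring in text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.items()}


def total_difference(observed, expected):
    """Sum of absolute differences over all n-grams seen in either table."""
    grams = set(observed) | set(expected)
    return sum(abs(observed.get(g, 0.0) - expected.get(g, 0.0)) for g in grams)


def frequency_mismatch(decrypted, expected_uni, expected_bi):
    """ASSUMPTION: weighted sum of unigram and bigram frequency differences."""
    uni = total_difference(ngram_frequencies(decrypted, 1), expected_uni)
    bi = total_difference(ngram_frequencies(decrypted, 2), expected_bi)
    return LAMBDA_1 * uni + LAMBDA_2 * bi


# Toy stand-in for the language statistics (NOT the tables used in the paper).
reference = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG"
expected_uni = ngram_frequencies(reference, 1)
expected_bi = ngram_frequencies(reference, 2)

# A wrong key is imitated by shifting every letter of the text by one place.
wrong = "".join(chr((ord(c) - 65 + 1) % 26 + 65) for c in reference)
print(frequency_mismatch(reference, expected_uni, expected_bi))
print(frequency_mismatch(wrong, expected_uni, expected_bi))
```

A text decrypted with the correct key reproduces the expected statistics and gives a mismatch of zero in this toy setting, while a wrongly decrypted text gives a positive value, which is the signal the search algorithm relies on.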
The expected values of unigrams and bigrams in English were derived from approximately 4.5 billion characters of English text [42] and are given in Appendix Tab. A1 to Tab. A2. Similarly, those in Turkish were taken from the work of [43] and are given in the Appendix from Tab. A3 onwards.
Fig. 4 and Fig. 5 show the best and mean fitness values obtained with different limit parameters, different cipher sizes and various cipher text lengths. It is seen that good results are obtained when the limit parameter is between 0.5 × SN × D and 1 × SN × D for both the ABC and BCABC algorithms. Within these ranges, we observe that the best results are obtained with the control parameter values in Tab. 2. Fig. 6 and Fig. 7 show the best and mean fitness values obtained with different CR parameters, different cipher sizes and various cipher text lengths. It is seen that good results are obtained when CR is 0.2 for the BCABC algorithm. Within these ranges, we observe that the best results are obtained with the control parameter values in Tab. 2.

Experiment 2: Comparison of ABC and BCABC Algorithms
In the second part of the experiments, the algorithms are run with the control parameter values in Tab. 2 for the key sizes (D) {5, 10, 15, 20, 25} and Vigenere cipher text sizes {250, 500, 750, 1000}, each in English and Turkish, and the results are compared to each other. Tab. 3 and Tab. 4 present the results of the ABC algorithm for the English and Turkish alphabets, respectively, and Tab. 5 and Tab. 6 present the results of the BCABC algorithm for the English and Turkish alphabets, respectively. Tab. 3 to Tab. 6 report the statistics of the number of key letters recovered correctly and the fitness value statistics for different cipher text lengths and key sizes. When the cipher text size equals 250 characters, the minimum and maximum numbers of key letters recovered correctly are lower than those obtained for cipher texts of size 500, 750 and 1000. As the cipher text gets longer, the number of correct key letters increases as well, because the approximation to the expected values is more precise and the overall reliability of the fitness is higher. Hence, when the cipher text is short (≤ 250 characters) or the key is long, a higher number of cycles is required. Besides, the results obtained using the Turkish alphabet are better than those obtained using the English alphabet when the cipher text length is small. Tab. 7 provides a comparison of the ABC and BCABC algorithms based on the statistics of the number of key letters recovered correctly, while Tab. 8 gives a comparison of the algorithms based on the statistics of fitness values for both the English and Turkish cases.
Figure 4 The best values obtained with different limit parameters using different lengths of cipher text using ABC and BCABC algorithms
Technical Gazette 27, 6(2020), 1825-1835
Figure 5 The mean values obtained with different limit parameters using different lengths of cipher text using ABC and BCABC algorithms
Figure 6 The best values obtained with different CR parameters using different lengths of cipher text using BCABC algorithm
Figure 7 The mean values obtained with different CR parameters using different lengths of cipher text using BCABC algorithm
From Tab. 7 and Tab. 8, the ABC algorithm is efficient when the Vigenere cipher uses a short key, whereas it is not so efficient when dealing with longer keys. On the other hand, BCABC is more efficient than ABC, specifically when the key size is up to 25 and the cipher text size is greater than 250 characters. In some instances, the number of key letters recovered correctly or the maximum fitness value was equal for the two algorithms; we therefore performed a statistical analysis to see whether the differences are significant. Since the runs failed the normality tests, the non-parametric Wilcoxon rank sum test was applied to check whether the algorithms have equal median values. The BCABC algorithm was selected as the control algorithm, and the p-values and h-values calculated for all instances are given in Tab. 9. For English texts, the BCABC algorithm was found better in 19 cases and worse in 1 case, while for Turkish texts, the BCABC algorithm was better in all cases.

CONCLUSION
Because the key space of the Vigenere cryptosystem is high dimensional, brute force is not practical in real time cryptosystems and statistical approaches are not useful when the key size is large. The objective of this paper was to analyse the overall applicability of the ABC algorithm and to propose a more efficient BCABC algorithm as a cryptanalysis method for traditional cryptosystems, focusing on the Vigenere cipher.
The results were examined using statistical tools, and it can be concluded that ABC is successful in recovering the whole key of the Vigenere cipher for keys of small size, while BCABC is able to recover all of the letters for keys of up to 25 letters with cipher texts of more than 250 characters. This shows that the binomial crossover integrated into the ABC algorithm improved the exploitation capability of the algorithm, and that BCABC is superior to the ABC algorithm in terms of accuracy in the cryptanalysis of the Vigenere cipher. The frequency based fitness function is not efficient for short messages but is better suited to longer messages. However, BCABC increases the time complexity compared to the basic ABC algorithm.
Tailoring new fitness functions that are efficient for every key size, text length and language is our future work.