Wrapper and Hybrid Feature Selection Methods Using Metaheuristic Algorithm for Chest X-Ray Images Classification: COVID-19 as a Case Study

The Covid-19 virus has caused a pandemic in more than 200 countries across the globe, with severe impacts on the lives and health of a large number of people. The emergence of Omicron, a highly mutated variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has again caused social restrictions around the world because of its infectiousness and vaccine-escape mutations. One of the most significant steps in the fight against Covid-19 is to identify those who are infected with the virus as early as possible, so that their treatment can start and the risk of transmission is minimized. Detection of this disease from radiographic and radiological images is perhaps one of the quickest and most accessible methods of diagnosing patients. In this study, a computer-aided system based on deep learning is proposed for rapid diagnosis of COVID-19 from chest x-ray images. First, a dataset of 5380 chest x-ray images was collected from publicly available datasets. In the first step, the deep features of the images in the dataset were extracted using a pre-trained convolutional neural network (CNN) model. In the second step, the Differential Evolution (DE), Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) algorithms were used for feature selection in order to find, among these deep features, those that are effective for classification. Finally, the features obtained in these two stages were classified with Decision Tree (DT), Naive Bayes (NB), support vector machine (SVM), k-Nearest Neighbours (k-NN) and Neural Network (NN) classifiers for binary, triple and quadruple classification. In order to measure the success of the models objectively, 10-fold cross-validation was used. As a result, 1000 features were extracted with the SqueezeNet CNN model. In the binary, triple and quadruple classification process using these features, the SVM method was found to be the best classifier. The classification successes of the SVM model are 96.02%, 86.84% and 79.87%, respectively. With feature selection, the proposed method achieved results close to those obtained with all deep features, in less time and with fewer features. While the performance achieved is very good, further analysis is required on a larger set of COVID-19 images to obtain higher estimates of accuracy.


INTRODUCTION
The new type of coronavirus (COVID-19), which first appeared in the city of Wuhan, China, is a respiratory virus that causes symptoms such as cough, fever and shortness of breath in some patients. The virus, which has had a tremendous impact on people's daily lives and health, has affected people in more than 200 countries around the world [1].
Simple CT and chest x-ray imaging play an important role in the diagnosis of SARS-CoV-2 pneumonia, the detection of complications and the follow-up of the disease. These imaging techniques are also important tools for determining the severity of the disease. Chest x-ray may have low sensitivity for detecting pulmonary lesions in infected individuals in the early stages and in mild cases. However, it is widely used because it is accessible even in most small settlements, and its capacity to raise the suspicion of pneumonia makes it valuable. Chest x-ray imaging also plays an important role in monitoring the course of pulmonary lesions, even in patients who have a severe course of the disease and are hospitalized in the intensive care unit [2][3][4]. In general, COVID-19 can be diagnosed by a detailed analysis of coronavirus symptoms and of computed tomography (CT) or chest x-ray images. Without such image analysis, manual assessment is not sufficient to distinguish between pneumonia and COVID-19 [5]. Radiological imaging is an important tool for diagnosing COVID-19. Most cases of COVID-19 have similar features on radiographic images in the early stage, particularly in the lower lobes, including bilateral, multifocal, ground-glass opacities with posterior or peripheral distribution, and pulmonary consolidation in the late stage [6].
Chest x-ray images are known to have potential in monitoring and examining various lung conditions such as infiltration, tuberculosis, pneumonia, atelectasis and hernia [7]. Chest x-ray imaging is considered one of the most powerful medical imaging techniques used in hospitals for detecting chest abnormalities. However, the biggest problem with chest x-ray imaging is that it takes a long time for radiologists to read and interpret the images. With the increase in the number of chest x-rays taken from patients during the Covid-19 pandemic, the workload of radiologists has grown [8]. Lung inflammation seen as pulmonary opacification, which occurs in many respiratory tract diseases, is also seen in the new type of coronavirus disease, COVID-19. Such opacities render the lung regions unrecognizable, making automatic analysis of chest x-ray images difficult [9].
The increase in this workload may contribute to the spread of Covid-19, an infectious disease, and to increases in the workload of hospitals, in the severity of patients' illness and even in patient deaths. These developments can in turn lead to health and economic problems in countries, hardship for their citizens and curfews. Therefore, rapid and accurate evaluation of chest x-rays is very important. This is especially true since the emergence of Omicron (B.1.1.529), a highly mutated SARS-CoV-2 variant that spreads much faster and has caused worldwide panic because of its vaccine-escape mutations [10].
Today, artificial intelligence methods, image processing techniques and deep learning methods are used in the detection of many diseases (e.g. stomach cancer detection, breast cancer detection, brain tumour detection, etc.) [11][12][13].
In their study, Aslan et al. (2021) proposed AlexNet and hybrid AlexNet + BiLSTM methods to automatically detect positive COVID-19 cases by using artificial neural networks on chest CT X-ray images [14].
Taspinar et al. classified a total of 3486 healthy, Covid-19 and viral pneumonia chest X-ray images with SVM, LR and ANN and achieved a highest classification success of 96.7% [13].
Jaiswal et al. performed transfer learning on chest CT images by fine-tuning DenseNet201, one of the pre-trained CNN models, and proposed this model for diagnosing cases as COVID-19 positive or COVID-19 negative. They reported values of 99.82%, 97.4% and 96.25% [16].
In this study, firstly, transfer learning was applied with a pre-trained CNN model on the publicly available Covid-19 Chest X-Ray Database, and deep features were extracted. Secondly, the obtained features were given to the NB, DT, SVM, k-NN and NN classifiers for binary, triple and quadruple classification, and their classification success was measured. Finally, the features obtained from deep learning were selected with the DE, ACO and PSO algorithms. The features obtained after feature selection were given to the SVM classifier for binary, triple and quadruple classification, and the resulting classification successes were measured and compared. In order to measure the success of the models objectively, 10-fold cross-validation was used. The main contributions of this study are listed below. 1) The proposed system adopts a new framework based on the diagnoses of COVID-19, Viral Pneumonia, and of

MATERIALS AND METHODS
In this study, the SqueezeNet model was first used for deep feature extraction from chest x-ray images, and the extracted features were classified with machine learning classifiers (DT, NB, SVM, k-NN and NN). Secondly, feature selection was performed with optimization methods (DE, ACO and PSO) to remove redundant and irrelevant features from those extracted in the first stage. The selected features were then classified with the classification method that was most successful in the first step. Finally, the classification successes were compared. Figure 1 illustrates the proposed deep learning-based and feature selection-based binary (Covid-19 or Normal), triple (Covid-19, Normal, Viral Pneumonia) and quadruple (Covid-19, Normal, Viral Pneumonia, Lung Opacifications) classification process using chest x-ray images.

Data Set and Properties
X-ray images were used in this research. The images were downloaded from a public online Kaggle dataset repository [12]. The data set consists of chest x-ray images approved by different institutions in four classes: COVID-19, Viral Pneumonia, Lung Opacifications and Healthy (Normal). There are 1345 Covid-19, 1345 Viral Pneumonia, 1345 Lung Opacifications and 1345 Healthy (Normal) chest x-ray images in the dataset. Each chest x-ray image is 229×229 pixels in size, and the dataset consists of a total of 5380 images. Sample chest x-ray images of each class are given in Fig. 2. The composition of the image sets used for binary, triple and quadruple classification in the study is given in Tab. 1.
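The size of each classification task follows directly from the class counts above. The short sketch below works through that arithmetic; the class groupings are taken from the description of the binary, triple and quadruple tasks, while the variable and function names are illustrative assumptions rather than anything defined in the study.

```python
# Class size as stated in the text; names below are illustrative only.
IMAGES_PER_CLASS = 1345

# Class groupings for the three classification tasks described in the study.
TASKS = {
    "binary": ["Covid-19", "Normal"],
    "triple": ["Covid-19", "Normal", "Viral Pneumonia"],
    "quadruple": ["Covid-19", "Normal", "Viral Pneumonia", "Lung Opacifications"],
}


def task_sizes(images_per_class=IMAGES_PER_CLASS, tasks=TASKS):
    """Return the number of images available for each classification task."""
    return {name: images_per_class * len(classes) for name, classes in tasks.items()}


sizes = task_sizes()
for name, n_images in sizes.items():
    # With 1000 deep features per image, the feature matrix is n_images x 1000.
    print(f"{name}: {n_images} images -> feature matrix {n_images} x 1000")
```

Because the classes are balanced, each task simply uses a multiple of 1345 images, which is also the number of rows in the corresponding feature matrix.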

Deep Transfer Learning and Convolutional Neural Network (CNN)
The convolutional neural network (CNN), which is used in image recognition problems, is one of the most well-established deep learning models [1]. Deep learning models are used in many medical applications such as classifying medical images, segmenting images, and detecting lesions in images. CNN models are also used for signal processing and for the analysis of image data produced by medical imaging techniques such as x-ray, magnetic resonance imaging (MRI) and computed tomography (CT). These analyses provide great convenience to doctors in tasks such as the diagnosis of stomach cancer, breast cancer, diabetes mellitus, skin cancer and brain tumours [17][18][19].
A CNN consists of three types of layers (convolution, pooling and fully connected) that together learn from image data and are then tested on new images. Feature extraction is performed in the convolutional and pooling layers, and classification is finally performed in the fully connected layer. SqueezeNet, one of the pre-trained CNN models, was used in the study. Transfer learning enables knowledge to be transferred to a new model by fine-tuning the weights and parameters obtained on previously used training data sets. Designing a new CNN model and training it from scratch instead of using transfer learning is very time-consuming and costly and requires high-performance equipment [20]. Therefore, with the transfer learning method, researchers prefer to fine-tune a pre-designed network, transferring the information it learned on a large dataset to a new model trained with fewer samples. One of these models, SqueezeNet, was trained on the ImageNet dataset, which has a total of 1000 classes [21].

Feature Selection
In feature selection, informative features are selected from the feature space so that the redundant and irrelevant deep features in the feature vector obtained from deep feature extraction are removed and the remaining features discriminate effectively between classes [22]. In this study, features were selected from the feature space with three optimization algorithms (DE, ACO, PSO). The selected feature vectors obtained from all three optimization algorithms were classified by machine learning algorithms in two-class and multi-class settings.

Differential Evolution (DE)
The Differential Evolution (DE) algorithm is a population-based algorithm proposed by Storn and Price in 1995. It is promising for optimization problems because of advantages such as its ability to find the true global minimum independently of the initial parameter values, its small number of control parameters and its fast convergence. Important parameters include the population size, the scaling factor, and the crossover constant. It is structurally similar to the genetic algorithm, with operators such as crossover, mutation and selection [23][24][25].
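To make the roles of the population size, scaling factor and crossover constant concrete, the following is a minimal sketch of DE used as a wrapper for feature selection. It is not the implementation used in this study: the real-valued encoding thresholded at 0.5, the parameter values and the toy fitness function (which rewards a hypothetical set of "informative" feature indices and penalizes subset size) are all assumptions made for illustration. In a real wrapper, the fitness would be the cross-validated performance of a classifier trained on the selected features.

```python
import random


def de_feature_selection(fitness, n_features, pop_size=20, scale=0.5,
                         crossover=0.9, generations=50, seed=0):
    """Minimal DE/rand/1/bin sketch for feature selection (illustrative).

    Each individual is a real vector in [0, 1]; a feature counts as selected
    when its component exceeds 0.5. `fitness` maps a boolean mask to a score
    that is maximised.
    """
    rng = random.Random(seed)

    def to_mask(vec):
        return [v > 0.5 for v in vec]

    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    scores = [fitness(to_mask(ind)) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(n_features)  # at least one mutated gene
            trial = []
            for d in range(n_features):
                if d == forced or rng.random() < crossover:
                    v = pop[a][d] + scale * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][d])
            # Selection: keep the trial only if it is at least as good.
            trial_score = fitness(to_mask(trial))
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = max(range(pop_size), key=scores.__getitem__)
    return to_mask(pop[best]), scores[best]


# Hypothetical toy problem: features 0, 3 and 7 are "informative".
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)  # reward hits, penalise subset size


mask, score = de_feature_selection(toy_fitness, n_features=12)
print("selected features:", [i for i, m in enumerate(mask) if m])
```

The greedy one-to-one selection step means an individual's score never decreases from one generation to the next, which is one reason DE needs so few control parameters.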

Ant Colony Optimization (ACO)
Ant Colony Optimization (ACO) is a meta-heuristic approach inspired by the pheromone trail laying and following behavior of some ant species [26]. It is an optimization algorithm inspired by the way colony-dwelling ants find the shortest path between their nest and a food source. Important studies have applied this algorithm to problems such as discrete and continuous optimization, the traveling salesman problem, and load balancing and routing in telecommunication networks [27].
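The pheromone mechanism can be carried over to feature selection by keeping one pheromone value per feature. The sketch below shows one simple way to do this and should be read as an assumption-laden illustration, not as the ACO variant used in this study: the fixed subset size, the constant pheromone deposit, the evaporation rate and the toy fitness function are all hypothetical choices.

```python
import random


def aco_feature_selection(fitness, n_features, subset_size, n_ants=10,
                          iterations=30, evaporation=0.2, seed=0):
    """Simplified ACO sketch: one pheromone value per feature (illustrative)."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_score = None, float("-inf")

    for _ in range(iterations):
        for _ in range(n_ants):
            # Each ant builds a subset by pheromone-weighted sampling
            # without replacement (roulette-wheel selection).
            candidates = list(range(n_features))
            subset = []
            for _ in range(subset_size):
                weights = [pheromone[c] for c in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(pick)
                candidates.remove(pick)
            score = fitness(subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporation, then reinforcement of the best subset found so far.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for feature in best_subset:
            pheromone[feature] += 1.0

    return best_subset, best_score


# Hypothetical toy problem: features 0, 3 and 7 are "informative".
INFORMATIVE = {0, 3, 7}


def toy_fitness(subset):
    return sum(1 for f in subset if f in INFORMATIVE)


subset, score = aco_feature_selection(toy_fitness, n_features=12, subset_size=4)
print("selected features:", subset, "score:", score)
```

Evaporation gradually forgets features that stop appearing in good subsets, while the deposit step makes features from the best subset more likely to be chosen by later ants.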

Particle Swarm Optimization (PSO)
Particle swarm optimization (PSO) is a heuristic computational technique developed by Kennedy and Eberhart in 1995, inspired by the behavior of flocks of flying birds [28,29]. When used to select the features that matter for classification, PSO can increase classification accuracy and performance while reducing the workload of the classifier, since it has strong exploration ability and its particles can explore different parts of the solution space [29]. PSO retains a memory of the swarm, which is very useful in the feature selection process because all particles in the problem space hold information about candidate solutions to the problem [25].
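The swarm memory described above corresponds to the personal-best and global-best positions kept during the search. The following sketch uses the common binary PSO formulation, in which a sigmoid of the velocity gives the probability that a feature is selected. It is an illustration under stated assumptions: the inertia and acceleration coefficients, the velocity limit and the toy fitness function are hypothetical and are not taken from this study.

```python
import math
import random


def pso_feature_selection(fitness, n_features, n_particles=15, iterations=40,
                          inertia=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Minimal binary PSO sketch for feature selection (illustrative)."""
    rng = random.Random(seed)
    pos = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = pbest[g][:], pbest_score[g]  # swarm-wide memory

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid turns velocity into the probability of selecting d.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = rng.random() < prob
            score = fitness(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score

    return gbest, gbest_score


# Hypothetical toy problem: features 0, 3 and 7 are "informative".
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)  # reward hits, penalise subset size


mask, score = pso_feature_selection(toy_fitness, n_features=12)
print("selected features:", [i for i, m in enumerate(mask) if m])
```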

Classification Algorithms
Classification problems are among the most classical problems in which researchers separate data into two or more classes in various fields [30,31]. There are studies in many fields such as agriculture, medicine, education and the military. In this study, five machine learning methods (NB, DT, SVM, k-NN and NN) were used as binary and multi-class classifier models to perform binary (Healthy or Covid-19), triple (Viral Pneumonia, Healthy or Covid-19) and quadruple (Lung Opacifications, Viral Pneumonia, Healthy or Covid-19) diagnostic determinations. Below is a brief description of each of the algorithms.

Decision Tree (DT) Algorithm
Decision tree classification provides a fast and convenient solution to classify samples in large datasets containing a large number of variables. There are two key elements for constructing decision trees: (a) growing the tree to ensure it categorizes the training dataset correctly, and (b) the pruning phase where unnecessary nodes and branches are removed to improve classification accuracy [32].

Naive Bayes (NB) Algorithm
The Naive Bayes algorithm is a probability-based classifier. For binary and multi-class classification, it calculates a set of probabilities from the frequencies and combinations of the values in the dataset. The NB algorithm internally uses a multinomial, Bernoulli or Gaussian model for training and testing [33,34].

Multi-Class Support Vector Machine (mSVM) Algorithm
Support Vector Machines (SVM), proposed by Vapnik (Widodo & Yang, 2007), are a supervised statistical learning method based on the principle of structural risk minimization. SVM finds hyperplanes that separate the classes of the training data in a d-dimensional feature space, and these hyperplanes are then used to classify test data [7].

k-Nearest Neighbors (k-NN) Algorithm
The k-nearest neighbors (k-NN) algorithm is a classification approach that relies on the training samples closest to the item under consideration. It generalizes the nearest neighbor rule: the class label of a test sample is determined from the class labels of the k training examples most similar to it. It differs from the nearest neighbor rule in that it extends the decision from the single nearest neighbor to the k nearest neighbors. This extension allows the k-NN algorithm to retrieve and use more information. Unlike other classification algorithms, it has no explicit training phase [35,36].
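A minimal sketch of the k-NN decision rule is given below. It assumes Euclidean distance and simple majority voting, neither of which is specified in the text, and the two-dimensional toy points stand in for the 1000-dimensional deep feature vectors.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # There is no training phase: all work happens at prediction time.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D toy data; real inputs would be deep feature vectors.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["Normal", "Normal", "Normal", "Covid-19", "Covid-19", "Covid-19"]

print(knn_predict(train_x, train_y, (0.15, 0.15)))  # near the "Normal" cluster
print(knn_predict(train_x, train_y, (0.95, 1.0)))   # near the "Covid-19" cluster
```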

Neural Network (NN) Algorithm
The artificial neural network (ANN) is a powerful machine learning-based classification method that is particularly useful because of its non-linear mapping capabilities. The ANN classifier consists of interconnected artificial neurons arranged in layers, and it classifies by means of the weights and biases of the connections between these neurons. ANN performance depends on factors such as the structure of the network, the activation and transfer functions, and the number of hidden layers [37].

k-Fold Cross-Validation
The data set is divided into k parts. In each of k repetitions, one part is selected as the test set while the other k-1 parts are used for training, so that all data are eventually tested. As a result, k different accuracy values are obtained over the whole data set. The variance of the estimate obtained from the cross-validation process decreases as the number of folds k increases, and k = 10 is generally used in studies. The purpose of this method is to reduce the effect of randomness on the prediction results. The disadvantage is the longer training time. In this study, validation was performed with k = 10 [38]. Fig. 3 shows the 10-fold cross-validation process.
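The splitting procedure can be sketched as an index generator. This is an illustrative assumption about how the folds might be formed: the text does not say whether the data were shuffled or stratified by class, so the shuffle, the seed and the function name below are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: plain shuffled folds
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # All folds except the i-th one form the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Binary task from the study: 2 classes x 1345 images = 2690 samples.
splits = list(k_fold_indices(2690, k=10))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: {len(train_idx)} train, {len(test_idx)} test")
```

Every sample appears in exactly one test fold, so the k accuracy values together cover the whole data set.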

Comparative Analysis
A confusion matrix was used to calculate the classification performance metrics and to test the usability of the proposed methods. The confusion matrix gives the number of samples in each actual class and in each predicted class. Fig. 4 shows the confusion matrices for binary and multi-class classification. The Accuracy, Sensitivity, Specificity, Precision, F1-Score and Correlation Coefficient (CC) results of each method were calculated from the confusion matrices obtained as a result of the classification processes. These calculations are given in equations 1-6 for binary classification, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$
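The six metrics named in this section can be computed from the four confusion-matrix counts as sketched below. The formulas are the conventional textbook definitions and are an assumption here; in particular, CC is assumed to be the Matthews correlation coefficient, which the text does not state explicitly. The example counts are hypothetical and are not results from this study.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Assumption: CC is the Matthews correlation coefficient.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denominator
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "cc": cc,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The sketch does not guard against empty rows or columns of the confusion matrix, where some denominators would be zero.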

EXPERIMENTAL WORK AND RESULTS
A laptop computer with 32 GB RAM (3200 MHz), Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz and 500 GB NVM-2 SSD HDD was used for the present study. The program has a user-friendly interface designed in Matlab. Experiments were performed in two different scenarios to classify and detect COVID-19 using X-ray images. First, deep features of the X-ray images were obtained with the pre-trained CNN model SqueezeNet, and machine learning classifiers (NB, DT, SVM, k-NN, NN) were trained and tested on these feature vectors with 10-fold cross-validation, separately for two, three and four categories: (Healthy or Covid-19), (Viral Pneumonia, Healthy or Covid-19) and (Lung Opacifications, Viral Pneumonia, Healthy or Covid-19). Second, deep features were extracted with the pre-trained CNN model SqueezeNet, feature selection was performed using optimization techniques (DE, ACO and PSO), and a final feature vector was created. These final feature vectors were used to train and test the mSVM method, which was the most successful classification method in the first stage. In the classification, (Healthy or Covid-19), (Viral Pneumonia, Healthy or Covid-19) and (Lung Opacifications, Viral Pneumonia, Healthy or Covid-19) were classified separately. In order to measure the success of the models objectively, 10-fold cross-validation was used.

Classification Evaluation without Feature Selection
Five prediction models were created using the training dataset, one for each machine learning classifier (NB, DT, SVM, k-NN, NN). The models were constructed using all the features obtained from deep feature extraction of the chest x-ray image dataset, without any feature selection process.

Binary Classification
In the first phase, Healthy and Covid-19 x-ray image data were used for binary classification. The data consist of 1345 healthy and 1345 COVID-19 images. Classification performances on the feature matrix (2690×1000) of deep features extracted from the X-ray images with the pre-trained CNN model SqueezeNet are given in Tab. 1.

Triple Classification
In the second phase, triple classification was performed among the Covid-19, Normal, and Viral Pneumonia x-ray image datasets. The data consist of 1345 Normal images, 1345 COVID-19 images, and 1345 Viral Pneumonia images. Classification performances on the feature matrix (4035×1000) of deep features extracted from the X-ray images with the pre-trained CNN model SqueezeNet are given in Tab. 2.

Quadruple Classification
In the third phase, quadruple classification was performed among the Covid-19, Normal, Viral Pneumonia, and Lung Opacifications x-ray image datasets. The data consist of 1345 Normal images, 1345 COVID-19 images, 1345 Viral Pneumonia images and 1345 Lung Opacifications images. Classification performances on the feature matrix (5380×1000) of deep features extracted from the X-ray images with the pre-trained CNN model SqueezeNet are given in Tab. 3.

Classification Evaluation with Feature Selection
This section consists of two stages. In the first stage, feature selection was performed with the optimization algorithms (DE, ACO and PSO). In the second stage, the performance of the models was evaluated on the features selected by each of the three optimization algorithms. Classification was performed with the mSVM classifier, which was the best classifier in the experiments without feature selection. Deep features of the X-ray images in the dataset were extracted with the pre-trained CNN model SqueezeNet, and feature selection yielded a 770-feature vector with DE, a 478-feature vector with PSO, and a 554-feature vector with ACO.
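The degree of dimensionality reduction implied by these feature counts can be worked out directly. The short sketch below does this arithmetic using the counts stated in the text (1000 deep features before selection); the function and variable names are illustrative assumptions.

```python
TOTAL_FEATURES = 1000  # deep features extracted with SqueezeNet
SELECTED = {"DE": 770, "ACO": 554, "PSO": 478}  # counts stated in the text


def reduction_percent(selected, total=TOTAL_FEATURES):
    """Percentage of the original features removed by feature selection."""
    return 100.0 * (total - selected) / total


reductions = {name: reduction_percent(n) for name, n in SELECTED.items()}
for name, pct in reductions.items():
    print(f"{name}: {SELECTED[name]} features kept, {pct:.1f}% removed")
```

DE therefore removes roughly a quarter of the features, while ACO and PSO each remove about half.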

Binary Classifications with Selected Attributes
The first experiment was for binary classification between Covid-19 or Normal x-ray image datasets. It consists of 1345 Normal images and 1345 COVID-19 images. Finally, feature vectors (2690×770), (2690×554) and (2690×478) were created for DE, ACO and PSO, respectively. These feature vectors were classified with the mSVM classifier and their classification performances were measured. The classification performances of the data formed with the selected features are given in Tab. 4.

Triple Classification
The second experiment was performed for triple classification among Covid-19, Normal, and Viral Pneumonia x-ray image datasets. It consists of 1345 Normal images, 1345 COVID-19 images, and 1345 Viral Pneumonia images. Feature vectors (4035×770), (4035×554) and (4035×478) were created for DE, ACO and PSO, respectively. These feature vectors were classified with the mSVM classifier and their classification performances were measured. The classification performances of the data formed with the selected features are given in Tab. 5.

Quadruple Classification
The third experiment was performed for quadruple classification among the Covid-19, Normal, Viral Pneumonia, and Lung Opacifications x-ray image datasets. The data consist of 1345 Normal images, 1345 COVID-19 images, 1345 Viral Pneumonia images, and 1345 Lung Opacifications images. Feature vectors (5380×770), (5380×554) and (5380×478) were created for DE, ACO and PSO, respectively. These feature vectors were classified with the mSVM classifier and their classification performances were measured. The classification performances of the data formed with the selected features are given in Tab. 6. Fig. 3 shows the radar graph of the Accuracy, Sensitivity, Specificity, Precision, F1-Score and Correlation Coefficient values according to the selected features.

Comparative Analysis Based on Accuracy Measures of Each Classifier
In this section, graphs of the accuracy values of the binary, triple and quadruple classification results are given in Figs. 6-8, both for classification using all features obtained by deep feature extraction and for classification using the features selected by the optimization algorithms. The confusion matrices of the best-performing 2-, 3- and 4-class classifications of the features obtained by deep feature extraction are given in Fig. 9.
The confusion matrices of the best-performing DE algorithm for the 2-, 3- and 4-class classification of the features selected by the optimization algorithms from the deep features are given in Fig. 10.

DISCUSSION
When recent studies in the literature on X-ray images in the diagnosis of COVID-19 are examined, it is seen that deep CNN is one of the most preferred techniques [13]. Some of these studies perform binary classification, while others perform multi-class classification. The studies also differ in the number of chest X-rays used and in the number of classes. Tab. 8 lists the latest studies on chest X-ray in the literature.

CONCLUSION
Early and rapid diagnosis of Covid-19 is very important to prevent the spread of the disease to other people and to prevent the spread of the pandemic. Separating Covid-19 and respiratory diseases will also eliminate the anxiety that may occur in people.
In this study, binary, triple and quadruple computer-assisted diagnosis was made from chest x-rays using a deep CNN approach for the class sets (COVID-19, Normal), (COVID-19, Viral Pneumonia, Normal) and (COVID-19, Viral Pneumonia, Lung Opacifications, Normal). More specifically, deep features were extracted with a pre-trained deep CNN, and machine learning techniques were used as classifiers for these extracted features. Afterwards, the features were selected with optimization algorithms to obtain smaller feature vectors, whose performance was measured by classifying them with mSVM. The following results were obtained:
1) The mSVM method stands out as the best classifier in the binary, triple and quadruple classification process using the 1000 features obtained by feature extraction with deep CNN learning. In binary, triple and quadruple classification, accuracy values of 96.02%, 86.84% and 79.87% were calculated, respectively.
2) The features obtained by deep learning were selected using the metaheuristic optimization algorithms DE, ACO and PSO. The 1000 attributes were reduced to 770 with DE, 554 with ACO, and 478 with PSO.
3) The features obtained in these two stages were classified separately with the mSVM classifier. The best accuracy was obtained with DE + mSVM, with very good results of 95.83%, 86.12% and 79.67%, respectively.
4) With feature selection, the proposed method achieved classification performance close to that obtained with all deep features while using fewer features and less training time. Classification of the features selected with DE, ACO and PSO required 31.5%, 72.6% and 44.9% less training time, respectively, than classification using all features.
5) The DE feature selection algorithm combined with the mSVM classifier produced more successful results on the extracted deep features.