Intelligent Automation System for Vessels Recognition: Comparison of SIFT and SURF Methods

Abstract: With the rise of drone and satellite technology, there is an opportunity to apply it to sea and coastal surveillance. An advantage of this type of application is the automated recognition of marine objects, the most important of which are vessels. This paper presents a principle of vessel recognition based on extracting features from satellite images of vessels and classifying them with a multilayer perceptron (MLP). The dataset used in this research contains a total of 2750 images, of which 2112 are used as the training set and the remaining 638 are used for testing. The SIFT and SURF algorithms are used to extract image features, which are then used as the input vector for the MLP. The best results are achieved with a model with four hidden layers of 32, 128, 32 and 128 neurons, respectively, each with the ReLU activation function. Regarding feature extraction, better results are achieved when the SIFT algorithm is used. The ROC AUC value achieved with the combination of SIFT and MLP reaches 0.99.


INTRODUCTION
Nowadays, satellite imagery is used for a variety of applications such as law enforcement, detection of natural disasters and environmental monitoring. Such applications often require manual identification of individual objects; in this research, however, objects are identified using artificial intelligence (AI) algorithms [1]. Recognition of objects from aerial imagery is significant for a wide range of maritime activities, from military use [2], through the organization of maritime transport and fishing, to ecology and wildlife surveys [3]. Because most of today's world fleet is powered by fossil fuels [4,5], aerial surveillance of ships can contribute to reducing pollution. In addition, aerial surveillance can help protect endangered animal species from human impact [6].
Automation in the aforementioned applications is necessary, as the geographical space is large and the number of analysts available to conduct searches is small.
By utilizing AI algorithms, the aforementioned tasks can be automated with high classification accuracy. Furthermore, these algorithms have proven successful in various fields such as robotics [7], medicine [8], energy production [9] and maritime applications [10]. Partovi et al. (2017) use a pre-trained CNN for roof type classification based on high-resolution satellite imagery; they achieve relatively high accuracy and decrease the computation time for training [11]. Khan et al. (2017) demonstrate the use of deep learning to automatically detect a target in satellite imagery. EdgeBoxes and CNNs are used to classify target and non-target objects in order to achieve optimal results [12]. Duarte et al. (2018) apply AI methods to satellite image classification of building damage and achieve high-quality results [13].
In this research, AI methods will be used for vessels recognition in Google satellite images. AI methods have already been used in other studies for vessels classification purposes. Leclerc et al. (2018) show the use of pre-trained CNNs to perform maritime vessel image classification on a limited image dataset [14]. Gallego et al. (2018) show the use of CNN for automatic ship classification from optical aerial images and obtained satisfactory results [15]. Gurgen et al. (2018) demonstrate the use of an artificial neural network model in order to anticipate the details of a chemical tanker in the preliminary design phase [16].
The dataset used in this study originated from [17], where it was used to train convolutional neural networks.
The aim of this research is to integrate Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF) with an MLP in order to classify the four types of vessels from [17]. First, the SIFT and SURF algorithms are applied to the original dataset in order to extract important features. Second, the obtained descriptors are used as input data for the MLP model. Finally, the results of the two models are compared.

MATERIALS AND METHODS
This section is divided into Dataset description and Methods description. The Dataset description gives a short description of parameters used for vessels recognition. The Methods description provides a brief overview and mathematical description of Artificial Neural Network and evaluation metric used in this research.

Dataset Description
The dataset used in this research was collected from Google Images, more specifically by taking screenshots of Google satellite images of the area of interest. The collected images are divided into four classes: "Boats", "Cargo ships", "Cruises" and "War ships". These images are then used to create the 2750-image dataset, where the training set consists of 2112 images and the testing set consists of 638 images. Fig. 1 shows sample images of all four types of vessels.
The detailed description of the data curation process is presented in [17]. Information about the size of the dataset is given in Tab. 1, where the number of members is presented for each class, separately for the training and testing sets.
Technical Gazette 28, 4(2021), 1221-1226

Methods Description
A Multilayer Perceptron is used as the classification technique for vessel recognition, while Scale-Invariant Feature Transform and Speeded-Up Robust Features are used for feature extraction. An overview of the modeling process is given in Fig. 2.

SIFT
The Scale-Invariant Feature Transform (SIFT) algorithm was developed in 1999 by Canadian computer engineer David G. Lowe [18]. The algorithm was improved and rereleased in 2004. It is used in computer vision tasks to find and describe features of an image. It is based on the extraction of local features that are invariant to rotation and scaling, and partly invariant to variations in illumination and camera viewpoint. Four basic steps are used to generate a feature set [19]: scale-space extrema detection, keypoint localization, orientation assignment and keypoint description. The first stage of discovering key points is to identify locations and scales that can be repeatably assigned under different views of the same object. Locations that are invariant to resizing of an image can be detected by seeking stable features across all possible scales, using a continuous function of scale known as scale space. Each point of interest (feature) must be assigned a descriptive vector. A descriptive vector (descriptor) characterizes the neighborhood of that feature. To determine the descriptive vector, it is necessary to calculate the magnitude and angle of the gradient for each element within the neighborhood of the observed point of interest. The descriptive vector contains 4 × 4 × 8 = 128 values for each feature and needs to be normalized. High values of the gradient magnitude are thresholded and the vector is renormalized to unit length, yielding a descriptive vector that is used to match corresponding points between images [20]. Six steps of the SIFT algorithm used to generate a feature set are shown in Fig. 3.
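The normalization of the descriptive vector described above can be illustrated with a minimal Python sketch. It assumes the scheme from Lowe's SIFT paper (scale to unit length, clip large entries, rescale to unit length again); the clipping threshold of 0.2 comes from that paper and is not stated in this text, and the function name and toy input are hypothetical.

```python
import math


def normalize_descriptor(raw, clip=0.2):
    """Normalize a SIFT-style descriptor: unit length, clip, unit length again.

    `clip` = 0.2 is an assumption taken from Lowe's paper, not from this text.
    """
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        # An all-zero vector has no direction, so it is returned unchanged.
        return list(v) if norm == 0 else [x / norm for x in v]

    # Gradient histogram entries are non-negative, so only an upper clip is needed.
    clipped = [min(x, clip) for x in unit(raw)]
    return unit(clipped)


# Hypothetical 4 x 4 x 8 = 128 histogram values with one dominant gradient bin.
raw = [1.0] * 128
raw[0] = 50.0
descriptor = normalize_descriptor(raw)
```

Clipping limits the share of a single large gradient bin, so the dominant entry ends up with less weight than plain unit-length scaling would give it.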

SURF
Speeded-Up Robust Features (SURF) is a speeded-up version of the SIFT algorithm. To reduce calculation time, instead of Gaussian averaging of the image, the algorithm uses a simple approximation of the Hessian matrix together with integral images [21]. Two main steps are used to generate a feature set: feature extraction and feature description.

Figure 4 Flowchart of SURF algorithm
In the first stage, the algorithm approximates the Laplacian of Gaussian (LoG) with box filters. For scale and location selection, it relies on the determinant of the Hessian matrix [22]. In the next stage, it uses wavelet responses in both the horizontal and vertical directions for orientation assignment and for feature description. For indexing the underlying interest point, it uses the sign of the Laplacian; since this value has already been calculated, it adds no computational cost. SURF concatenates the responses of all sub-regions into a descriptor vector of length 64, whereas the SIFT descriptor is a 128-D vector. This is part of the reason that the SURF algorithm is reported to be 3 times faster than SIFT [23]. The resulting descriptor is invariant to scale, rotation and contrast, and partly invariant to other transformations. Six basic steps of the SURF algorithm used to generate a feature set are shown in Fig. 4.
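The speed advantage of box filters comes from the integral image, which lets the sum over any rectangle be read with four lookups regardless of its size. The following Python sketch shows the idea on a toy image; the function names and the 3 × 4 example are hypothetical and are not taken from the SURF implementation used in this research.

```python
def integral_image(img):
    """Return a summed-area table padded with one leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            # Sum of everything above plus the running sum of the current row.
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1 (four lookups)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
centre = box_sum(ii, 1, 1, 3, 3)  # pixels 6, 7, 10 and 11
```

A box filter response is a weighted combination of a few such rectangle sums, so its cost does not grow with the filter size.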

MLP
A multilayer perceptron (MLP) consists of interconnected nodes, i.e. neurons, connected into a network in such a way that the output of an individual neuron represents the input to one or more neurons in the adjacent layer. Layers of neurons are connected by weights: each input signal is multiplied by its weight value, and the resulting output value of the neuron is transmitted to the next layer. Each neuron thus calculates a weighted sum of its inputs that depends on the parameters w and b. The output value can be calculated as follows [24]:
$$y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right),$$
where x_i are the n inputs of the neuron, w represents weights, b represents bias and φ stands for the activation function. The activation function is a mathematical "gate" between the input that feeds the current neuron and its output, which passes into the next layer. Commonly used activation functions include Linear, Sigmoid, Hyperbolic Tangent (Tanh) and Rectified Linear Unit (ReLU) [25]. A grid search algorithm has been used to determine the optimal hyperparameters of the MLP [26]. The hyperparameters adjusted in this research are the number of hidden layers and neurons, the activation function, the learning rate, the learning rate decay and the optimizer. A subset of the hyperparameter space is shown in Tab. 2.
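The neuron computation described above, a weighted sum of the inputs plus a bias passed through an activation function, can be sketched in plain Python as follows. This is an illustration only: the weights, biases and inputs are made-up values, the helper names are hypothetical, and the sketch does not reproduce the trained model from this research.

```python
import math


def relu(z):
    """Rectified Linear Unit: passes positive values, zeroes out negative ones."""
    return max(0.0, z)


def neuron_output(x, w, b, activation=relu):
    """Output of one neuron: activation(sum_i w_i * x_i + b)."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)


def dense_layer(x, weights, biases, activation=relu):
    """Apply every neuron of a fully connected layer to the same input vector."""
    return [neuron_output(x, w, b, activation) for w, b in zip(weights, biases)]


def softmax(z):
    """Turn raw output scores into class probabilities that sum to one."""
    m = max(z)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up input and parameters for a layer with two neurons.
x = [0.5, -1.0, 2.0]
hidden = dense_layer(x, weights=[[1.0, 0.0, 0.5], [-1.0, 1.0, 0.0]], biases=[0.0, 0.5])
probs = softmax(hidden)
```

Stacking several `dense_layer` calls and finishing with `softmax` gives the forward pass of an MLP classifier; training the weights is outside the scope of this sketch.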

Performance Evaluation
The Area under the Receiver Operating Characteristic Curve (ROC AUC) is used as the performance measure in this research. The ROC curve shows the relationship between the False Positive Rate on the x-axis and the True Positive Rate on the y-axis. Terms used to define the AUC and the ROC curve are defined as [27]:
$$TPR = \frac{TP}{TP + FN}, \qquad TNR = \frac{TN}{TN + FP}, \qquad FPR = \frac{FP}{FP + TN} = 1 - TNR,$$
where TPR is the true positive rate (sensitivity), TNR is the true negative rate (specificity) and FPR is the fall-out or false positive rate, while TP, FP, TN and FN denote the numbers of true positive, false positive, true negative and false negative predictions. Since ROC by its design is used for binary classification, using it for multiclass classification is not typical. In order to extend the ROC curve and ROC area to multiclass classification, it is necessary to conceptualize the problem as a binary classification problem in which one class is classified against all other classes. Macro averaging reduces a multiclass prediction to multiple sets of binary predictions; it calculates the matching metric for each binary case and averages the results together. Macro average for k classes can be calculated as follows [
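The one-against-all reduction and macro averaging described above can be sketched in plain Python. This is a minimal illustration, not the evaluation code used in this research: the AUC of each binary case is computed through its rank interpretation (the probability that a positive sample scores higher than a negative one, with ties counted as one half), the function names are hypothetical, and the toy labels and scores are made up.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, y_score, n_classes):
    """One-vs-rest AUC per class and their unweighted (macro) mean."""
    per_class = []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]  # class k against the rest
        scores = [row[k] for row in y_score]
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / n_classes, per_class


# Made-up class indices (0..3) and per-class scores for six samples.
y_true = [0, 1, 2, 3, 0, 1]
y_score = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.3, 0.1, 0.1, 0.5],
    [0.2, 0.4, 0.3, 0.1],
    [0.1, 0.6, 0.2, 0.1],
]
macro, per_class = macro_auc(y_true, y_score, 4)
```

Macro averaging gives every class the same weight regardless of how many samples it contains, which matters when the classes are unevenly sized.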

RESULTS AND DISCUSSION
The first step is to extract features using the SIFT and SURF algorithms. To determine correspondence points, features are extracted from a set of reference images and stored in a database. Each feature of a new image is individually compared with the data in the database in order to find the matching reference image. Using the descriptors obtained by the SIFT and SURF algorithms, an MLP for classifying vessels into four classes is trained and tested.
From Fig. 5 and Fig. 6, it can be seen that the micro- and macro-averaged ROC AUC values over all classes are higher than 0.96. The MLP architecture that provides the highest value of the performance measure using both the SIFT and SURF algorithms is presented in Tab. 3.
The MLP architecture consists of four hidden fully connected layers. The first and third layers contain 32 hidden neurons, while the second and fourth contain 128 hidden neurons. Furthermore, ReLU is utilized as the activation function in all of the aforementioned layers, along with the Softmax activation function in the output layer. The best results are achieved if Adam is used as the optimization algorithm with a learning rate of 0.001 and a learning rate decay of 1e−7.
From Fig. 7 it can be seen that the maximal ROC AUC value for boats classification is 0.99. This value is achieved using the SIFT algorithm and an MLP with the ReLU activation function. If the SURF algorithm is used together with an MLP designed with the Sigmoid activation function, a maximal ROC AUC value of 0.98 is achieved.
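To give a sense of the size of the reported architecture, the following Python sketch counts the trainable parameters of a chain of fully connected layers with 32, 128, 32 and 128 hidden neurons and four output classes. The input lengths are assumptions: the text does not state how the descriptors of one image are combined into the MLP input vector, so a single 128-value (SIFT) or 64-value (SURF) vector per image is assumed here purely for illustration, and the function name is hypothetical.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for consecutive fully connected layers."""
    return sum(
        n_in * n_out + n_out  # one weight per connection and one bias per neuron
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


HIDDEN = [32, 128, 32, 128]  # hidden layer sizes reported in the text
N_CLASSES = 4                # boats, cargo ships, cruises, war ships
SIFT_INPUT = 128             # assumption: one 128-D descriptor vector per image
SURF_INPUT = 64              # assumption: one 64-D descriptor vector per image

sift_params = dense_parameter_count([SIFT_INPUT] + HIDDEN + [N_CLASSES])
surf_params = dense_parameter_count([SURF_INPUT] + HIDDEN + [N_CLASSES])
```

Under these assumptions the two networks differ only in the first layer, so the shorter SURF input saves 64 × 32 weights.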

Figure 8 Comparison of ROC AUC of SIFT and SURF algorithm for cargo ships classification
Unlike the other vessels, for cruise ships the highest ROC AUC value of 0.97 is achieved using the SURF algorithm and an MLP designed with the ReLU activation function. If the MLP is trained and tested using features obtained with the SIFT algorithm, the maximal ROC AUC value is 0.96, as shown in Fig. 9. For war ships recognition, the maximal ROC AUC value of 0.97 is achieved using the SIFT algorithm and an MLP designed with the ReLU activation function, as shown in Fig. 10. Using the SURF algorithm and the same MLP architecture, a maximal ROC AUC value of 0.96 is achieved.
Dunnmon et al. (2019) show the impact of dataset size on classification performance for automated classification of chest radiographs [29]. From this it can be concluded that a larger number of images in the training set can improve overall model validation performance and robustness. On the other hand, smaller training sets usually do not contain enough information for robust classification, especially when dealing with more than two classes. The risk of overfitting is relatively high since data diversity is not ensured. Moreover, it can be expected that the classification performance will degrade as the number of images in the dataset decreases.
In this research, the calculation times of the SIFT and SURF algorithms integrated with the MLP are also compared. The obtained results show that the SURF algorithm outperforms SIFT in terms of computational time. The computational time of the feature extraction algorithms is shown in Tab. 4. Fischer et al. (2014) demonstrate that the SIFT algorithm can compute features faster than a CNN-based algorithm [30].

CONCLUSION
In this research, an intelligent automation system for vessels recognition is presented. From the obtained results, it can be concluded that a Multilayer Perceptron integrated with Scale-Invariant Feature Transform and Speeded-Up Robust Features is an appropriate method for vessel recognition from satellite images. Using the optimized MLP architecture, ROC AUC values higher than 0.9 are achieved. The SIFT + MLP system proved to be more successful than the SURF + MLP system, with the highest ROC AUC value of 0.99, except in the case of cruise ships recognition, where SURF + MLP outperforms the other system with a value of 0.97. Moreover, SURF + MLP achieved a faster computational time than the SIFT + MLP system. For future work, the plan is to test the aforementioned methods, along with ORB and BRIEF, on a larger dataset, as well as to integrate those algorithms with other AI classification algorithms, such as Support Vector Machine (SVM), Naive Bayes and Decision Trees.