Application of Statistical Indicators for Digital Image Analysis and Segmentation in Sorting of Agriculture Products

The food processing industry is moving toward full automation of all processes, especially in segments of the technological line that represent critical control points for food safety. One of these points is color sorting by machine vision, where inappropriate products are removed. The most important product appearance attributes are color and texture. During food processing, the product is captured by optical devices, mostly color cameras and lasers. The aim of this paper is to investigate new eligibility criteria for digital image segmentation using only the image from the camera. The goal is to describe the texture of the product with chosen mathematical measures, and to allow recognition and then classification into an appropriate class according to a predefined range of values. Images of frozen raspberries were used. Color parameters in the RGB color space were analyzed, and statistical tests were carried out to examine the normality of the data. Thereafter, one-way ANOVA and correlation analysis were performed. A statistically significant difference was found for the values of two indicators: entropy and a new criterion, marked as L, derived from the standard deviation and the mean pixel value of each channel. After the range of these criteria was determined, a new image segmentation algorithm was developed in Matlab. One result of applying this algorithm is that more than 80% of good products were recognized.


INTRODUCTION
The need for accurate grading and sorting of fruits and food products arises from increased expectations regarding food quality and safety standards. External quality is the most important and direct sensory quality attribute of agricultural products. In general terms, the external quality of fruits and vegetables is evaluated by considering their color, texture, size, shape and visual defects [1][2][3]. Several review papers have been published; some focus only on traditional computer vision systems for external quality inspection [4][5][6][7][8], while others focus on NIR, hyperspectral and multispectral imaging in quality evaluation and use different methods and color models [9][10][11][12].
The cameras detect defects based on color, while the lasers detect insects and animal parts as well as sticks, rocks, cardboard, plastic, metal, and glass, even when these are the same color as the good product, based on the object's structural properties. Some sorters rely on cameras, others on lasers, and some combine cameras and lasers to view the product from the top only or from both top and bottom. Some sorters inspect only an object's color; others inspect its color, size, and shape. Some sorting is based on the object's structural properties, including differing levels of chlorophyll.
Like color, texture is a significant sensory quality attribute that is frequently used in external quality inspection and grading systems for agricultural products. Texture analysis can also play an important role in defect recognition and segmentation in grading systems because of its strong discriminating ability [8]. It is also closely related to some internal quality attributes of fruits and vegetables, such as maturity and sugar content [13]. Texture is therefore one of the indicators that consumers widely use to assess the quality of fruits and vegetables.
Many researchers have proposed different approaches and methods for digital image processing, including detection and segmentation, and have solved related problems [14][15][16][17][18][19]. Various statistical measures are used in a wide range of scientific and social research. In image processing, these measures are applied in fields such as image enhancement, image restoration, image denoising and edge detection. Entropy is one of them. Entropy is best known from thermodynamics, but it is also well known from information theory. A state entropy model has also been proposed for supplier evaluation and selection [20]. The use of image entropy as a criterion for threshold selection has a long tradition, and numerous methods have been proposed [21][22][23]. A practical system for inspecting potato chips applies entropy, contrast, energy and homogeneity features to various color channels in the CIELAB and HSV color spaces in order to determine the quality class of a chip [24]. In [12], the authors used mean and entropy to predict color and moisture content during the drying of soybean.
The aim of this work is to find new criteria for segmentation based solely on digital images from one camera, using classical statistical indicators and methods. The idea emerged while browsing the literature on the application of fractal and multifractal analysis in digital image processing to recognize textures of different materials. Multifractal analysis is most commonly used in medicine to detect changes in tissues [25]. Its two basic indicators are the fractal dimension, used on grayscale images, and lacunarity, used on binary images to characterize gaps in the fractal image. Lacunarity is calculated as the squared coefficient of variation, Eq. (1):
$$\Lambda = \left(\frac{\sigma}{\mu}\right)^{2} \qquad (1)$$
where $\sigma$ is the standard deviation and $\mu$ is the mean of the measured values.
Two calculation procedures are most frequently applied: box counting and gliding box. The application of multifractal analysis to color images is predominantly reduced to conversion into binary images, but there are different approaches to calculating values in the RGB color space [26][27][28]. The main idea was to check whether the values obtained by the lacunarity equation can serve as criteria for segmenting images of agricultural products. To that end, a gliding box algorithm was developed in Matlab®.

MATERIALS AND METHODS
The initial investigation of potential criteria used original images of raspberry (Rubus idaeus L.) obtained during fruit sorting with an Optyx color sorter manufactured by Key Technology (http://www.key.net/products/optyx). The system provides images of 1024×1024 pixels. The frame grabber digitized and decoded the composite video signal from the camera into three user-defined buffers in red, green and blue color coordinates (RGB). The vision system was part of a robotic system for automatic inspection, handling and packing.
A representative sample was taken from images of 84 fruits in total. Sensory quality parameters were evaluated using a point-type system of analytical descriptive tests [29]. Three trained panelists performed the evaluation by visual inspection. The evaluated samples were ranked into three categories: acceptable raspberries, unacceptable raspberries, and impurities. Fig. 1 shows part of the analyzed images for each category. A schematic view of the proposed algorithm for analyzing digital images is presented in Fig. 2.
For the first iteration of image analysis, Adobe Photoshop® was used to extract the regions of interest and set a black background. Each raspberry, as well as each impurity that can be found during inspection, was saved as a separate image of 70×70 px, 8-bit, in the original .bmp format. Matlab® was used for further analysis, extracting the individual color values r, g, b (red, green and blue). Since color can be extracted from every pixel of the region of interest, these variables can be used to express the degree of heterogeneity. Most studies report the result of measurements as the average color and its standard deviation over all the pixels selected in the region of interest [30][31][32][33], this standard deviation being mainly a consequence of heterogeneity rather than of measurement error.
Besides the standard deviation, there is also entropy, which is dimensionless [34]. It is a statistical measure of randomness that can be used to characterize the texture of the image, Eq. (2):
$$E = -\sum_{i} p_i \log_2 p_i \qquad (2)$$
where $p_i$ is the probability that the difference between two adjacent pixels is equal to $i$, and $\log_2$ is the base-2 logarithm. Matlab® already provides a built-in function for this calculation.
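As an illustration of Eq. (2), the following is a minimal Python sketch, not the Matlab® built-in function used in the study. It assumes that "adjacent" means horizontally neighboring pixels and that absolute differences are taken; both are assumptions, as are the helper names and the tiny sample patches.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def adjacent_differences(channel):
    """Absolute differences between horizontally adjacent pixels (assumed neighborhood)."""
    return [abs(row[x + 1] - row[x]) for row in channel for x in range(len(row) - 1)]


# Hypothetical 2x4 single-channel patches with 8-bit values.
flat_patch = [[10, 10, 10, 10], [10, 10, 10, 10]]
mixed_patch = [[0, 1, 3, 3], [5, 5, 6, 8]]

e_flat = shannon_entropy(adjacent_differences(flat_patch))
e_mixed = shannon_entropy(adjacent_differences(mixed_patch))
```

A perfectly uniform patch has only one possible difference, so its entropy is zero, while a patch whose differences are spread over several values yields a higher entropy.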
Further, the following statistical indicators were calculated for each individual color channel: average value (Avg), standard deviation (Stdv) and entropy (E). The main idea was to relate the standard deviation to the average value, so the formula for lacunarity on binary images, the squared coefficient of variation, was applied. The resulting criterion was marked as L and calculated separately for each of the r, g, b channels.
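The per-channel indicators described above can be sketched as follows. This is an illustrative Python example rather than the authors' Matlab® code. It assumes the population standard deviation (the text does not state whether n or n − 1 was used), and the function name and the four-pixel region are hypothetical.

```python
import math


def channel_stats(pixels):
    """Return (Avg, Stdv, L) for one color channel given a flat list of pixel values."""
    n = len(pixels)
    avg = sum(pixels) / n
    # Population standard deviation (assumption: divisor n, not n - 1).
    stdv = math.sqrt(sum((p - avg) ** 2 for p in pixels) / n)
    # L is the squared coefficient of variation; it is undefined for an all-black region.
    l_value = (stdv / avg) ** 2 if avg > 0 else float("nan")
    return avg, stdv, l_value


# Hypothetical region of interest: four (r, g, b) pixels.
roi = [(200, 40, 60), (180, 50, 60), (220, 30, 60), (200, 40, 60)]
stats = {name: channel_stats([px[i] for px in roi]) for i, name in enumerate("rgb")}
```

Because L divides the spread by the mean, a dark channel with a small absolute spread can still produce a larger L than a bright channel with a larger spread, as the g and r channels of this sample show.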
Thereafter, additional potential criteria were calculated in the form of color ratios for distinguishing good from bad objects/products: r_rel, g_rel, b_rel and r/g, r/b, g/r, g/b, b/r and b/g. Statistical tests were also performed for each criterion to confirm or reject a normal distribution, which determines whether parametric or nonparametric statistical tests are used subsequently. For that purpose, the Shapiro-Wilk test was employed, which is more convenient for smaller samples (<50). Then, a descriptive analysis of all the mentioned potential criteria was carried out. The significance of differences in the examined criteria among samples was checked using one-way ANOVA at a significance level of α = 0.05. To determine whether any correlation exists between these criteria, bivariate correlation analysis with Pearson's coefficient was used. Statistical analysis of the results was performed in IBM SPSS® 21.0 and MS Excel® 2013.
After the results were checked and analyzed, we hypothesized that appropriate image segmentation can be done according to the criteria L and E. The range of values (min to max) was then defined for the first category of acceptable products. The results were also tested with combinations of the other proposed criteria.
The next step was to develop the algorithm in Matlab®, the so-called gliding box algorithm, because texture is a property of a region and not of individual pixels. For the calculation of the defined image color parameters I(x, y), a square of 10×10 px was chosen. The initial square is located at point (0, 0), in the upper left-hand corner (Fig. 3). The algorithm records Avg, Stdv, E and L for every channel (r, g, b) of the part of the image underneath the moving window. If the parameters fall within the specified range of values, the square is colored with a chosen color, in our case yellow, so that the marked region is clearly visible. The window is then translated by one pixel to the right and the same statistical measures are recorded again. When the moving window reaches the right-hand side of the image, it is moved back to its starting point at the left-hand side and translated by one pixel downward. The computation proceeds until the moving window reaches the lower right-hand edge of the image, at which point it has explored every one of its possible positions, i.e., up to the endpoint (x_m − 9, y_n − 9). A schematic representation is given in Fig. 3, together with part of the Matlab® code that moves the square across the image.
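The window traversal described above can be sketched in Python as follows. This is a simplified illustration, not the authors' Matlab® code: it handles a single channel, uses only L as the criterion, and returns the top-left corners of matching windows instead of coloring them yellow. The function names, the 2×2 window used in the example (the study used 10×10 px) and the range limits are hypothetical.

```python
def squared_cv(values):
    """Squared coefficient of variation (criterion L); None when the mean is zero."""
    n = len(values)
    avg = sum(values) / n
    if avg == 0:
        return None
    variance = sum((v - avg) ** 2 for v in values) / n
    return variance / avg ** 2


def gliding_box(channel, box, l_min, l_max):
    """Slide a box x box window one pixel at a time, left to right, then top to bottom.

    Returns the (x, y) top-left corners of windows whose L lies in [l_min, l_max].
    """
    height, width = len(channel), len(channel[0])
    hits = []
    for y in range(height - box + 1):        # last row position is height - box
        for x in range(width - box + 1):     # last column position is width - box
            window = [channel[y + dy][x + dx] for dy in range(box) for dx in range(box)]
            l_value = squared_cv(window)
            if l_value is not None and l_min <= l_value <= l_max:
                hits.append((x, y))
    return hits


# Hypothetical 4x4 channel: a bright object in the upper left on a black background.
channel = [
    [100, 100, 0, 0],
    [100, 120, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
hits_narrow = gliding_box(channel, box=2, l_min=0.0, l_max=0.05)
hits_wide = gliding_box(channel, box=2, l_min=0.0, l_max=10.0)
```

With the narrow range only the window lying fully on the object is reported, whereas windows that straddle the object edge have a much larger L and are picked up only by the wide range. With a 10×10 window the loop bounds reproduce the endpoint (x_m − 9, y_n − 9) given in the text.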

RESULTS AND DISCUSSION
The research included determination of the sensory characteristics of raspberry fruits, as well as their categorization according to the obtained scores. Thirty raspberry fruits (approx. 35.71%) were rated acceptable and classified into category I; 28 raspberry fruits (approx. 33.3%) were rated unacceptable (dark, damaged, moldy, ...) and placed into category II; and 26 samples (approx. 31%) were identified as impurities (leaves, stalks, ...) and classified into category III. The results for the parameters can be summarized as follows: the color ratios do not differ significantly, but there was a statistically significant difference in parameter L between categories, as determined by one-way ANOVA for the R channel (F(2 …).
A Pearson correlation coefficient was computed to assess the relationship between the average color, L and entropy. There was a moderate positive correlation between the variables L and E for all three channels, r = 0.323. A scatter plot matrix summarizes the results (Fig. 10). For the first check, criterion L was applied alone. For the second check, the combination of L and E was chosen, whereas for the third check the combination of L and Avg was selected; the min-max ranges of values for the proposed criteria were taken from Tab. 1. The algorithm was not checked for the criteria Avg and Stdv individually, because previous work confirmed that they do not yield an appropriate result [35].
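For readers who want to reproduce the two statistics mentioned above outside SPSS®, the following Python sketch computes the one-way ANOVA F statistic and Pearson's r from their textbook definitions. It is an illustration under stated assumptions, not the procedure run in the study: the function names and all sample values are hypothetical, and no p-value is computed.

```python
import math


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical values of one criterion for three categories (not data from the study).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

# Hypothetical paired values of two criteria.
r_value = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```

If all 84 samples in the three categories (30, 28 and 26) enter the test, the same function would return 2 and 81 degrees of freedom.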
Applying criterion L alone to the images resulted in recognition of 72-91% of good-quality product, whereas the results for the combination of L and Avg varied widely, from 30 to 87%. For the second group of images, with damaged fruits, the combination L + Avg produced better results, as it did for the third category of images, with impurities present.
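The three checks compared above (L alone, L with E, L with Avg) amount to testing whether the selected criteria all fall inside their min-max ranges. A minimal Python sketch of that logic is given below. The range values are hypothetical placeholders rather than the values of Tab. 1, a single channel is used for brevity, and the function name is illustrative.

```python
def passes(features, ranges, criteria):
    """True if every selected criterion lies inside its [min, max] range."""
    return all(ranges[name][0] <= features[name] <= ranges[name][1] for name in criteria)


# Hypothetical min-max ranges standing in for the category I values of Tab. 1.
ranges = {"L": (0.01, 0.10), "E": (4.0, 6.5), "Avg": (90.0, 160.0)}

# Hypothetical indicators measured in one window of one channel.
window = {"L": 0.04, "E": 6.9, "Avg": 120.0}

checks = {
    "L": passes(window, ranges, ["L"]),
    "L+E": passes(window, ranges, ["L", "E"]),
    "L+Avg": passes(window, ranges, ["L", "Avg"]),
}
```

In this example the window is accepted by L alone and by L + Avg, but rejected by L + E because its entropy lies outside the assumed range, which shows how the choice of combination can change the outcome for the same window.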
A good result means that the product shape is formed appropriately when it is marked in yellow. According to the images displayed, non-recognition commonly occurs in the middle of the raspberry fruit; when magnified, these parts of the image are visibly slightly blurred. This is a consequence of insufficient lighting or of insufficiently clean glass between the bulbs and the conveyor belt carrying the fruits. A question for further research is why the first proposed criterion, L, did not perform well on its own in image segmentation, given that the values obtained for category I differ statistically significantly from those of inappropriate products and impurities.
In [36], two prediction models, PLSR and NN, were developed to relate color features to the moisture content of cooked beef joints. The authors used mean and standard deviation values of the RGB and HSI color space components. Saturation had the largest contribution to the results of the prediction model, but it was not sufficient to establish a correlation between meat color and moisture content. In [37], a robust algorithm was proposed for estimating a global threshold to segment a food image from its background, using a statistical approach based on the RGB components. The authors used a combination of built-in Matlab® statistical functions, which gave good results in converting the original images to binary images.
In [14], the authors set out to determine the efficiency of an RGB color imaging technique for classifying dates into three classes based on hardness. The RGB image of each individual date sample was analyzed using Matlab software, and classification models were developed using linear discriminant analysis (LDA) with all features and stepwise discriminant analysis (SDA). The overall classification accuracy ranged from 69% to 91%. They concluded that imaging techniques have great potential for developing on-line quality monitoring systems for dates based on hardness, but that further studies using other image acquisition systems are required to improve the classification.
The color of fruits, even of the same species, can vary slightly depending on many factors, such as the state of maturity. Identification based only on morphological features could not provide good enough results for these varieties. Since this segmentation method depends strongly on the color of each individual pixel, it is very sensitive to such changes. For this reason, the algorithm needed to be trained and a new table created for every test session.

CONCLUSION
This paper presents the first results for the initial hypothesis that appropriate segmentation and recognition of the object in an image can be performed based only on color camera images. The proposed idea and algorithm have a range of advantages and disadvantages, which leaves room for further research and improvement. The results showed that neither color nor a morphological feature such as entropy was able, on its own, to distinguish good from bad product with high accuracy. However, the combination of these two features can give acceptable results. Future research needs to carry out more experiments with a larger number of samples and different storage conditions to achieve a more complete study. Plans for future research include peas, broad beans and sweet corn, with additional studies on how to accelerate the recognition process itself and how to implement it in a ready-to-use application. One question of interest for future research is whether the L criterion, combined with one additional morphological trait, makes it possible to define ranges of values for various agricultural products, on the basis of which a machine could automatically recognize the product underneath the camera.

Figure 1 Part of analysed images: a) acceptable raspberry, b) unacceptable raspberry, c) impurities
Figure 2 Schematic view of the proposed algorithm for analyzing digital images

Figure 3 Schematic representation of the square moving with specific dynamics along the axes shown

Figure 8 L values for all three categories (I-acceptable, II-unacceptable, III-impurities) for green channel (g)

Figure 9 L values for all three categories (I-acceptable, II-unacceptable, III-impurities) for blue channel (b)

Figure 10 Scatter plot matrix of parameters Avg, L and E for red, green and blue channel

Based on these conclusions, the following criteria are proposed: min and max values of L, Avg and E from the r, g, b parameters obtained from images of raspberries that belong to category I of acceptable products, Tab. 1.

Table 1 Proposed criteria and values

Table 2 Results of different criteria combinations