EYE MOVEMENT ANALYSIS OF IMAGE QUALITY PARAMETERS COMPARED TO SUBJECTIVE IMAGE QUALITY ASSESSMENT

Original scientific paper

Image quality can be determined by using objective or subjective quality assessment methods. Objective methods are based on mathematical measures, such as PSNR, SSIM or RMSE, while subjective testing is generally performed by asking participants which of the given options they prefer or to give a quality score for the presented options. For each image quality evaluation, an image database is required. We developed a novel image database consisting of 30 images, to which we applied manipulations based on different quality parameters. First, we conducted testing with the eye-tracking method: by showing images to test participants and measuring their eye movement, we obtained accurate information about how each of the image quality parameters affected the communication value of each image. The subjective quality assessment method we then employed involved the development of an application for crowdsourcing-based testing, in which participants had to determine which of the images were better. Finally, a correlation between both methods was determined.


Introduction
In today's publications, images are linked with headlines to present easily observed visual elements. An important factor is the so-called image information overload, as we are subjected to images all the time [1,2]. Based on their perception of an image, a person will decide whether they want to read the article, news item, etc. Having images with a high communication value is therefore extremely important, and determining which image has a higher potential is crucial in this context. One of the main parameters of the communication value of images is visual quality.
The measurement of visual quality is of fundamental importance for numerous image and video processing applications, where the goal of quality assessment algorithms is to automatically assess the quality of images or videos in agreement with human quality judgments [3].
Image quality assessment aims to use computational models to measure the image visual quality consistently with subjective evaluations, based on the fact that the human visual system understands an image mainly according to its low-level features [4÷6].
Image quality can be determined in different ways. We can use objective quality assessment methods, e.g. calculating RMSE (root mean square error), PSNR (peak signal-to-noise ratio) and the SSIM index (structural similarity index) [7÷10]; we can measure the human response, e.g. with the help of eye movement measurements (a novel approach); or we can employ subjective methods, where we evaluate the image quality with the help of surveys, tests, questionnaires, etc. Both eye movement measurement and subjective methods involve human participants.
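As an illustration of the objective measures named above, the following minimal Python sketch computes RMSE and PSNR for two equally sized lists of 8-bit pixel values. The function names and the flat-list image representation are assumptions made for this example only; SSIM is omitted because it needs local window statistics.

```python
import math


def rmse(reference, distorted):
    """Root mean square error between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((r - d) ** 2 for r, d in zip(reference, distorted))
    return math.sqrt(squared_error / len(reference))


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum pixel value (255 for 8-bit)."""
    error = rmse(reference, distorted)
    if error == 0:
        return math.inf  # identical images
    return 20.0 * math.log10(peak / error)
```

A lower RMSE and a higher PSNR both indicate that the manipulated image is numerically closer to the reference image.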
Eye movement tracking has been used many times in imaging research [11÷13]; however, the approach where we measure the influence of image complexity on image acceptance and on the way we look at the image, comparing it to subjective image quality assessment, is a novelty.

Image database
A good testing image database is very important in this kind of research. More than a dozen databases are available for this purpose, the two most common being TID2008 and TID2013 [14÷17]. The positive aspect of using a widely used database is the potential for comparing the collected data to other research. However, TID2008 does not sufficiently meet our demands: the resolution of its images is not high enough to conduct subjective testing or eye-tracking measurements, the spread of detail coverage across the images is too small, and their colour gamut is not wide enough [18÷23]. Consequently, a new image database was introduced [24]. In our research, we refer to this database as the novel image database (Fig. 1).
The novel image database was developed using a number of manipulations that were based on calculations of image quality parameters. The coverage of details was measured with the help of ImageJ 1.50g software, where we employed edge detection and threshold functions to determine the details of each image. The images chosen for our database have 22÷99 % detail coverage, whereas the images in TID2008 have 46÷89 % detail coverage, making our database 57 % more complex (Fig. 2). Detail diversity is one of the most important factors when it comes to evaluating the communication value. Different approaches to image evaluation have been carried out [25,26]; however, for the purpose of this research, a detail diversity evaluation was the most suitable. Images were also selected based on their average colour, so that the colour gamut of our database would be wider than that of TID2008 (Fig. 3).
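The detail coverage measurement can be sketched roughly as follows. This is not the ImageJ procedure itself but a simplified stand-in, assuming a Sobel gradient as the edge detector, a hypothetical threshold value, and a greyscale image given as a list of rows; none of these settings are taken from the paper.

```python
def detail_coverage(gray, threshold=50.0):
    """Percentage of interior pixels whose Sobel gradient magnitude exceeds `threshold`.

    gray: 2D list of luminance values (rows of equal length).
    threshold: hypothetical cut-off, not a value reported in the paper.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3 x 3 pixels")
    edge_pixels = 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal and vertical Sobel responses
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edge_pixels += 1
    interior = (height - 2) * (width - 2)
    return 100.0 * edge_pixels / interior
```

A flat image yields 0 % coverage, while an image densely filled with edges approaches 100 %, which corresponds to the notion of complexity used in this paper.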
The next step was to select the image quality parameters that are most common in everyday use. We included sharpness, contrast, noise, saturation, size manipulation and compression. Each of the parameters was applied in MATLAB R2014a, using different approaches and steps, to produce 38 manipulations of each image. The novel image database contains 1140 images in total, at a resolution of 1920 × 1440 pixels (for big-screen testing). In previous research, image quality assessment was conducted using objective methods [27], whereas in the present study we wanted to test the acceptance of manipulated images with eye movement measurement and to use a subjective quality assessment method [9] that would include observers in the testing. The real research problem that arose was the lack of a significant number of previous studies on which to base our work. That is why a new method was developed for comparing subjective quality assessment results to eye movement measurements. Our hypothesis was that eye movement measurements can be used to determine image visual quality.

Methodology
Both of the methods that were employed are based on the same images from our novel image database.

Eye movement measurement
Nowadays, the eye-tracking method is very commonly used in a variety of visual research studies. Our goal was to use this method to determine the influence of image manipulation on the way the participants observed the image. For this purpose, a TOBII X120 eye tracker, an HP ZR24W LCD screen, a PC, a controlled dark-room environment [27] and TOBII Studio 3.4.4 software were used (Fig. 4).
The images chosen from our database for later use in the eye-tracking testing were those in which the manipulation was most visible to the human eye. For each of the 30 images, 10 manipulations were chosen in addition to the unmanipulated image, so that 330 images altogether were included in the testing.
We tried to prevent any person from seeing the same image more than once [28÷30]; at the same time, however, we wanted each person to see as many different manipulations as possible. The images were therefore carefully separated into groups (Tab. 1). One group of participants looked at the unmanipulated images; we labelled that group reference group A.
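One way to satisfy both constraints is a Latin-square style rotation, sketched below. This is a hypothetical scheme for illustration only; the actual allocation used in the study is the one given in Tab. 1, and the function name and zero-based indices are assumptions.

```python
def assign_manipulations(num_images=30, num_groups=10):
    """Return schedule[group][image] = index of the manipulation shown to that group.

    Hypothetical rotation: each group sees every image exactly once, and each
    manipulation of an image is seen by exactly one group.
    """
    return [
        [(image + group) % num_groups for image in range(num_images)]
        for group in range(num_groups)
    ]
```

With 30 images and 10 groups, every group sees all 10 manipulations three times each, and no participant sees the same image content twice.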

Each test involved 10 participants, i.e. 110 participants altogether, 50 % female and 50 % male, with 50 % below the age of 30 and 50 % aged 30 or more. The average age of all participants was 33,39 years and the gender distribution was the same in all 11 tests. All participants came from Slovenia and had normal or corrected-to-normal vision.

Crowdsourcing
For the crowdsourced testing, a web application was developed in which participants had to decide which of two images appeared better to them. The multilanguage application was developed using PHP, HTML5 and CSS3, and consisted of an introduction screen, a data gathering screen (age, gender, location), test instructions, the test itself and a final page (Fig. 5, Fig. 6). For the testing, the same manipulated images as in eye-tracking were used. The images were automatically placed into pairs for each observer separately, and each pair was built only from manipulations of the same image. Each observer had to decide between 150 pairs of images. For this analysis, all the data were automatically gathered in a CSV file. The crowdsourcing-based subjective testing included 355 participants, 58 % female and 42 % male, with 56 % below the age of 30 and 34 % aged 30 or more. The average age of all participants was 32,39 years. 94 % of participants came from Slovenia and the remaining 6 % from 10 other countries. All participants had normal or corrected-to-normal vision. Altogether, 53250 decisions were made between image pairs. The test took place in an uncontrolled environment [31].
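The pairing step can be sketched in Python as follows (the original application was written in PHP). The sketch assumes five randomly drawn pairs per image, which gives the 150 pairs per observer for 30 images; how the real application distributed pairs across images is not stated, and the function and variable names are hypothetical.

```python
import random
from itertools import combinations

# Parameter names as listed in the results of this paper
PARAMETERS = [
    "higher sharpness", "lower sharpness", "higher contrast", "lower contrast",
    "higher lightness", "lower lightness", "saturation", "noise",
    "compression", "resize",
]


def build_pairs(image_ids, versions, pairs_per_image, rng):
    """Build a shuffled list of (image_id, left_version, right_version) for one observer.

    Pairs are only formed between manipulations of the same image.
    """
    pairs = []
    candidates = list(combinations(versions, 2))
    for image_id in image_ids:
        for left, right in rng.sample(candidates, pairs_per_image):
            if rng.random() < 0.5:  # randomise on-screen position
                left, right = right, left
            pairs.append((image_id, left, right))
    rng.shuffle(pairs)
    return pairs


observer_pairs = build_pairs(range(1, 31), PARAMETERS, 5, random.Random(0))
```

Seeding the generator per observer, as assumed here, would make each observer's sequence reproducible when the gathered CSV data are analysed later.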

Eye movement measurement
The main goal of the data analysis was to compare how the way participants looked at the images changed according to the parameters that were used for manipulation. To accomplish this, a new way of measuring the viewed area was developed. Measured gaze plots with an enabled duration setting were exported, so that we obtained only smaller and bigger black dots representing fixation points (the export was made to transparent PNG image files; Fig. 7). Counting the black pixels on all exported gaze plots provided an objective measurement of how the way participants looked at an image changed compared to the unmanipulated reference images. This comparison was made by taking the difference between the observed areas on the manipulated image and the observed areas on the reference image. Observing the analysed data (Fig. 8), it can be seen that the smallest deviations between observing a reference image and the manipulated image appeared at higher sharpness (0,43 %), followed by lower contrast (1,20 %), resize (1,47 %), noise (1,68 %), saturation (1,80 %), lower lightness (2,52 %), lower sharpness (2,69 %), higher contrast (3,04 %) and compression (3,95 %), while the most significant change was observed at higher lightness (7,01 %).
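The pixel-counting measurement described above can be sketched as follows. The sketch assumes that each exported gaze plot has already been decoded into a list of (R, G, B, A) tuples, that a pixel counts as "black" when it is opaque and dark, and that the change is expressed as a share of the image area; the thresholds and the normalisation are assumptions, not values from the paper.

```python
def black_pixel_count(pixels, dark_max=32, alpha_min=128):
    """Count opaque dark pixels in a gaze plot given as (R, G, B, A) tuples."""
    return sum(
        1 for r, g, b, a in pixels
        if a >= alpha_min and max(r, g, b) <= dark_max
    )


def viewing_change_percent(reference_pixels, manipulated_pixels):
    """Absolute difference in viewed area, as a percentage of the image area."""
    if len(reference_pixels) != len(manipulated_pixels) or not reference_pixels:
        raise ValueError("gaze plots must be non-empty and of equal size")
    reference_area = black_pixel_count(reference_pixels)
    manipulated_area = black_pixel_count(manipulated_pixels)
    return 100.0 * abs(manipulated_area - reference_area) / len(reference_pixels)
```

Under these assumptions, a value near zero means that participants covered roughly the same area on the manipulated image as on the reference image.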
Analysing the gathered data revealed that the coefficient of variation calculated between all 30 included images is high for all quality parameters. That shows a very high importance of image content (Tab. 2): the acceptance rate was very different when observing different images.
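For reference, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The short sketch below uses the population standard deviation; whether the paper used the population or the sample form is not stated, so this choice is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * standard deviation / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * statistics.pstdev(values) / mean
```

Applied to the 30 per-image values of one quality parameter, a high result indicates that the effect of that parameter depends strongly on image content.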

Crowdsourcing
The data gathered with crowdsourcing consist of 53250 decisions about which image in a pair is better. Each parameter appeared the same number of times, so simply counting how often each parameter was chosen gives an objective comparison. The most preferred parameters, i.e. those with the highest acceptance rate, were higher sharpness (16,15 %) and resize (16,10 %), followed by lower contrast (14,84 %), saturation (12,63 %), noise (9,88 %), compression (8,02 %), lower lightness (6,55 %), lower sharpness (5,87 %) and higher lightness (5,21 %), while the least preferred parameter was higher contrast (4,76 %). We can also observe a small difference between the genders and age groups (Fig. 9, Fig. 10). A further analysis of the gathered data was conducted by comparing the acceptance rate to image complexity (the amount of detail in an image). Fig. 11 shows that the acceptance rate rose together with complexity when images were manipulated with compression. The result was very similar for images manipulated by noise (Fig. 12).
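Since the reported acceptance rates sum to approximately 100 %, they can be read as each parameter's share of all winning choices. A minimal sketch of that calculation, assuming the decisions are available as a list containing the name of the chosen parameter for every pair (a hypothetical data layout):

```python
from collections import Counter


def acceptance_rates(chosen_parameters):
    """Share of decisions won by each parameter, in percent, highest first."""
    counts = Counter(chosen_parameters)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no decisions supplied")
    return {name: 100.0 * wins / total for name, wins in counts.most_common()}
```

Because every parameter was shown equally often, these shares can be compared directly without further normalisation.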
The result was different when analysing data from images with lower sharpness, where the acceptance rate rose as image complexity dropped (Fig. 13). Analysing the gathered data revealed that the coefficient of variation calculated between all 30 included images is high for all quality parameters. That shows a very high importance of image content (Tab. 3): the change in viewing was very different when observing different images.

As shown in the results, two different methods of gathering quality assessment data were employed, namely eye-tracking and web-based crowdsourcing.
Both methods were extremely useful for gathering image visual quality data. They are comfortable for participants and do not take much time (eye-tracker testing takes about 3,5 minutes and the web application about 6 minutes). The danger of errors is greatly reduced when there is a large number of test participants (eye-tracker 110, web application 355); any anomaly is therefore unlikely to have a significant influence on the final result.

When comparing data from both tests, some resemblance can be observed (Tab. 4). In both cases, higher sharpness had the lowest influence on observance, which was expected since higher sharpness increases the visual quality of an image. The same holds for the next four parameters: the same parameters appear in both rankings, with small differences in order. We believe that the resize parameter was more accepted with the crowdsourcing approach due to the smaller screens on which the participants took the test, since the screen size is directly connected to the resize parameter. Similarly, in the case of noise and compression, a smaller screen reduces the size of the visible artefacts that appear after high compression or noise addition. The same phenomenon can also explain the difference for the lower sharpness parameter: unsharp images appear sharper on a smaller screen.
Altogether, we believe that both approaches supported one another and that differences appeared primarily due to the controlled environment used for the eye movement measurements. The usual way of comparing results of this kind is by ranking them (Tab. 5). The Spearman correlation coefficient between the ranked results of both methods was 0,88 (Fig. 14).

The dependence on image complexity observed in the crowdsourcing test results confirmed our expectations as well. Noise and compression are parameters that generally manifest as artefacts, which can easily be observed with the human eye. The more empty spaces there are in an image, the easier it is to see them. In other words, the acceptance rate became higher alongside the image complexity: the more details there were, the more difficult it was to observe the artefacts. The opposite was true for lower sharpness. The more elements appear in an image, the easier it becomes to see the unsharp elements, as they appear more often. Unsharp images are therefore better accepted when they have fewer elements or are less complex. For the other parameters, no noticeable influence of complexity on acceptance was observed.

Observing the image visual quality with the help of eye movement measurements and subjective testing led to results that were unsurprising and that mostly confirm our hypotheses and expectations. Our tests confirmed that both of the chosen methods were suitable for this type of research and very important for further work, where we would like to continue researching the influence of different image parameters on visual quality and, most importantly, on image communication value. The communication value, as described in the introduction, is the main reason why it is important to truly understand and be able to predict the types of images that will attract more readers, as well as the images that should not be used.
Given the high correlation between both methods, the important finding is that eye movement measurement has great potential in image quality assessment and will be researched further.
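The reported rank correlation can be checked against the percentages given in the results. The sketch below ranks the ten parameters by change in viewing (lower is better) and by acceptance rate (higher is better) and applies the Spearman formula for untied ranks; the ranks are derived here from the values quoted in the text rather than copied from Tab. 5.

```python
# (change in viewing / %, acceptance rate / %) as quoted in the results
RESULTS = {
    "higher sharpness": (0.43, 16.15),
    "lower contrast": (1.20, 14.84),
    "resize": (1.47, 16.10),
    "noise": (1.68, 9.88),
    "saturation": (1.80, 12.63),
    "lower lightness": (2.52, 6.55),
    "lower sharpness": (2.69, 5.87),
    "higher contrast": (3.04, 4.76),
    "compression": (3.95, 8.02),
    "higher lightness": (7.01, 5.21),
}


def ranks(values, higher_is_better):
    """Rank values from 1 (best) to n (worst); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


eye_ranks = ranks([v[0] for v in RESULTS.values()], higher_is_better=False)
crowd_ranks = ranks([v[1] for v in RESULTS.values()], higher_is_better=True)
rho = spearman_rho(eye_ranks, crowd_ranks)
print(round(rho, 2))  # 0.88
```

The sum of squared rank differences is 20, which gives 1 - 120/990 ≈ 0,88, in agreement with the reported coefficient.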

Conclusion
In the presented research, we employed a novel image test database that we developed in order to obtain image visual quality data by using objective or subjective quality assessment methods. The presented results, obtained with the eye-tracking method and web-based crowdsourcing, confirm our hypotheses and expectations and define how each of the image quality parameters affects the communication value of each image. In this way, we were able to determine which parameters had a greater impact on image perception. Both methods were extremely useful for gathering image visual quality data.
In future research, we are planning to continue using both of the discussed methods. Moreover, a further analysis of the results, and a comparison between them and the results from an objective quality assessment method, is planned. With this additional work, we hope to enable the prediction of the communication value of an image. We are also confident that we will discover more about the dependence between the type of image manipulation and image complexity.
We are also planning to conduct experiments that will include the parameter of colour in our observations. In this way, we aim to research which specific colours have the highest acceptance rate among different people, what influence gender, age and cultural environment have on communication value, and how confidently the success of an image can be predicted.

Figure 1 Images in novel image database

Figure 3 CIELAB colour values of average colour for each image in novel image database and in TID2008

Figure 4 Eye-tracking measurement set-up

Figure 5 Organigram of web application

Figure 8 Change in viewing for all images in dependence on image parameter (lower bar is better)

Figure 9 Participant image acceptance rate in dependence on image quality parameter, separated by gender

Figure 10 Participant image acceptance rate in dependence on image quality parameter, separated by age

Figure 14 Image parameters correlation between both used methods (ranked results)

Table 1 Test groups were divided into: reference group A and groups B1-B10 (each colour represents one group)

Table 2 Analysis of image acceptance rate data in dependence on quality parameters

Table 3 Analysis of change in viewing in dependence on quality parameters (columns: quality parameter, x min / %, x max / %)

Table 4 Quality parameter influence on observance (results)

Table 5 Quality parameter influence on observance (ranked results)