A Method for Evaluating Human Observer's Perception of Color Differences

Abstract: The purpose of this paper is to propose a new method for evaluating a human observer's perception of color differences in a digital environment. The method is based on recording the observer's reactions while viewing fields of different colors. The fields are shown on a high-end, wide-gamut, hardware-calibrated monitor driven by software developed specifically for performing the test. The method is suitable for quantifying an individual observer's tolerance to color difference for one or more specific colors. In situations where color is a critical parameter of product quality, it could help to determine quantitatively the color tolerance that would be acceptable to the final user or client.


INTRODUCTION
The issue of accurate color reproduction is of great importance in many industries. Customers are increasingly demanding that the difference between reproduced and reference color, as well as the variations in the production, be as small as possible, whether it is in the printing, packaging, coating, automotive or paper industry. However, the interpretation of numerical data obtained as a result of measurement has proven to be somewhat problematic in practice.
None of the existing color models is perceptually uniform. Thus, a color difference expressed by the same numeric value can be perceived differently in different areas of the same color space [1].
Szafir et al., who study the perception of color differences, claim that conventional color difference metrics significantly underestimate the differences needed between encoded values, and that the necessary differences between marks vary with the type of visualization used [2,3]. Their experiments build on an empirically validated method from color science for constructing probabilistic models of difference perception, which yields data-driven metrics that designers can consider when creating, evaluating, and refining visualizations.
Liang et al. carried out two separate but similar experiments at Leeds University (UK) and Zhejiang University (China). Eleven of the MacAdam centers within the color gamut of the display were studied, and in both experiments color differences were assessed on Eizo displays using the ratio method [4]. García et al. measured the relationship between perceived and computed color differences. Ideally, the computed color difference (obtained from instrumental measurements) should approach the visual color difference (the reference value) as closely as possible; in practice, this usually does not happen [5].
Building on this previous research, the main goal of this work is to create an apparatus and a method for testing an observer's sensitivity to color differences in different areas of color space. The device and method are applied to a human observer with normal color vision in order to confirm their functionality and reliability. The results obtained from one respondent are analysed and explained in detail.

THEORY

Classical Psychophysical Threshold Test
Threshold techniques determine the limit at which an observer detects a stimulus or a change in it, i.e. the just noticeable difference (JND). Two types of threshold can be determined: the absolute threshold and the difference threshold. The absolute threshold is the minimum amount of stimulus necessary for it to be detected. The difference threshold is the smallest detectable change in a given stimulus. The techniques used in the experiment are [6]:
- Method of adjustment;
- Method of limits.

Method of Adjustment
The method of adjustment is one of the most widely accepted methods for determining a subject's sensitivity threshold to a given stimulus. With this technique, the subject controls the magnitude of the stimulus and adjusts it until a certain level of matching is reached. The adjustment may aim at a barely detectable stimulus (for the absolute threshold) or at a stimulus that is just distinguishable from another (for the difference threshold).
The threshold is then determined by averaging the adjustments across several trials.

Method of Limits
The method of limits is more complex than the method of adjustment and provides more accurate threshold data. In this technique the respondent has no control over the presentation of the stimulus. Stimulus magnitudes are predefined and presented in descending or ascending order.
In a descending series, the stimulus starts from a value above the sensitivity threshold. The respondent reports whether he or she detects the stimulus. If the observer detects it (answers "yes"), a new stimulus with a lower value is presented. This is repeated until the observer can no longer detect the stimulus.
In an ascending series, the first stimulus is presented at a level that is definitely not detectable. The observer is asked to respond "yes" if the stimulus is seen, or "no" if not. If the observer responds "no," the stimulus intensity is increased. This is repeated until the observer can see the stimulus.
The threshold is taken as the average of the stimulus values at which the observer first detects the stimulus in an ascending series and first fails to detect it in a descending series.
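As a rough illustration of the method of limits, the following Python sketch runs one ascending and one descending series against a simulated observer and averages the two crossing points. The function names, the stimulus levels and the deterministic observer are assumptions made for this example; they are not taken from the software described in this paper.

```python
from statistics import mean

def ascending_series(levels, detects):
    """Return the first level, in increasing order, that the observer detects."""
    for level in sorted(levels):
        if detects(level):
            return level
    return None

def descending_series(levels, detects):
    """Return the first level, in decreasing order, that the observer fails to detect."""
    for level in sorted(levels, reverse=True):
        if not detects(level):
            return level
    return None

def limits_threshold(levels, detects):
    """Average the crossing points of one ascending and one descending series."""
    crossings = [ascending_series(levels, detects),
                 descending_series(levels, detects)]
    return mean(c for c in crossings if c is not None)

# Hypothetical observer who detects every stimulus of magnitude 1.0 or more.
levels = [0.0, 0.5, 1.0, 1.5, 2.0]
threshold = limits_threshold(levels, lambda x: x >= 1.0)
```

With a real observer the responses are noisy, so several ascending and descending series would normally be run and all crossing points averaged.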

Color Difference
There are several equations for calculating color difference, each developed to overcome the non-uniformity of a color model. It must therefore always be stated which equation was used to calculate a quantitative color difference.
The human perception of color, as well as the perception of color differences, is a very complex and not yet completely understood phenomenon that takes place in the human brain, based on the interpretation of neural signals generated by light entering the observer's eye. Three interacting elements are necessary for human vision and color perception: a light source, an observed object, and a human observer.
The International Commission on Illumination (CIE) has defined the standard observer as a person with standard color vision. The CIE standard observer is defined by spectral tristimulus values for the wavelengths of the visible spectrum, determined experimentally at a viewing angle of 2° (the CIE 1931 standard colorimetric observer). Later, it was found that some of the color-sensitive cones in the eye are distributed outside the fovea, so new measurements were taken at a viewing angle of 10° (the CIE 1964 supplementary standard observer) [7].
Spectral tristimulus values are a quantitative measure of the sensitivity of the observer's receptor types to visible light in three wavelength regions, roughly corresponding to red, green and blue light.
The color difference caused by the change in the visual effects of the compared samples can be expressed by the calculation of colorimetric difference ∆E*.
Previous research has shown that visual methods of color comparison and color determination are subjective, while instrumental measurements are objective.
However, visual color description systems such as Munsell's have served as the basis for all contemporary color description systems. According to their principles, each color can be described by three attributes: hue, saturation and lightness.
By mathematical transformation of tristimulus values, quantitative indicators of color can be obtained in different color models. One such model is the chromaticity diagram, which is not uniform. This non-uniformity is confirmed by the MacAdam ellipses: the same distance on the diagram corresponds to a color difference that a standard observer cannot see in some parts of the diagram but can see in others [8]. A typical example is that the same color differences are represented by larger distances in the green area than in the blue area.
The trichromatic theory offers an explanation for color perception at the level of photoreceptors. The theory was proposed by Young in 1802 and further developed by Helmholtz in 1850, who introduced three types of receptors, sensitive to blue, red and green light (RGB) [9]. How these receptors work together was explained by Ewald Hering in 1892. In his opponent color theory, he stated that photoreceptors are neurologically linked and explained how they work together to enable color perception. In 1957, this theory was quantified by Leo Hurvich and Dorothea Jameson [10]. It also provides the basis for the structure of the three-dimensional CIE L*a*b* color model (Fig. 1).
The CIE L*a*b* system uses three coordinates to describe a color: L* refers to lightness, a* corresponds to the amount of red (+a*) or green (−a*), and b* to the amount of yellow (+b*) or blue (−b*). The numerical values of these coordinates can be calculated from the CIE tristimulus values [11].
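The calculation of L*, a* and b* from tristimulus values can be sketched in Python as follows, using the standard CIE 1976 definition. The choice of the D50 reference white (2° observer) is an assumption made for this example, motivated by the D50 calibration mentioned later in the paper; the function names are hypothetical.

```python
# Assumed reference white: CIE D50, 2° observer, scaled so that Y = 100.
D50_WHITE = (96.422, 100.000, 82.521)

def _f(t):
    """CIE 1976 compression function with its linear segment near zero."""
    delta = 6 / 29
    if t > delta ** 3:
        return t ** (1 / 3)
    return t / (3 * delta ** 2) + 4 / 29

def xyz_to_lab(x, y, z, white=D50_WHITE):
    """Convert tristimulus values X, Y, Z to CIE L*, a*, b*."""
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), white))
    lightness = 116 * fy - 16
    a_star = 500 * (fx - fy)
    b_star = 200 * (fy - fz)
    return lightness, a_star, b_star
```

By construction, the reference white itself maps to L* = 100 with a* = b* = 0, and a stimulus with zero tristimulus values maps to L* = 0.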

Equations for Calculating Color Difference
In a color specification system, color difference is equivalent to the distance between the positions of the two colors, for instance, sample and reference colors, and can be expressed with numeric values that are calculated using different mathematical equations.
One of the most frequently used color difference formulas is the one developed in 1976, denoted ΔE*ab [13]. It is based on the CIE L*a*b* system. A given value of ΔE*ab, computed from the L*, a* and b* coordinates, defines a sphere around the point representing the color of the measured original. The radius of the sphere does not depend on the position of the reference color in the CIE L*a*b* system. Every color inside a sphere of sufficiently small radius ΔE*ab differs so little from the original color that the difference is not visually perceivable, while all colors outside the sphere can be perceived as different.
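Because ΔE*ab is the Euclidean distance between two points in CIE L*a*b* space, it can be sketched in a few lines of Python. The function name is hypothetical; the two colors are the reference green and the lightness-adjusted green reported in the results of this paper.

```python
import math

def delta_e_ab(lab_ref, lab_sample):
    """CIE 1976 color difference: Euclidean distance in L*a*b* space."""
    return math.dist(lab_ref, lab_sample)

reference = (55.15, -37.80, 31.64)   # reference green used in this work
adjusted = (56.70, -37.80, 31.64)    # same color with increased lightness
difference = delta_e_ab(reference, adjusted)
```

Since only L* differs between the two colors, the result is simply the lightness difference of 1.55.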
ΔE*CMC, ΔE*94 and ΔE*00 are defined using the CIE L*C*h system and are calculated from the variables L*, a* and b* [14]. Unlike ΔE*ab, these equations define ellipsoidal rather than spherical distances. Both the size and the shape of the ellipsoid change as the measured color point moves within the color specification system: ellipsoids are narrower in the orange area and wider in the green area, and they are larger in areas where colors are more saturated. These newer models provide a higher degree of agreement between the visual sensation and the measured value of the color difference [15].

The ΔE*CMC equation was developed in 1988 by the Color Measurement Committee of the Society of Dyers and Colorists. It gives better agreement between visual assessment of color difference and instrumental measurement results than the ΔE*ab equation. The ΔE*CMC equation describes color difference by an ellipsoid around the original color point whose semi-axes correspond to hue, saturation, and lightness. The ellipsoid represents the range of acceptability, and its size depends on the position of the color point within the system.
In this case, color difference is described by the lightness difference (ΔL*), the saturation difference (ΔC*) and the hue difference (ΔH*). Since the human eye is more tolerant of differences in lightness (l) than of differences in saturation (c), the ratio l : c = 2 : 1 was originally adopted, which allows a lightness difference twice as large as the saturation difference. The ΔE*CMC equation allows this ratio to be changed to suit specific circumstances, since measurements are carried out in different branches of industry.
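A minimal Python sketch of the CMC(l:c) calculation is given below, following the commonly published form of the formula. The function and variable names are hypothetical, and the default l = 2, c = 1 reflects the ratio mentioned above.

```python
import math

def delta_e_cmc(lab_ref, lab_sample, l=2.0, c=1.0):
    """CMC(l:c) color difference; the first argument is the reference color."""
    L1, a1, b1 = lab_ref
    L2, a2, b2 = lab_sample
    C1 = math.hypot(a1, b1)
    C2 = math.hypot(a2, b2)
    dL = L1 - L2
    dC = C1 - C2
    # Squared hue difference is what remains after lightness and chroma.
    dH_sq = max((a1 - a2) ** 2 + (b1 - b2) ** 2 - dC ** 2, 0.0)
    h1 = math.degrees(math.atan2(b1, a1)) % 360

    S_L = 0.511 if L1 < 16 else 0.040975 * L1 / (1 + 0.01765 * L1)
    S_C = 0.0638 * C1 / (1 + 0.0131 * C1) + 0.638
    F = math.sqrt(C1 ** 4 / (C1 ** 4 + 1900))
    if 164 <= h1 <= 345:
        T = 0.56 + abs(0.2 * math.cos(math.radians(h1 + 168)))
    else:
        T = 0.36 + abs(0.4 * math.cos(math.radians(h1 + 35)))
    S_H = S_C * (F * T + 1 - F)

    return math.sqrt((dL / (l * S_L)) ** 2
                     + (dC / (c * S_C)) ** 2
                     + dH_sq / S_H ** 2)
```

For a pure lightness difference, changing l from 2 to 1 doubles the computed value, which is the sense in which the l : c ratio makes the formula more or less tolerant of lightness differences.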
The ΔE*94 equation was developed in 1994 by the International Commission on Illumination (CIE). As with the ΔE*CMC equation, color difference is represented mathematically as an ellipsoid around the original color point whose semi-axes correspond to hue, saturation, and lightness. This equation also allows the weighting between lightness difference (K_L) and the saturation-dependent terms (K_1, K_2) to be changed. This weighting affects the size and shape of the ellipsoid in a similar way to the l : c ratio in ΔE*CMC. K_L, K_1 and K_2 are constants; in the graphic industry their values are K_L = 1, K_1 = 0.045 and K_2 = 0.015.
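The ΔE*94 calculation with the graphic-arts constants quoted above can be sketched in Python as follows. The function name and the parameter names k_L, k_C, k_H, K1 and K2 are choices made for this example; the weighting functions follow the commonly published form S_L = 1, S_C = 1 + K1·C*, S_H = 1 + K2·C*, with C* taken from the reference color.

```python
import math

def delta_e_94(lab_ref, lab_sample, k_L=1.0, k_C=1.0, k_H=1.0,
               K1=0.045, K2=0.015):
    """CIE94 color difference; the first argument is the reference color."""
    L1, a1, b1 = lab_ref
    L2, a2, b2 = lab_sample
    C1 = math.hypot(a1, b1)
    C2 = math.hypot(a2, b2)
    dL = L1 - L2
    dC = C1 - C2
    # Squared hue difference is what remains after lightness and chroma.
    dH_sq = max((a1 - a2) ** 2 + (b1 - b2) ** 2 - dC ** 2, 0.0)

    S_L = 1.0
    S_C = 1 + K1 * C1
    S_H = 1 + K2 * C1

    return math.sqrt((dL / (k_L * S_L)) ** 2
                     + (dC / (k_C * S_C)) ** 2
                     + dH_sq / (k_H * S_H) ** 2)
```

Because S_C and S_H grow with the chroma of the reference color, the same chroma or hue shift counts for less in saturated colors than in near-neutral ones.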
The ΔE*00 equation was developed in 2000 as a refinement of ΔE*94, to achieve better agreement between visual assessment and instrumental measurement [16]. Color difference is again expressed through three components: lightness difference (ΔL'), saturation difference (ΔC') and hue difference (ΔH'). The weighting functions S_L, S_C and S_H account for the influence of lightness, saturation, and hue angle. Where saturation is minimal, all three are approximately equal to 1, so the color difference ellipsoid becomes a sphere. Where saturation is highest, S_C is much larger than S_L and S_H, which makes the ellipsoid visibly elongated along the saturation axis. The coefficients K_L, K_C and K_H depend on the measurement circumstances, but are usually set to 1.
In contemporary practice, calculation results for small color differences are usually expressed as a ratio between the total perceived difference and the individual difference values in hue, chroma, and lightness. This applies to all cases where the visually perceived difference is small. The decomposition of the total difference into its components can only be expressed mathematically, on the premise that the space around the position of the color is Euclidean. Empirical functions exist that improve the correlation between visually perceived and calculated color difference, but they cannot be used for very small differences because of a lack of visual data. This area should therefore be a subject of future research [17].
In this work, ΔE is calculated using the ΔE*00 equation, because it most accurately represents perceptual color difference for industrial applications [18].
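For readers who want to reproduce ΔE*00 values, the following Python sketch implements the published CIEDE2000 formula with K_L = K_C = K_H = 1 as defaults. It is an independent illustration with hypothetical function and variable names, not the code of the software described in this paper.

```python
import math

def delta_e_2000(lab1, lab2, k_L=1.0, k_C=1.0, k_H=1.0):
    """CIEDE2000 color difference between two L*a*b* triples."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # Rescale a* so that near-neutral colors are weighted correctly.
    C_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    G = 0.5 * (1 - math.sqrt(C_bar ** 7 / (C_bar ** 7 + 25 ** 7)))
    a1p, a2p = (1 + G) * a1, (1 + G) * a2
    C1p, C2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360 if C1p else 0.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360 if C2p else 0.0

    dLp = L2 - L1
    dCp = C2p - C1p
    if C1p * C2p == 0:
        dhp = 0.0
    else:
        dhp = h2p - h1p
        if dhp > 180:
            dhp -= 360
        elif dhp < -180:
            dhp += 360
    dHp = 2 * math.sqrt(C1p * C2p) * math.sin(math.radians(dhp) / 2)

    Lp_bar = (L1 + L2) / 2
    Cp_bar = (C1p + C2p) / 2
    if C1p * C2p == 0:
        hp_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hp_bar = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        hp_bar = (h1p + h2p + 360) / 2
    else:
        hp_bar = (h1p + h2p - 360) / 2

    T = (1 - 0.17 * math.cos(math.radians(hp_bar - 30))
         + 0.24 * math.cos(math.radians(2 * hp_bar))
         + 0.32 * math.cos(math.radians(3 * hp_bar + 6))
         - 0.20 * math.cos(math.radians(4 * hp_bar - 63)))
    S_L = 1 + 0.015 * (Lp_bar - 50) ** 2 / math.sqrt(20 + (Lp_bar - 50) ** 2)
    S_C = 1 + 0.045 * Cp_bar
    S_H = 1 + 0.015 * Cp_bar * T
    d_theta = 30 * math.exp(-((hp_bar - 275) / 25) ** 2)
    R_C = 2 * math.sqrt(Cp_bar ** 7 / (Cp_bar ** 7 + 25 ** 7))
    R_T = -math.sin(math.radians(2 * d_theta)) * R_C

    tL = dLp / (k_L * S_L)
    tC = dCp / (k_C * S_C)
    tH = dHp / (k_H * S_H)
    return math.sqrt(tL ** 2 + tC ** 2 + tH ** 2 + R_T * tC * tH)
```

Applied to the reference green (55.15, −37.80, 31.64) and the lightness-adjusted green (56.70, −37.80, 31.64) reported in the results, only the lightness term contributes, and the value obtained is consistent with the ΔE*00 of about 1.45 given there.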

EXPERIMENTAL
The system consists of a computer, a wide-gamut hardware-calibrated monitor, a spectrophotometer with calibration software, and software developed specifically to carry out the test method.
Before the experiment, each potential respondent is checked for color blindness using the Ishihara color test [19]. Only respondents with normal color vision are included in the investigation.
The procedure consists of two series of tests. In each series, two fields of known colorimetric values, the reference field and the adjustable (changeable) field, are shown to the respondent, who has to react to color changes.
The software collects the data from the test and generates a table of an individual respondent's sensitivity to color difference.
The device and method are convenient for quantification of an individual observer's tolerance to color difference for one or more specific colors.

Hardware
The hardware components of the device used for the experiment are:
- Computer with a UHD hardware-calibrated monitor

Software
The monitor is calibrated with a built-in spectrophotometer controlled by ColorNavigator 6 color management software. The monitor is calibrated to the D50 standard and hooded against ambient light.
"Color Changer" is software created specifically for this work to help evaluate an observer's perception of color differences for a specific color.
It has the following functions:
- It allows the researcher to generate an initial set of reference colored fields, which will be shown to the subject during the experiment.
- It allows the researcher to adjust the conditions of the experiment (the size, spacing and background color of the reference and variable fields, the step of color change, the extent to which colors change, the speed at which colors change, the number of changes per second, whether fields are shown in random order, ...).
- It allows the researcher to define a data set for the respondent, which can include personal data of the respondent that may be important for future research.
- It automatically registers the L*a*b* values when the respondent reacts to the variable color field.
- It automatically calculates the color difference ΔE according to the selected equation (in this work, ΔE*00) [20].
- At the end of the test, it automatically generates a table with the L*a*b* and ΔE*00 values of all fields, recorded when the respondent adjusted the color of the changeable field to match the reference field, or confirmed the slightest noticeable or the unacceptable color difference between the reference and changeable fields.
The first three functions are accomplished by creating a configuration file.
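The paper does not describe the format of the configuration file. Purely as an illustration of how such settings could be stored, the Python sketch below keeps them in a small data class and saves them as JSON; every field name, default value and the JSON format itself are assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ExperimentConfig:
    """Hypothetical test settings; names and defaults are illustrative only."""
    reference_colors: list                 # list of [L*, a*, b*] triples
    step: float = 0.05                     # change per step along one coordinate
    changes_per_second: float = 4.0        # speed of the automatic change
    max_change: float = 10.0               # extent of change along one coordinate
    field_size_px: int = 300
    field_spacing_px: int = 40
    background_lab: list = field(default_factory=lambda: [50.0, 0.0, 0.0])
    randomize_order: bool = True
    delta_e_equation: str = "CIEDE2000"
    respondent: dict = field(default_factory=dict)

def save_config(config, path):
    """Write the settings to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(asdict(config), handle, indent=2)

def load_config(path):
    """Read the settings back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return ExperimentConfig(**json.load(handle))
```

Keeping all test conditions in one file of this kind makes it straightforward to repeat a test series for another respondent under identical settings.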

The Process of Converting from One Color System to Another
The problem of displaying colors with given L*a*b* values on a monitor, which is inherently an RGB device, is addressed as follows:
- A high-end monitor with extended gamut and hardware calibration capability was used.
- Tests were performed under controlled ambient lighting conditions, with the subjects wearing neutral-colored clothing.
- The monitor is hardware calibrated.
- The color conversion algorithm, which is an integral part of the software, has been tested as explained below.
The conversion from the input color system (L*a*b*) to the output color system (hexadecimal color code, HEX) within the Color Changer software is shown in the flowchart (Fig. 2). Fig. 2a shows the three-step process of converting L*a*b* to HEX color data:
1. The device-independent L*a*b* data are converted to XYZ data (tristimulus values);
2. The XYZ data are converted to device-dependent RGB data;
3. The RGB data are converted to HEX color data, which are the input values for the display.
To check the precision and accuracy of the color conversion in Color Changer, the conversion is also performed in the opposite direction, as depicted in Fig. 2b. This backward conversion is not performed during the tests; the table with test results is generated from the input L*a*b* values.
For each conversion channel, five deviation checks are made. The largest conversion discrepancy has been reduced to less than 1%. This discrepancy is negligible given the large number of colors available: about 16.7 million (256³) in the HEX system, and even more in the L*a*b* system.
It should also be mentioned that, unlike other systems known to the authors, this software accepts input values with four decimal places.
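The three conversion steps and the backward check can be sketched in Python as follows. The paper does not state which RGB primaries, transfer function or reference white the software uses, so this example assumes the sRGB encoding with a D65 white point purely for illustration; a wide-gamut monitor calibrated to D50 would need different matrices. All names are hypothetical.

```python
D65_WHITE = (0.95047, 1.00000, 1.08883)   # assumed reference white (sRGB)
XYZ_TO_RGB = ((3.2406, -1.5372, -0.4986),
              (-0.9689, 1.8758, 0.0415),
              (0.0557, -0.2040, 1.0570))
RGB_TO_XYZ = ((0.4124, 0.3576, 0.1805),
              (0.2126, 0.7152, 0.0722),
              (0.0193, 0.1192, 0.9505))
DELTA = 6 / 29

def lab_to_xyz(lab, white=D65_WHITE):
    """Step 1: device-independent L*a*b* to XYZ."""
    L, a, b = lab
    fy = (L + 16) / 116
    fx, fz = fy + a / 500, fy - b / 200
    inv = lambda t: t ** 3 if t > DELTA else 3 * DELTA ** 2 * (t - 4 / 29)
    return tuple(w * inv(f) for w, f in zip(white, (fx, fy, fz)))

def xyz_to_rgb8(xyz):
    """Step 2: XYZ to 8-bit RGB (out-of-gamut values are clipped)."""
    rgb = []
    for row in XYZ_TO_RGB:
        lin = min(max(sum(m * v for m, v in zip(row, xyz)), 0.0), 1.0)
        enc = 12.92 * lin if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055
        rgb.append(round(enc * 255))
    return tuple(rgb)

def rgb8_to_hex(rgb):
    """Step 3: 8-bit RGB to a HEX color code."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)

def hex_to_rgb8(code):
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

def rgb8_to_xyz(rgb):
    lin = []
    for channel in rgb:
        v = channel / 255
        lin.append(v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4)
    return tuple(sum(m * v for m, v in zip(row, lin)) for row in RGB_TO_XYZ)

def xyz_to_lab(xyz, white=D65_WHITE):
    f = lambda t: t ** (1 / 3) if t > DELTA ** 3 else t / (3 * DELTA ** 2) + 4 / 29
    fx, fy, fz = (f(v / w) for v, w in zip(xyz, white))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def lab_to_hex(lab):
    return rgb8_to_hex(xyz_to_rgb8(lab_to_xyz(lab)))

def hex_to_lab(code):
    return xyz_to_lab(rgb8_to_xyz(hex_to_rgb8(code)))

# Backward check for the reference green used in this work.
reference = (55.15, -37.80, 31.64)
code = lab_to_hex(reference)
discrepancy = [abs(r - t) for r, t in zip(reference, hex_to_lab(code))]
```

In this sketch the round-trip discrepancy comes mainly from rounding to 8 bits per channel; it says nothing about the accuracy of the authors' own conversion, which is reported above.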

Preparation for the Test
Before a particular respondent is examined, the following preparations must be made:
- Adjusting the test conditions in the software (number and color of reference fields, step and speed of change, range of change, size, spacing and background color of the reference and changeable fields, respondent data set, ΔE* equation); this is done only once, at the beginning of the test series, by creating a configuration file.
- Checking the monitor calibration, ambient lighting conditions, and color neutrality of the respondent's clothing (this and the following steps are repeated for each new respondent).
- Checking the respondent for color blindness (Daltonism).
- Informing the respondent about the purpose and manner of the examination.
- Completing a personal information questionnaire (for example: email, gender, age, profession).
- Running a shortened trial test to familiarize the respondent with the commands and with how the software works.
Testing of an individual respondent can start only when all of the above preparations have been completed.

Method of Determining an Observer's Sensitivity Threshold and Tolerance Threshold to Color Difference
The method consists of two interactive tests in which the respondent either adjusts the color himself or herself, or decides whether a color difference is acceptable. For the purposes of this work, to check the functionality and usability of the method, one reference color of the test field is defined and shown on the left side of the monitor. The number of reference colors will be increased in further investigations in order to cover different parts of color space.
Two colored fields are presented to the respondent on the screen (Fig. 3): a reference field on the left and a changeable (adjustable) field on the right. The color of the left field does not change, while the color of the right field does. The color of the right field changes in six directions, according to the basic coordinates of the color space, as listed in Tab. 1. Twelve tests (in random order) are therefore performed for each reference color (six directions in each of the two series of tests). In this experiment, which tests the functionality of the software and validates the method, one particular green reference color is defined.
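The resulting test plan of twelve trials per reference color can be sketched in Python as follows. It is assumed here that the six directions are an increase and a decrease along each of the L*, a* and b* coordinates; the series labels and function name are hypothetical.

```python
import random
from itertools import product

# Assumed reading of the six directions: plus and minus along each coordinate.
DIRECTIONS = ["+L*", "-L*", "+a*", "-a*", "+b*", "-b*"]
SERIES = ["adjustment", "automatic"]   # hypothetical labels for the two tests

def build_schedule(reference, seed=None):
    """Return the twelve (reference, series, direction) trials in random order."""
    rng = random.Random(seed)
    trials = [(reference, series, direction)
              for series, direction in product(SERIES, DIRECTIONS)]
    rng.shuffle(trials)
    return trials

schedule = build_schedule((55.15, -37.80, 31.64), seed=1)
```

Passing a fixed seed makes the random order reproducible, which can be useful when a test has to be repeated under the same circumstances.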
The first test determines the threshold of the respondent's sensitivity to a particular reference color. The respondent presses the + or − keys to adjust the color of the adjustable field (right hexagonal field in Fig. 3) until he or she sees no difference from the reference field (left hexagonal field in Fig. 3). The respondent then confirms the choice by pressing the Enter key; the software saves the data and offers the next reference color (if one is predefined).

Figure 3
Colored fields on a monitor shown to the respondent (translation of the original labels: "Upute za testiranje" - Testing instructions; "Referentno polje" - Reference field; "Pritisnuti taster "ENTER" za spremanje podataka i prelazak na sljedeće polje za testiranje" - Press the ENTER key to save the data and move to the next test field; "Povratak" - Return)

In the second test, the color of the changeable field changes automatically. At the beginning, the colors of the two fields are the same. After the respondent gives the start command, the software automatically starts to change the color of the changeable (right) field. The respondent decides on the slightest (barely visible) difference and on the unacceptable difference between the colors of the two fields. At the moment the respondent notices the first difference in color, he or she presses the "space" key. The color of the right field then keeps changing, and the respondent presses the "space" key again when the difference becomes unacceptable. The software records the color values and offers the next colored field if one is predefined.
Both tests can be repeated until all predefined colors have been checked. The tests give the following information about the respondent:
- the sensitivity threshold for color difference;
- the slightest noticeable difference;
- the unacceptable color difference.
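The logic of the second (automatic) test can be sketched in Python with a simulated respondent in place of real key presses. The function name, the callback interface and the simulated thresholds are assumptions made for this example and do not describe the actual software.

```python
def run_automatic_test(reference, axis, sign, step, max_steps,
                       noticed, unacceptable):
    """Sweep one coordinate away from the reference color.

    Returns the L*a*b* values at the first 'noticed' response and at the
    following 'unacceptable' response (None if a response never occurs).
    """
    first_noticed = first_unacceptable = None
    for k in range(1, max_steps + 1):
        current = list(reference)
        current[axis] = reference[axis] + sign * step * k
        if first_noticed is None:
            if noticed(reference, current):       # first "space" key press
                first_noticed = tuple(current)
        elif unacceptable(reference, current):    # second "space" key press
            first_unacceptable = tuple(current)
            break
    return first_noticed, first_unacceptable

# Simulated respondent: notices a lightness change of about 2 units and
# finds a change of about 4 units unacceptable (hypothetical values).
reference = (55.15, -37.80, 31.64)
snd, uad = run_automatic_test(
    reference, axis=0, sign=+1, step=0.5, max_steps=40,
    noticed=lambda ref, cur: abs(cur[0] - ref[0]) >= 1.9,
    unacceptable=lambda ref, cur: abs(cur[0] - ref[0]) >= 3.9,
)
```

In a real test the two callbacks would be replaced by the respondent's key presses, and the recorded colors would then be passed to the selected ΔE* equation.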

RESULTS AND DISCUSSION
When all settings have been made and the test is completed, the software generates an Excel spreadsheet that contains:
- Conditions of the experiment (as specified in the configuration file).
- Personal data of the respondent.
- Time and date when the test started and the duration of the test.
- L*a*b* coordinates of the reference fields.
- L*a*b* coordinates of the adjustable or changeable field, recorded upon the respondent's action in each test phase.
- Automatically calculated ΔE*00 between the reference color and the color of the changeable field that the respondent set in the first test (sensitivity threshold, one field × 6 directions = 6 values).
- Automatically calculated ΔE*00 between the reference color and the color of the changeable field at the moment when the respondent noticed the slightest noticeable color difference (6 values).
- Automatically calculated ΔE*00 between the reference color and the color of the changeable field when the respondent noticed the first unacceptable color difference (6 values).

Respondent's Sensitivity Threshold for Color Difference of Particular Green Reference Color
The first test gives information on the respondent's sensitivity threshold, i.e. how well he or she perceives color differences for a particular green color. When the respondent completes the color matching between the reference and the changeable field, the program calculates ΔE*00. Tab. 2 (row 1) shows the results of this test, performed on one respondent, for a particular green color when the L value is increased while the other values (a and b) remain constant.
The L*a*b* coordinates of the reference field are marked L(ref), a(ref) and b(ref). The L*a*b* coordinates of the adjustable field are marked L(man), a(man) and b(man).
In the example in Tab. 2 (row 1), the respondent, adjusting the color of the adjustable field to the green reference field, achieved ΔE*00 = 1.45. The coordinates of the adjustable field were L = 56.70; a = −37.80; b = 31.64.
It can be concluded that this respondent has a low sensitivity threshold when the L coordinate (lightness) changes in the positive direction for the reference green L = 55.15; a = −37.80; b = 31.64, and that he or she will soon notice the difference when the lightness of this color increases.
Tab. 2 (row 2) shows the results for the reference green when the value of the L coordinate is decreased, while the other values (a and b) remain constant.
It can be concluded that this respondent has a much lower sensitivity threshold when the L coordinate (lightness) changes in the negative direction for the reference green L = 55.15; a = −37.80; b = 31.64, and that he or she will notice the difference sooner than when the L value increases. The total color difference achieved in this case is ΔE*00 = 0.19.
Tab. 2 (rows 3, 4, 5 and 6) shows the results of the next four tests, performed according to the testing procedure explained in Section 3.5 (the method of determining the sensitivity and tolerance thresholds).
The average color difference for this respondent is ΔE*00 = 0.60, which represents the sensitivity threshold of this respondent for the specific green color shown in this examination. According to this test, the respondent cannot see a color difference between the tested and the reference green below ΔE*00 = 0.60; within his or her perceptual ability, the two colors look the same.

The Slightest Noticeable Difference and the Unacceptable Difference
The second test gives two groups of results: the slightest noticeable color difference and the unacceptable color difference.
The slightest noticeable color difference for the reference green field is calculated after the second test is completed; the results for ΔE*00 are given in Tab. 3.
The L*a*b* coordinates of the slightest noticeable difference for the reference green are marked L(SND), a(SND) and b(SND).
The average color difference for this respondent is ΔE*00 = 2.48, which represents the slightest color difference that this respondent can notice for the specific green color shown in this examination. According to this test, the respondent sees a color difference when the difference between the tested and the reference green reaches ΔE*00 = 2.48.
The unacceptable color difference for the reference green field is likewise calculated after the second test is completed; the results for ΔE*00 are given in Tab. 4.
The L*a*b* coordinates of the unacceptable color difference for the reference green are marked L(UAD), a(UAD) and b(UAD).
The average color difference for this respondent is ΔE*00 = 3.94, which represents the unacceptable color difference, i.e. the difference that this respondent no longer accepts for the specific green color shown in this research. According to this test, a difference between the tested and the reference green becomes unacceptable to the respondent when it reaches ΔE*00 = 3.94.

Visual Representation of Correlation Between the Threshold, Slightest Noticeable Difference and the Unacceptable Difference
The position of the reference green color L = 55.15; a = −37.80; b = 31.64 in the CIE L*a*b* color space is represented by the red dot in Fig. 4.
The smallest green dots represent the sensitivity threshold (ST) determined during testing. Similarly, the middle-sized dots represent the slightest noticeable color difference (SND) and the biggest green dots represent the unacceptable color difference (UAD). It can be seen that, in each direction, the smallest green dot (ST) is closest to the red dot, and the largest green dot (UAD) is furthest from the red dot.
This logically correct arrangement of dots confirms that the respondent worked correctly and attentively during the test. If the arrangement of the points is not logically correct, the test result must be rejected and the test repeated.
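This plausibility check can be expressed as a short Python sketch: in every direction the sensitivity threshold should be smaller than the slightest noticeable difference, which in turn should be smaller than the unacceptable difference. The function name, the data layout and the numbers in the example are hypothetical and are not the measured results of this study; the use of strict inequalities is also an assumption.

```python
def arrangement_is_consistent(results):
    """Check ST < SND < UAD for every tested direction.

    `results` maps a direction label to a tuple of three color differences:
    (sensitivity threshold, slightest noticeable, unacceptable).
    """
    return all(st < snd < uad for st, snd, uad in results.values())

# Hypothetical values for two directions, for illustration only.
consistent = {"+L*": (1.0, 2.0, 3.0), "-L*": (0.5, 2.5, 4.0)}
inconsistent = {"+L*": (1.0, 2.0, 3.0), "-L*": (2.5, 0.5, 4.0)}
```

A test whose results fail this check would be rejected and repeated, as described above.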

CONCLUSIONS
This work presents a new device and method for evaluating human observers' perception of, and tolerance to, color differences.
This method is original and can be used to determine quantitatively:
- The sensitivity threshold of a tested human observer in various parts of color space.
- The slightest noticeable color difference that a tested human observer can recognize in various parts of color space.
- The smallest color difference that a tested human observer considers unacceptable.
Since it collects and records personal data, it can be used to determine the perception of color difference of a certain population.
Since the initial settings can be adjusted, the method can be used to test the color perception of a particular human observer for a particular color or set of colors. A typical example is when the manufacturer of a colored product and the client are trying to determine objectively a color tolerance acceptable to both parties.
This method enables regular checks of the person responsible for choosing and evaluating color, i.e. the person who makes the final decision on acceptance or rejection of the product, for instance in the printing, packaging, coating, automotive, chemical and many other industries.
Many of the aforementioned studies assume that a human observer's perception of color difference changes over time and is subject to many influences [21]. For a particular respondent, such changes can easily be tracked by re-testing under the same circumstances and using the same ΔE* formula for data processing.
This method offers a tool for personalizing color difference tolerance values for each respondent. Personalization will not directly influence color reproduction technologies, but stricter tolerance values may make some steps in the process unnecessary.
In further research, this method will be used to characterize respondents and their ΔE* tolerance not only for one or several colors but for the whole color space, also taking simultaneous color contrast into account.
Furthermore, based on the responses of a number of respondents, a map can be created that shows the perception of an average observer of a particular group.