Easymatch: An Eye Localization Method for Frontal Face Images Using Facial Landmarks

Eye detection algorithms are used in many fields, such as camera applications for entertainment and commercial purposes, gaze detection, human-computer interaction, and eye recognition for security. Fast and successful eye detection is an essential step for all these applications to achieve good results. There are many eye detection methods in the literature, and most of them rely on the Viola-Jones method to detect the face before localizing the eyes. In this paper, a straightforward approach to detect eyes in images that contain a frontal face is proposed. The approach can be used for real-time eye detection with cheap web cameras or other cameras. First, facial landmarks are detected in the image, and the eye region is determined from these landmarks. The eye radius is estimated from the eye corners. Then, reduced input templates are tested with a tailored matching algorithm, which does not require resizing the image, to determine where the eye is.


INTRODUCTION
Eye detection is a fundamental step for many applications, including security, marketing, psychology, and human-computer interaction. After detecting a face in an image, the next logical step is to locate facial elements such as the eyes, mouth, and chin. Locating the eyes is especially important because, just as the eyes are windows to the outside world, they are also windows to the inside. For example, the eyes show definitive signs of emotions or mental state: the pupil may dilate in distress, or the eyes may look upward while a person is thinking. Such patterns can be used to create a psychological profile, which can then be used in marketing or in treating psychological disorders. The positions of a person's eyes also reveal where that person is looking, so after locating the eyes it is possible to detect gaze. Gaze detection can be used in human-computer interaction or to answer questions such as where people tend to look first when entering a market, mall, or school building, and where they tend to look most. Also, just like fingerprints, eyes have a unique pattern that is used in identification [1,2]. By counting the number of blinks in a minute, it is possible to determine whether a person is tired or sleepy, which is used to detect driver fatigue [3]. All of these applications, and many more, rely heavily on locating the eyes, and they can only be as successful as the eye detection methods they use.
Current technologies can be classified into three categories. The first comprises invasive methods, such as electrooculography, which requires placing electrodes on the skin near the eyes [4], or contact-lens-based eye coil systems [5,6]. The second comprises semi-invasive methods, which use some type of special light to illuminate the eyes. The last category is non-invasive and uses only images to determine where the eye is [7–17]. Invasive methods generally have a good detection rate; however, they cannot be used in many of today's applications because of their high cost and low mobility. Semi-invasive methods are widely used and have a good detection rate, but they are also costly. Non-invasive methods generally have a low cost but a lower detection rate, because they can be affected by many factors such as lighting conditions, makeup, and eyeglasses.
Non-invasive methods can be divided into two main categories. The first category contains methods that use artificial intelligence (AI) to determine the eye position. AI-based methods are generally composed of three main steps: creating an eye pattern, applying a classifier or statistical model, and finally using these to find precise eye positions [8,16,17]. The second category uses general characteristics of the eye, such as pixel distributions, pixel values, and histograms [7,10,12]. A detailed survey of eye detection and tracking techniques is presented in [18].
Usually, the first step in locating the eyes is detecting the face within the image. To detect faces, many applications use the Viola-Jones method, a well-accepted method that is fast and has a satisfactory success rate. However, to determine the eye region, it has to be applied twice: once to detect the face and a second time to detect the eye region. Moreover, unlike images of static objects, the image of the eye changes dramatically depending on the gaze direction, so the Viola-Jones method produces larger eye regions that may even contain the eyebrows. After detecting the face, it is possible to use an educated guess or other methods to determine the eye region. However, an educated guess also produces large eye regions, and other methods, while they may work, add extra workload to the algorithm.
This study proposes an easy way to localize the eye in frontal face images: facial landmarks are used to determine a region of interest, and a template-matching-based approach is used to locate the eye within it.

LOCATING EYE
The proposed method uses the simplest possible approach to locate the eye. Many eye localization methods share the same few steps: first, the face is detected with the Viola-Jones method [19]; second, the general eye areas are determined with an educated guess or an eye cascade; and finally, a novel approach is applied to locate the eye within these areas, as shown in Fig. 1.

Finding Face
The purpose of detecting the face is to ensure that the search area has limited characteristics and appearances; there should be no eye-like objects or images in the search area. Reducing the area also reduces the workload of the algorithm. Face detection is a decades-old problem, and there are many methods to solve it [20]. In the literature, many studies choose the Viola-Jones method, which uses Haar-like cascades, to detect faces. According to our tests on the BioID database, this method successfully detects faces in 1365 of 1521 images, a success rate of over 89%. There are only eleven false positives in the 1521 images. These false positives are usually a smaller region within a face, so they can easily be discarded by choosing the biggest of the detected faces. For the TalkingFace database, the success rate is 99.1%.
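The "choose the biggest face" rule can be sketched in a few lines of Python 3. This is a minimal illustration, not the authors' code. It assumes detections arrive as (x, y, w, h) boxes, which is the layout returned by OpenCV's CascadeClassifier.detectMultiScale. The function name pick_largest_face is hypothetical.

```python
def pick_largest_face(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if there is none.

    A smaller false positive lying inside a real face is discarded implicitly,
    because the enclosing face box always has a larger area.
    """
    if len(boxes) == 0:
        return None
    return max(boxes, key=lambda box: box[2] * box[3])


# Hypothetical detections: one real face and a smaller false positive inside it.
detections = [(120, 80, 150, 150), (160, 120, 40, 40)]
print(pick_largest_face(detections))  # (120, 80, 150, 150)
```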
The OpenCV (Open Source Computer Vision) library contains the necessary cascade files for this kind of detection, and in this study the frontal face file (haarcascade_frontalface_default.xml) is used to detect faces in images. The necessary files and more information about OpenCV can be found on its official web page [21].
Facial landmark detection is a relatively new method for determining interest points on a person's face. It not only locates the face precisely but also finds the interest points needed to determine many aspects of a human face [22]. In this study, a detector with 68 landmarks has been used, as shown in Fig. 2. According to our tests on the BioID database, the facial landmark detector successfully detects the faces in 1509 of 1521 images, a success rate of more than 99%. On the TalkingFace database, it successfully detects all faces, with five false positives. However, as with the Viola-Jones method, these false positives are usually a smaller region within a detected face, so they can easily be discarded by choosing the biggest of the detected faces. The success rate on the ColorFeret database is also 100%. As these results show, facial landmarks have a much better success rate than the Viola-Jones method.

Determining the Eye Region and Reducing the Image
To determine a general eye region, an educated guess can be used, such as dividing the image into four quadrants and assuming that the upper-left quadrant contains the left eye and the upper-right quadrant contains the right eye [11]. However, this approach produces a much bigger region of interest than needed, although it ensures that the eye is within that area. Another option is to use a Haar-like cascade or a similar detection method to determine the general eye region. However, Haar-like cascades give poor results: because the eye itself moves within the eye region, it is very hard for cascades to detect the eye region with high accuracy. In addition, a secondary detection method would increase the computation cost, although it may give a smaller eye region. In this study, facial landmarks are used to determine the region of interest. Initially, we tried to determine the eye region using all eye landmarks, as seen in Fig. 3. However, these landmarks are not as accurate as desired, and they usually do not cover the entire eye region, so they cannot be used on their own to detect an eye region successfully. Nevertheless, there is a natural ratio between the distance between the two eye corners and the eye radius. Since facial landmarks include the eye corners, these corners can be used to determine a small eye region, as seen in Fig. 4. In this study, errors in the eye corner landmarks were observed to cause few problems in detecting the eye region. First, the distance between the eye corners is computed using Eqs. (1, 2), where $l_{right}$ and $l_{left}$ are the distances between the eye corners of the right and left eyes, $p_i$ is the $i$th landmark as shown in Fig. 3, and $p_{ix}$ and $p_{iy}$ are the x and y coordinates of landmark $p_i$, respectively. For the right eye, the top-left corner of the eye region is chosen as $(p_{1x}, p_{1y} - 1.2\,l_{right})$ and the bottom-right corner as $(p_{10x}, p_{10y} + l_{right})$.
For the left eye, $(p_{9x}, p_{9y} - 1.2\,l_{left})$ is chosen as the top-left corner and $(p_{6x}, p_{6y} + l_{left})$ as the bottom-right corner.
In this study, a smaller and more efficient region of interest for the eyes is determined by considering the natural ratio between the eye-corner distance and the eye itself, as seen in Fig. 4. Such a small eye region helps to exclude dark structures such as eyebrows and eyeglasses. Since such dark structures are one of the main problems for eye detection, this should help to raise the success rate of almost all eye detection methods.
The ratio between the eye radius and the distance between the eye corners is estimated at about 0.22. The radii of the right and left eyes can therefore be found as $r_{right} = 0.22\,l_{right}$ and $r_{left} = 0.22\,l_{left}$.
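The eye-region bounds and the 0.22 radius ratio can be combined into one short helper. This Python 3 sketch relies on several assumptions that the text does not state:

- $l$ is the Euclidean distance between the two reference landmarks. The exact form of Eqs. (1, 2) is not shown.
- The pairs (p1, p10) and (p9, p6) act as the corners, because they define the region bounds given above.
- Image y coordinates grow downward.

The helper name eye_region is hypothetical.

```python
import math

RADIUS_RATIO = 0.22  # eye radius / eye-corner distance, as estimated in the text


def eye_region(top_left_ref, bottom_right_ref):
    """Return (x0, y0, x1, y1, radius) for one eye.

    top_left_ref and bottom_right_ref are (x, y) landmark points, e.g. p1 and
    p10 for the right eye or p9 and p6 for the left eye. The corner distance l
    is ASSUMED to be their Euclidean distance; the bounds follow the text:
    top-left = (ref_x, ref_y - 1.2 * l), bottom-right = (ref_x, ref_y + l).
    """
    l = math.hypot(bottom_right_ref[0] - top_left_ref[0],
                   bottom_right_ref[1] - top_left_ref[1])
    x0, y0 = top_left_ref[0], top_left_ref[1] - 1.2 * l
    x1, y1 = bottom_right_ref[0], bottom_right_ref[1] + l
    return x0, y0, x1, y1, RADIUS_RATIO * l


# Hypothetical right-eye corners 50 px apart on the same row.
print(eye_region((100, 120), (150, 120)))  # (100, 60.0, 150, 170.0, 11.0)
```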

Template Matching
Template matching is a process in which an input image is compared with another image, pixel by pixel, to find whether the second image contains a matching part. This is done in a windowed manner, and a similarity value is calculated for each window. The best similarity value determines which window matches the input image. There are many ways to calculate the similarity value, such as the squared difference (5), the normalized squared difference (6), correlation (7), and normalized correlation (8). In Eqs. (5) to (9), R is the similarity value, I is the input image, and S is the source image in which a match is sought. $x_s$ and $y_s$ are the starting positions of the window, and $x'$ and $y'$ are pixel positions in the input image.
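The following pure-Python 3 sketch illustrates the squared-difference measure. It slides an input image I over a source image S and scores each window position $(x_s, y_s)$ with $R(x_s, y_s) = \sum_{x', y'} \left(I(x', y') - S(x_s + x', y_s + y')\right)^2$. Lower scores mean better matches. This is the standard definition, and it is what OpenCV's cv2.matchTemplate computes with the TM_SQDIFF method. It is assumed, not confirmed, to be exactly Eq. (5). The function names are hypothetical.

```python
def match_sqdiff(source, template):
    """Score every window of `source` (2-D list) against `template` (2-D list).

    Returns a dict mapping the window's top-left (xs, ys) to its
    squared-difference score; 0 means a perfect match.
    """
    sh, sw = len(source), len(source[0])
    th, tw = len(template), len(template[0])
    scores = {}
    for ys in range(sh - th + 1):
        for xs in range(sw - tw + 1):
            r = 0
            for yp in range(th):
                for xp in range(tw):
                    d = template[yp][xp] - source[ys + yp][xs + xp]
                    r += d * d
            scores[(xs, ys)] = r
    return scores


def best_match(source, template):
    """Top-left (xs, ys) of the window with the lowest squared difference."""
    scores = match_sqdiff(source, template)
    return min(scores, key=scores.get)


# Hypothetical 5x5 bright source with a dark 2x2 block whose top-left is (2, 1).
source = [[255] * 5 for _ in range(5)]
for y in (1, 2):
    for x in (2, 3):
        source[y][x] = 0
template = [[0, 0], [0, 0]]
print(best_match(source, template))  # (2, 1)
```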
The circular shape of the eye makes it easy to use in a matching algorithm. In this study, images are reduced to black and white, and a circle is used as the matching template. Although the eye is not a perfect circle but an ellipse with an axis ratio of about 1.13, this difference is usually negligible because cameras do not have the resolution necessary to capture it. However, when the camera resolution is high enough to capture the difference, an ellipse should be used as the template instead of a circle.
A simple input image for template matching is created from the estimated radius value, as seen in Fig. 5. However, this approach does not provide satisfactory results. To improve the success rate and reduce the workload, a modified approach based on template matching was tried. It relies on two pieces of prior knowledge about the eye: the eye is darker than its vicinity, and it has a circular shape. Therefore, if the sum of the pixel values inside a circular shape is calculated in a windowed manner, the window position with the lowest sum can be accepted as the position of the eye. For this purpose, a list is created that contains the x and y coordinates of each white point of the template image, as seen in the right image of Fig. 5. This list is then used in sliding windows: in each window, the sum of the image values at the list points is calculated using Eq. (9), and the window with the lowest sum is selected as the eye position.
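The windowed-sum idea can be sketched as follows. This pure-Python 3 version is an illustrative assumption, not the authors' implementation:

- The point list holds integer offsets inside a circle. Following the previous paragraph's remark, it becomes an ellipse when `aspect` is set to about 1.13; the longer axis is assumed to be horizontal.
- The image is a 2-D list of intensities.
- The window with the lowest sum is returned, because the eye is darker than its surroundings.

The names disk_offsets and locate_dark_disk are hypothetical.

```python
def disk_offsets(radius, aspect=1.0):
    """Integer offsets (dx, dy) inside an ellipse centred on (0, 0).

    The vertical semi-axis is `radius` and the horizontal one is
    radius * aspect; aspect=1.0 gives a circle.
    """
    rx, ry = radius * aspect, radius
    pts = []
    for dy in range(-int(ry), int(ry) + 1):
        for dx in range(-int(rx), int(rx) + 1):
            if (dx / rx) ** 2 + (dy / ry) ** 2 <= 1.0:
                pts.append((dx, dy))
    return pts


def locate_dark_disk(img, offsets):
    """Return the centre (x, y) whose point set has the lowest intensity sum.

    Only centres where every offset stays inside the image are tried.
    """
    h, w = len(img), len(img[0])
    mx = max(abs(dx) for dx, _ in offsets)
    my = max(abs(dy) for _, dy in offsets)
    best, best_sum = None, None
    for cy in range(my, h - my):
        for cx in range(mx, w - mx):
            s = sum(img[cy + dy][cx + dx] for dx, dy in offsets)
            if best_sum is None or s < best_sum:
                best, best_sum = (cx, cy), s
    return best


# Hypothetical 15x15 bright eye region with a dark disk of radius 3 centred at (9, 6).
offsets = disk_offsets(3)
img = [[200] * 15 for _ in range(15)]
for dx, dy in offsets:
    img[6 + dy][9 + dx] = 20
print(locate_dark_disk(img, offsets))  # (9, 6)
```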
Using all points within the circle gives a good success rate. However, to further reduce the workload of the method, reduced sets of points were also used in the experiments instead of all points within the circle, as seen in Fig. 6.
Figure 6. Reduced images.
Given the characteristics of the eye, checking every window within the eye region may be unnecessarily time-consuming, and as the resolution of the eye region grows, so does the time needed to compute every window. To further reduce the time consumption, a two-stage window movement is tried instead of checking all windows, as seen in Fig. 7. For the starting position, the window is centered vertically at the middle of the eye region's y-axis. After the best x position is found for this y position, the window is moved along the y-axis at the best x position to determine the best overall position.
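The two-stage window movement can be sketched as below. The sketch uses Python 3 and hypothetical names, and it is not the authors' code. It assumes circular point offsets and a 2-D intensity list, as in a plain windowed search. It first fixes the window's vertical center at the middle of the eye region and scans x, then scans y at the best x. This evaluates only len(xs) + len(ys) windows instead of len(xs) × len(ys). The cost is that it can miss the global minimum when the two coordinates interact strongly.

```python
def circle_offsets(radius):
    """Integer offsets (dx, dy) inside a circle of the given radius."""
    return [(dx, dy)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dx * dx + dy * dy <= radius * radius]


def window_sum(img, offsets, cx, cy):
    """Sum of image intensities at the circle points centred on (cx, cy)."""
    return sum(img[cy + dy][cx + dx] for dx, dy in offsets)


def two_stage_search(img, offsets):
    """Return ((x, y), windows_evaluated) using a row scan, then a column scan."""
    h, w = len(img), len(img[0])
    r = max(abs(dx) for dx, _ in offsets)
    xs = range(r, w - r)  # centres whose window stays inside the image
    ys = range(r, h - r)
    y_start = min(max(h // 2, r), h - r - 1)  # middle of the eye region's y-axis
    # Stage 1: best x position for the fixed starting y.
    best_x = min(xs, key=lambda cx: window_sum(img, offsets, cx, y_start))
    # Stage 2: move the window along y at that x.
    best_y = min(ys, key=lambda cy: window_sum(img, offsets, best_x, cy))
    return (best_x, best_y), len(xs) + len(ys)


# Hypothetical 21x11 eye region: bright background, dark disk of radius 3 at (13, 6).
offsets = circle_offsets(3)
img = [[200] * 21 for _ in range(11)]
for dx, dy in offsets:
    img[6 + dy][13 + dx] = 20
print(two_stage_search(img, offsets))  # ((13, 6), 20)
```

In this example, an exhaustive search would evaluate 15 × 5 = 75 windows, while the two-stage search evaluates 20.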

Matter of Blink
Blink detection is also necessary for many applications that use eye detection. If it is known when a person blinks, there is no need to try to detect eye positions, since no eye is visible in the image. In addition, by calculating the blink frequency, it is possible to estimate how tired an individual is.
Blinks can easily be detected by using landmark points (p2, p3, p11, p12, p4, p5, p7, and p8 in Fig. 3) and the eye radius calculated by Eq. (2). A blink can be determined by calculating the distance between the upper and lower eyelids: if this distance is only one or two pixels, it can be assumed that the eye is shut. However, it must also be decided under which condition a blink should be declared. Should only a moment when the eye is completely shut count, or should moments when the eyes are partially shut also be included? As seen in Fig. 8, blinking consists of a few stages. From the perspective of extracting information from images, partially shut and completely shut eyes are equivalent, because in both states the eyes are not concentrating on looking. So, if the distance between the eyelids is smaller than half of the eye radius, it can be assumed that the eye is closed. Even though this is a practical approach, there are still some minor issues. When laughing or looking at a bright scene, people tend to partially close their eyes, which may be interpreted as a blink. It is not easy to distinguish a blink from a laugh or a reflex to bright light. It is possible to use the landmark points that correspond to the mouth to detect laughter, or to check the brightness using a histogram. However, detecting the difference would take too much effort; the error is acceptable and makes little difference in practice.
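A minimal Python 3 sketch of the half-radius rule follows. Three things in it are assumptions: the pairing of upper- and lower-eyelid landmarks (Fig. 3 is not reproduced here), the use of the mean pairwise gap, and the helper names, which are hypothetical. A blink counter that counts open-to-closed transitions is included to show how a blink frequency could be derived.

```python
import math


def eyelid_gap(upper, lower):
    """Mean distance between paired upper- and lower-eyelid (x, y) landmarks."""
    gaps = [math.hypot(u[0] - l[0], u[1] - l[1]) for u, l in zip(upper, lower)]
    return sum(gaps) / len(gaps)


def is_closed(upper, lower, eye_radius):
    """Treat the eye as closed when the eyelid gap is below half the eye radius."""
    return eyelid_gap(upper, lower) < 0.5 * eye_radius


def count_blinks(closed_flags):
    """Count open-to-closed transitions in a per-frame sequence of booleans."""
    return sum(1 for prev, cur in zip(closed_flags, closed_flags[1:])
               if cur and not prev)


# Hypothetical eyelid landmarks for an eye of radius 11 px.
upper = [(110, 118), (120, 118)]
print(is_closed(upper, [(110, 121), (120, 121)], 11))  # True  (gap 3 < 5.5)
print(is_closed(upper, [(110, 128), (120, 128)], 11))  # False (gap 10)
print(count_blinks([False, True, True, False, True, False]))  # 2
```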

EXPERIMENTAL RESULTS
In the experiments, a computer with an Intel Core i7 2.8 GHz CPU and 32 GB of RAM has been used. The method is implemented in Python 2.7, with OpenCV 2.4 as the computer vision library.
The code takes 4.5 ms to detect both eye centers in an image from the BioID database, not including the face detection time.

Databases
The BioID, TalkingFace, and ColorFeret databases are used to evaluate the success rate of the proposed method.
The BioID database consists of 1521 frontal face images, with files that contain the annotated positions of the left and right eyes. There is a single frontal face in each image, and the images belong to twenty-three different individuals. The BioID database is challenging because its images are gray-level and have low resolution (384 × 286). Most of the images show people wearing glasses, often with intense reflections in the glasses. Some images were taken in insufficiently illuminated or over-illuminated areas. In some images, the eyes are fully or partially closed, which makes it difficult or impossible to determine where the eyes are. However, a challenging database is necessary to determine the success of an algorithm. All images of the BioID database are used in the experiments.
The TalkingFace database consists of 5000 frontal face images of one individual, taken while the individual is talking in a sufficiently illuminated area. The images are in color and have a resolution of 720 × 526. There is a single frontal face in each image, and each image has a file containing the coordinates of 68 facial interest points, including the eye positions. All images of the TalkingFace database are used in the experiments.
The ColorFeret database is another face database, containing color images for developing, testing, and evaluating face recognition algorithms. It contains 11338 facial images of 994 subjects taken from various angles. The images in the Color FERET Database are 512 × 768 pixels. Regular frontal images and alternative frontal images are used in this study. However, some of the images do not have eye position information, and the annotated eye positions are often inaccurate.
Image resolution alone may not be adequate for characterizing a database, so both the image resolution and the resolution of the faces within the images are given in Tab. 1. The mean face resolution of the BioID database is very low. The image resolution of the TalkingFace database is about 3.4 times that of the BioID database, whereas the resolution of its face area is about 5.4 times that of the BioID database.
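Assuming "resolution" here means the pixel count of a whole image, the 3.4 factor follows directly from the image sizes stated above. The arithmetic below is in Python 3. The face-area factor of 5.4 depends on Tab. 1 values that are not reproduced here.

```python
bioid_pixels = 384 * 286          # 109,824 pixels per BioID image
talking_face_pixels = 720 * 526   # 378,720 pixels per TalkingFace image
ratio = talking_face_pixels / bioid_pixels
print(round(ratio, 2))  # 3.45
```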

Evaluation
The error rate e is calculated using Eq. (10), where L and R are the actual positions of the left and right pupils, L' and R' are the positions calculated by the method, and the difference between two positions is their Euclidean distance.
If the error rate e is less than 0.25, the algorithm is considered good for locating the eye; however, for applications such as gaze detection, e should be less than 0.05.
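Eq. (10) itself is not reproduced in this text. The sketch below assumes it is the normalized error commonly used for the BioID benchmark:

$e = \max(\|L - L'\|, \|R - R'\|) / \|L - R\|$

That is, the worse of the two eye errors divided by the true inter-pupil distance. The code is Python 3, and the function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def normalized_error(L, R, L_est, R_est):
    """Worse eye error divided by the true inter-pupil distance (assumed form of Eq. (10))."""
    return max(dist(L, L_est), dist(R, R_est)) / dist(L, R)


def success_rate(errors, threshold):
    """Percentage of images whose error is below the threshold."""
    return 100.0 * sum(e < threshold for e in errors) / len(errors)


# Hypothetical pupils 60 px apart; the left estimate is off by 5 px, the right by 1 px.
e = normalized_error((100, 100), (160, 100), (103, 104), (160, 101))
print(round(e, 4))  # 0.0833: accepted for e < 0.1 and e < 0.25, rejected for e < 0.05
print(success_rate([0.01, 0.04, 0.08, 0.30], 0.05))  # 50.0
```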
As seen in the images, this approach creates a minimal eye region that does not contain the eyebrows. As long as the eye corners are detected with an acceptable error, this approach succeeds in creating a small region that contains only the eye area.
Successful results for the BioID and TalkingFace databases are given in Fig. 9 and Fig. 10. As these results show, the method draws a white circle on each eye, and the center of the circle is where the pupil is. Unsuccessful results for the BioID and TalkingFace databases are given in Fig. 11 and Fig. 12. In some of these cases, the eye corners are not accurate, so the calculated eye radius may be smaller than the real value; even though the result will lie within the eye, it may not be perfectly aligned with it. In other cases, only a small part of the eye is visible, as seen in the top-left image of Fig. 12, and the eye may not be accurately localized.
Fig. 13 and Fig. 14 show results for the BioID and TalkingFace databases that are counted as unsuccessful. However, as the images show, these localizations are actually correct; the problem is that some of the eye positions in the databases are not annotated accurately.

Figure 12. In some cases, the eye positions of the BioID database are not accurate. These images are marked as unsuccessful for e < 0.05.
Figure 13. Successful localization, which is marked as unsuccessful for the TalkingFace database (e < 0.025).
Our attempt to correct the inaccurate eye positions showed that even the same person might mark different locations as the eye in different trials, which is why we decided to use the given eye positions. However, it should be noted that, for e < 0.025, some successful results will be marked as unsuccessful. The ColorFeret database has the highest face resolution of the three databases, but it also has the poorest eye position annotations: the annotators usually marked merely some point within the eye rather than the eye center. Some successful results from the ColorFeret database are given in Fig. 15; they show that the method copes with higher resolutions. Fig. 16 shows another example of failed eye corner detection on the ColorFeret database, where the inaccurate eye corners cause unsuccessful results. Because its eye positions are not accurate, the ColorFeret database has not been used for the success comparison. Experimental results are given in Tab. 2 and Tab. 3. For e < 0.1, a success rate of at least 91.91% is achieved on both databases.
As seen in Fig. 6, (h) is the input list in which all points within the eye circle are used. The best result is therefore expected from this list, and, as seen in Tab. 2, (h) gives the best results on both databases. Lists (b), (c), and (d) have the lowest scores on both databases; all of them contain points from a circle whose radius is half the eye radius. Reflections usually occur within this inner circle, which we believe is the reason for the low success rates. Lists (e), (f), and (g) have low point counts; sorted by the number of points in descending order, they are (g), (e), and (f). The results of (f) are surprisingly good on the TalkingFace database, where it is better than (h) for e < 0.025. However, on the BioID database, (f) gives the lowest result among them. When illumination is sufficient and resolution is adequate, (f) is an excellent choice for locating the eye. List (e) was not very successful on either database, in contrast to (g), which achieves good results on both. Thus, (g) seems to be an acceptable choice for applications involving faces of different resolutions.

Comparison with the State of the Art
This study is compared with state-of-the-art methods that use the BioID and TalkingFace databases to determine the success rate. The results are given in Tab. 4. The Easymatch method performs about average compared with the state of the art on the BioID database. However, it should be noted that, even though all of these methods use the BioID database, some of them use it only partially, for example by removing images of people wearing eyeglasses or images with closed eyes. Also, some of the studies [6,8,10,11] used the Viola-Jones face detector, and according to our tests the face detection success rate of Viola-Jones on BioID is about 89%. Therefore, the success rates of those methods should not exceed 89% if they relied on the Viola-Jones detector.
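The reasoning behind this bound can be written out explicitly. It assumes that an eye can only be localized in images where the face detector has first found the face, and it uses the BioID detection count reported earlier:

```latex
P(\text{eye localized}) = P(\text{face detected}) \cdot P(\text{eye localized} \mid \text{face detected})
  \le P(\text{face detected}) = \frac{1365}{1521} \approx 0.897
```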
The Easymatch method has the best result for e < 0.025 compared with the state of the art on the TalkingFace database. The approach yields better results at decent resolutions than at low resolutions; however, this difference is also partly due to the poor environmental conditions of the BioID database.
This approach is faster than the other methods: its running time is close to or less than that of an image smoothing operation.
Methods that use AI-based approaches achieve better success rates; in fact, they achieve the best success rates overall. However, they also have some shortcomings. For AI-based approaches to work, images must be resized to a certain size, which means additional workload and additional quantization error, and their success rates tend to drop for higher-resolution images. For the BioID database, an average time consumption of 4 ms is reported in [16], while the approach presented in this paper takes less than 1 ms.

CONCLUSION
Eye detection is a fundamental and essential step for many applications. This paper presents a simple matching method to localize eye positions in frontal face images. Three characteristics of the eyes have been used to determine the eye location: the eyes are darker than their surroundings, there is a ratio between the eye radius and the distance between the two eye corners, and the eyes are round. Using facial landmarks and a tailored matching method, a novel way to detect the eyes is presented. A facial landmark detector with 68 points has been used in the experiments; however, only the 12 points around the eyes have been used in this study. We believe that a dedicated 12-point eye landmark detector would provide better results. The approach shown in this study provides successful results in a fast and easy way. Different reduced input images have been used to detect the eye; each has a different computation speed and success rate, which makes it possible to choose between them according to the requirements of the application.
Overall, the method yields promising results, given that it is a simple approach that does not involve any learning or modeling scheme. Also, the method does not need any complex algorithm to determine the eye location; it is easy to understand and easy to implement.