Sex estimation by the patterns of lip impressions (cheiloscopy) – an analysis of a Croatian sample and a scoping review

Methods: The study on the Croatian population included 88 male and 88 female participants (median age 25, range 18 to 50). Lip prints were analyzed by quadrant, and then the predominant pattern on the entire lip was observed. A systematic search of the relevant bibliographical databases was conducted, including Medline, Scopus, Web of Science Core Collection (WoSCC), and Cinahl (October 23rd, 2020). OpenGrey, Open Science Framework, and Science.gov databases were searched for grey literature. Findings were reported in the narrative form in accordance with the PRISMA Extension for Scoping Reviews (PRISMA-ScR) checklist. A total of 80 studies were included.


Introduction
Lip prints, similar to fingerprints, are unique, and can be crucial in the process of individualization [1]. As fingerprints are used to estimate other biological characteristics such as sex [2] and height [3], the usefulness of lip prints has also been widely studied in sex estimation. The literature suggests that lip prints can aid in estimating the similarity between child and parents [4,5] and can also be used to examine sexual dimorphism, with results ranging from non-dimorphic to highly dimorphic [6,7].
The main critiques of lip print analysis are the lack of widely accepted methodology for trace evidence collection, analysis, and interpretation; lack of data about population and regional diversity; as well as any ageing effects. In the 1999 United States court case People v. Davis, the appeal for a murder conviction concluded that the discipline of lip print analysis did not exist in the scientific community, that there was not any established training or certification, and that the methodology was vague [8].
To be more precise, there is no uniform methodology for studying the lines on the lips, i.e., it is not defined how the sample is taken, how it is analyzed, whether the predominant pattern of a particular part of the lip print is taken, whether there is error within and between observers, etc. So far, neither the repeatability nor the reliability of the method has been determined.
As with various other populations, a Croatian sample has previously been studied. In that study, lip print differences between sexes were observed, but the sample was relatively small [9]. Keeping that gap in the literature in mind, we approached sex estimation using lip print morphology in two ways, with the hypothesis that there is no difference in the lip prints of males and females. First, we conducted a study on the Croatian sample to test the methodology and sexual dimorphism of lips. In the second step, we conducted a scoping review of all available studies on sexual dimorphism of lip prints.

Analysis of sexual dimorphism
st-open.unist.hr
of collecting the samples, applying lipstick, and lip print exemption was explained to the participants prior to sample collection.

Materials
The materials used in this study were: two types of lipstick (Essence Colour Boost, Vinylicious, Essence, Italy and Catrice Ultimate Colour, 480 Red Side Black, Catrice, Luxembourg), cotton swabs, adhesive tape, white A4 paper, wet wipes, and a scanner (Canon imageRUNNER ADVANCE C3320, Canon Inc., Tokyo, Japan).

Pilot study: selection of the lipstick color and lip print exemption
First, the pilot study was conducted with six participants to select the methods for lip print collection and the color of lipstick that was most visible (MP, HE, SB, MJ, IK, ŽB). Four different methods with two types of lipstick were used (Table 1), based on data from previous research [1,4,9,10].

For classification of the lip prints, the observers were blinded (every sample was coded).

Scoping review of the literature
The study was registered on the Open Science Framework (https://osf.io/9ytbh/). Findings were reported in a narrative form in accordance with the PRISMA Extension for Scoping Reviews (PRISMA-ScR) checklist [13]. Two authors (MP and HE) independently extracted the data, and in case of disagreement, they consulted the third author (ŽB) to reach an agreement. Cohen's kappa (κ) was calculated to evaluate the interobserver agreement.
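As an illustration of the agreement statistic named above, the following minimal sketch computes Cohen's kappa for two reviewers' decisions. The function name, the decision labels, and the example data are hypothetical and are not taken from the study, which does not state how the calculation was performed.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten records (not study data).
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "include", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it is preferred over simple percent agreement when one category (here, exclusion) dominates.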

Eligibility criteria
Studies were included if they explicitly investigated sexual dimorphism using lip prints, irrespective of the study design. Editorials, letters, and methodology studies were excluded. This scoping review focused on exploring the accuracy of assessing sexual dimorphism using lip prints.

Search strategy
The first step included an initial selective search of relevant databases and was followed by the analysis of the text words contained in the titles, the abstracts, and the index terms used to describe the articles [14].

Study selection
Screening of the search results was carried out using the EndNote tool (EndNote X9, Clarivate, Philadelphia, PA, 2013). Titles and abstracts of all studies and any full texts were checked for eligibility by two independent reviewers (MP, HE). Both authors needed to reach an agreement for a study to be included in the review. In case of a disagreement, the third reviewer (ŽB) was consulted.

Quality assessment
The quality of included studies was assessed using JBI Critical Appraisal Tools [15]. Two researchers (MP, HE) appraised the studies. As the JBI Critical Appraisal Tools [15] are intended for medical research, some of the validation criteria were adjusted to better fit the forensic sciences (Appendix 2).

Data extraction
Two authors (MP, HE) independently extracted data from the included studies. A data ex-

Summarizing and analyzing data
The two authors (MP and HE) independently extracted the data, and in case of disagreement consulted the third author (ŽB) to reach an agreement. Quantitative pooling of the individual studies' data was not possible due to the high diversity of the included studies; therefore, results were presented descriptively.

Scoping review
The results of the scoping review process are shown in Figure 2. The number of records identified through database searching was 10,642, while the final number of included studies was 80. A list of studies included in the scoping review, along with details regarding their lip print research, is shown in Table 2. Although we initially found 100 papers, 16 were not available as full texts (the authors did not respond to several e-mail requests for full texts) and four studies were excluded as they did not analyze sexual dimorphism, thus leaving 80 studies for analysis (Table 2).
In the experimental part of the study (on the sample from the Croatian population), it was not possible to pinpoint differences between males and females by analyzing quadrants or the overall predominant appearance of lip prints. Considering the collection methods, our research has shown that the method by which the clearest lip prints are obtained is with a darker lipstick and with a closed and/or partially open mouth. In this study, we did not opt for only one method (closed or partially open mouth) because depending on the shape and/or thickness of the lips, as well as the strength of pressure applied to the paper, both methods were sometimes more appropriate for observing morphological features.
To evaluate the analyzed papers in the scoping review, we used the critical appraisal of research (Appendix 2) and the PRISMA-ScR checklists [15].
To answer the question, Were the criteria for inclusion in the sample clearly defined? we considered a study "unclear" and marked it with one asterisk if there were no defined characteristics of what the authors considered a healthy person (e.g., lips without trauma, damage, etc.) and where only the population and age of participants were defined.
A total of 14 studies were considered "unclear" in this category. The studies marked "unclear" with two asterisks were those where the authors stated that they had excluded any participants with undesirable pathologies, but they did not define them. There were six such studies. The studies marked "unclear" with three asterisks were those that stated that their exclusion criteria were hypersensitivity to cosmetics and lesions on the lips. Two studies were found with these criteria. Six other studies were considered combinations of the above unclear categories, so we could not opt for one classification. Overall, the studies that named all the inclusion and exclusion criteria were marked as "yes"; 45 studies fell into this category. Seven of the studies did not meet any inclusion or exclusion criteria. Thus, only approximately half of the studies completely fulfilled these criteria.
The question Were confounding factors identified? considered the same criteria as previously described, thus the number of "yes", "no", and "unclear" is the same as in the previous question.
For the question, Were the study subjects and the setting described in detail? we marked a study as "unclear" with one asterisk if the respondents were not well/clearly selected (sex and/or inclusion and exclusion criteria were not defined). The number of these studies was 11. For example, Ragab, A., et al. [72] had 955 respondents and gave the distribution between sexes, but in this study, most of the participants were female (75%). If the place, institution, sex, and/or age of the study setting and participants' characteristics were missing, the study was marked as "unclear" with two asterisks. The number of these studies was 11. The number of other studies marked as "unclear" was eight. The number of studies that did not describe the study subjects and setting in detail was eight. For this criterion, more than half of the studies (42) described the participants and setting in detail.
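The category counts reported for these two appraisal questions can be cross-checked against the 80 included studies. The short sketch below does only that; the counts are copied from the text, while the dictionary keys are our own shorthand for the categories.

```python
TOTAL_INCLUDED = 80  # number of studies in the scoping review

# Counts as reported in the text; key names are shorthand labels.
inclusion_criteria_defined = {
    "unclear (*)": 14,
    "unclear (**)": 6,
    "unclear (***)": 2,
    "unclear (combined)": 6,
    "yes": 45,
    "no": 7,
}
subjects_and_setting_described = {
    "unclear (*)": 11,
    "unclear (**)": 11,
    "unclear (other)": 8,
    "no": 8,
    "yes": 42,
}


def percent_of_included(count, total=TOTAL_INCLUDED):
    """Share of the included studies, in percent."""
    return 100 * count / total


for name, tally in [("inclusion criteria", inclusion_criteria_defined),
                    ("subjects and setting", subjects_and_setting_described)]:
    print(name, "total:", sum(tally.values()),
          "| yes (%):", percent_of_included(tally["yes"]))
```

Both tallies account for all 80 studies, and the "yes" shares are consistent with the statements that approximately half and more than half of the studies met the respective criteria.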
For the criterion, Was the exposure measured in a valid and reliable way? we marked "yes" only those studies that measured either inter- or intraobserver error and reported the results. Some studies stated that they measured the inter- or intraobserver error but did not report the result, and those were considered as "no". Only ten studies met this criterion. For the question, Were objective, standard criteria used for measurement of the condition? the same criteria as for the previous question were applied; thus, the number of studies that met these criteria is also ten.
The question Were strategies to deal with confounding factors stated? was marked NA for all. The reason does not lie in the quality of the studies, but rather in the applicability of the mentioned criterion to non-medical studies. As we could not validate studies by this criterion, we marked all the studies as NA and did not take it into consideration for the summary validation of the studies.
For the question, Were the outcomes measured in a valid and reliable way? all the studies that did not calculate inter- and intraobserver error, and those that had significant disagreement, were marked as "unclear." Only nine studies met the criteria.
For the question, Was appropriate statistical analysis used? we had two criteria for marking studies as "yes". "Yes" with no asterisk were those that used only descriptive statistics, and there were 35 such studies. Additionally, "yes" with one asterisk included those that calculated the inter- and intraobserver error, and the number of these studies was 11.
The studies that tested the reliability of sex estimation using lip prints were marked with two asterisks, and there were two studies that met this criterion. Thirty-two studies were marked as "unclear," as these studies yielded only percentages or both percentages and P values. Overall, only six studies met all the criteria [7,47,52,57,59,77]. However, none of these studies that observed differences between males and females tested the reliability or accuracy of sex estimation. In the study of Kapoor, N., et al. [47], 200 people participated, and they found differences in type I in males and type III in females. Moshfeghi, M., et al. [57] did not find differences between sexes among 96 participants. Nagalaxmi, V., et al. [59] found differences in males for type III and females in type I with 60 participants. Priyadharshini, K. I., et al. [7] found differences in all types except type I; the sample consisted of 100 participants. Sandhu et al. [77] tested 1200 participants and did not find differences between sexes. Kinra, M., et al. [52] tested only 40 participants and observed predominance in type I for females and type II for males. As previously stated, a sample size calculator was rarely used, and the only one of these studies that probably met the necessary sample size at the 95% confidence level was the study of Sandhu et al. [77].
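To make the sample size remark concrete, the sketch below applies the standard formula for estimating a proportion, n = z² p(1 − p) / e², to the six sample sizes listed above. The 5% margin of error and the conservative proportion p = 0.5 are our assumptions for illustration; the reviewed studies do not state which calculator or settings would apply, and a large population is assumed (no finite-population correction).

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(confidence=0.95, margin_of_error=0.05,
                               expected_proportion=0.5):
    """Minimum n to estimate a proportion (large-population approximation)."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_proportion
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


# Sample sizes as reported in the text for the six studies.
reported_sample_sizes = {
    "Kapoor et al. [47]": 200,
    "Moshfeghi et al. [57]": 96,
    "Nagalaxmi et al. [59]": 60,
    "Priyadharshini et al. [7]": 100,
    "Sandhu et al. [77]": 1200,
    "Kinra et al. [52]": 40,
}

required = sample_size_for_proportion()  # assumed: 95% confidence, 5% margin
meeting_requirement = [study for study, n in reported_sample_sizes.items()
                       if n >= required]
print("required n:", required)
print("studies meeting it:", meeting_requirement)
```

Under these assumed settings, only the largest sample reaches the required size, which agrees with the assessment in the text; a different margin of error or expected proportion would shift the threshold.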
The scoping review showed differences in participant and study characteristics and the reliability of sex estimation. [35]. Two of the studies had no inclusion or exclusion criteria [31,39]; for the others, the exclusion criteria varied, but most concentrated on the absence of deformities and illnesses that could affect the lip grooves. Only ten other studies, in addition to our study, tested inter- and/or intraobserver variability [25,42,47,52,57,59,73,77,92]. Overall, these studies were in agreement regarding this variability. Considering the results of the studies, nearly equal numbers reported differences and no differences between males and females. Thus, some of the studies confirmed sexual dimorphism; this, for the most part, included only the predominance of a certain pattern in some quadrants in males and females (the predominant pattern in a given quadrant was not shared between males and females).
Some of the studies did not find differences in quadrants between males and females.
Sex estimation: Eighteen studies tested the classification rate accuracy for sex estimation, ranging from the lowest 17.4% for males [49] to forensically high 98.6% for whole samples [86]. Twenty-nine (36.3%) of the studies stated that they did not find differences between males and females, and thirty-four only found differences in some types of furrows and some quadrants (42.5%).
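The classification rates referred to above are the shares of individuals whose sex was estimated correctly, reported per sex or for the whole sample. A minimal sketch of that calculation follows; the function name and the ten example records are hypothetical and do not come from any of the reviewed studies.

```python
from collections import defaultdict


def classification_rates(true_sex, estimated_sex):
    """Per-sex and overall share of correctly estimated individuals."""
    if len(true_sex) != len(estimated_sex) or not true_sex:
        raise ValueError("inputs must be non-empty and of equal length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, estimate in zip(true_sex, estimated_sex):
        totals[truth] += 1
        if truth == estimate:
            correct[truth] += 1
    rates = {sex: correct[sex] / totals[sex] for sex in totals}
    rates["overall"] = sum(correct.values()) / len(true_sex)
    return rates


# Hypothetical example: five males and five females (not study data).
true_sex = ["M"] * 5 + ["F"] * 5
estimated_sex = ["M", "M", "M", "F", "F", "F", "F", "F", "F", "M"]
rates = classification_rates(true_sex, estimated_sex)
for group, rate in rates.items():
    print(f"{group}: {rate:.1%}")
```

Reporting the per-sex rates alongside the overall rate matters because a method can appear acceptable overall while performing poorly for one sex, as the wide range of reported values suggests.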

Discussion
The results of both the primary and scoping studies showed that lip prints are not a reliable tool for sex estimation.
The primary study showed that the accuracy of sex estimation was only 55.8%, and it indicated that lip prints should not be used to estimate sex in the Croatian population.
However, the variability in lip print patterns within the same person indicates that lip prints are extremely useful for individualization purposes. Besides sexual dimorphism, this study also tested the methodology proposed by Costa et al. [4]. They used four different methods that differed in the way that the lipstick was smeared on the lips as well as the material on which they left the imprint (paper, adhesive tape). In this study, the researchers chose to leave lip prints, without rubbing the lips, on adhesive tape [4] fixed on white paper [96].
Furthermore, this study also applied a dual research approach to the review of lip print morphology. In the first step, lips were examined by quadrants, which revealed certain shortcomings of the existing quadrant method. Namely, the lines and furrows on the lips are not uniform within individual quadrants, so depending on which part of each quadrant is observed, there may be discrepancies in the estimation of pattern type between and within observers. During the research, large differences were noticed between the lateral, central, and medial parts of each quadrant. Accordingly, some studies have suggested dividing the lips into additional segments, i.e., a change in the existing methodology [1]. Costa and co-workers concluded, as did our study, that further development of the methods is extremely important, from the collection of lip prints to the recording methodology, and probably the proposal of a new methodology [4].
The results of the scoping review showed that the predominance of particular lip print types in males and females was not consistent across studies, and that a predominant lip print could not be identified. The predominance of one lip print type in one sex does not even appear to be population-specific: many of these studies were performed in India, and the evidence is not homogeneous there. The lack of connection between lip prints and sex may stem from several factors, such as differing inclusion and exclusion criteria and differing collection methodology. On the other hand, since inter- and intraobserver variability was tested ambiguously, we were unable to conclude whether the method was objective or subjective and whether the scoring methods should be improved. To be fair, we must mention that the studies that did perform these tests showed good agreement, though we cannot know whether these samples were scored by more experienced scorers. The predominance of one type of lip print in males and females is also not uniform and as such does not allow us to conclude whether there is a general predominance of any type of lip print in either sex.
The examined published research papers, as well as the present study, showed that there are several issues that probably contribute to the (un)reliability of results. First, inclusion criteria were usually not uniform; for example, some of the studies just mentioned that they had included healthy individuals, some listed detailed inclusion and exclusion criteria [18,25,33,38,42], and some gave more detailed exclusion criteria, such as no smoking or lip chewing habits [57]. We cannot be sure whether such participants were also excluded in other studies, but probably not all the studies took all of these parameters into consideration. The other issue is sample size and stratification; for most of the studies, the sample size was not calculated, and the distribution of participants regarding sex and age was either small or not proportional. Reporting of sample size calculations was vague or absent; most samples were convenience samples and not representative of the population. Only five studies used a sample size calculator [35,45,55,76,77]. As lip print analysis is morphological in nature, it is by definition subjective and dependent on the experience of the researcher, thus the intra- and interobserver variability should be tested. Only a small number of studies tested this variable [25,42,47,52,57,59,73,77,92]. In most of these studies, the agreement was good or higher; nevertheless, we cannot extrapolate that to the other studies. Although a similar methodology was used to collect and analyze the samples, the number of parts that the lips were divided into varied from one part (whole lip) to various combinations of parts. Thus, some of the results were reported as the predominance of the pattern on the whole lip and some only for one lip part (for example, one quarter). Additionally, in studies where sexual dimorphism was found, it was usually found on one lip part, and that part was not consistent among the studies.
As the list of papers consisted mostly of studies from India (62 studies, 77.5%), when analyzing only those samples we did not find population specificity or homogeneity in the distribution of the patterns of lip prints.
The presentation of results was also not uniform. Some studies only reported the frequencies of lip patterns, while some gave other descriptive statistics but rarely included the accuracy of sex estimation, which is the most important parameter in a forensic context. The result of such an unstandardized approach was a large difference between studies, ranging from highly dimorphic lip features to a complete lack of sexual dimorphism. The biggest flaw in most of this research is that it did not report the accuracy of sex estimation. This information is extremely important for criminal cases, that is, for expert witness testimony. The accuracy of sex estimation is, along with the repeatability of the methodology, the scientific recognition of the methodology, and the existence of validation studies, one of the most important considerations when presenting evidence in court, as it can give a judge or jury important information about the accuracy of the findings [97,98]. Unfortunately, most of the studies presented here did not meet most of these criteria. We could not show that there was a scientific consensus in any part of the collection or analysis processes; thus, studies reported diametrically different results. On the other hand, lip prints have shown large variability, and they could probably be used for individualization. At this time, there is not enough scientific evidence that lip prints could be a reliable tool for sex estimation with the existing approach. Future research should harmonize and evaluate the methodology and only then investigate sexual and population differences of lip prints.
Limitation of the scoping review: The main limitation of this study is that we could not perform deeper data analysis due to differences in study setup and the fact that some of the initially included studies were not available as full texts. There were also various possible sources of bias in these studies, which could not be systematically appraised because results and methodologies were not reported consistently across the studies. It is likely that most of the studies had selection bias, especially considering that most of them used a convenience sample. Also, the authors usually did not specify whether the researchers were blinded, which is a potential source of observer bias. There is also an unknown possibility of detection bias. We do not know the researchers' experience in scoring methods (there is no training [8]) since the interrater variability was vaguely tested, if at all.
Novelty of the study: This is the first scoping review made on the criminalistics topic of the sexual dimorphism of lip prints, and it showed the necessity of research in this field.
Recommendations: At this point, we believe that the first step should be to design methodological standards for the collection of lip prints and to improve the scoring methodology. The scoring system should test the subjectivity of the morphological method and give a more detailed explanation of which lip segments and parts should be used.
If future research shows that there is good inter- and intraobserver agreement regarding lip morphology, only then should sexual dimorphism be tested. If the lips show sexual dimorphism in several populations (that are well sampled and representative), and if that dimorphism has forensic significance (classification rate higher than 95%), then lip prints can be used for sex estimation in criminal procedures.

Conclusions
1. There is no sexual dimorphism in lip prints in the Croatian population.
2. The scoping review showed that the previous studies lack methodological uniformity in lip print collection, visualization, and interpretation.
3. The scoping review showed that the present methodologies are not reliable.
4. The scoping review showed that the potential rate of error is unknown.
5. Lip prints for sex estimation using available methodologies should not be used as evidence in court.