Original scientific paper

Domain-aware Evaluation of Named Entity Recognition Systems for Croatian

Zeljko Agic ; University of Zagreb
Bozo Bekavac ; Department of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb

Full text: English, PDF, 439 KB

pp. 195-209

We evaluate the currently available named entity recognition systems for Croatian, with special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tagset denoting personal names, locations and organizations. We give insight into feature selection, domain sensitivity and the effects of increasing training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian with respect to domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an F1-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistent, state-of-the-art scores across all three classes: persons, locations and organizations.

Keywords

named entity recognition; Croatian language; text domain; domain dependence; evaluation
