Original scientific paper
https://doi.org/10.2498/cit.1002190

Domain-aware Evaluation of Named Entity Recognition Systems for Croatian

Zeljko Agic ; University of Zagreb
Bozo Bekavac ; Department of Linguistics, Faculty of Humanities and Social Sciences, University of Zagreb

Full text: English, PDF (439 KB), pp. 195-209, downloads: 436*
APA 6th Edition
Agic, Z., & Bekavac, B. (2013). Domain-aware Evaluation of Named Entity Recognition Systems for Croatian. Journal of computing and information technology, 21 (3), 195-209. https://doi.org/10.2498/cit.1002190

Abstract
We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tagset denoting personal names, locations and organizations. We give insight into feature selection, domain sensitivity and the effects of increasing training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian with respect to domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an F1-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system achieves consistently state-of-the-art scores in detecting names of persons, locations and organizations.
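
For readers unfamiliar with the F1-score reported above, the following is a minimal sketch of how entity-level precision, recall and F1 are commonly computed for a three-class tagset. It assumes exact-match scoring over (start, end, label) spans and micro-averaging; the abstract does not state the exact scoring protocol used in the paper, and the label strings and the function name `f1_scores` are hypothetical.

```python
from collections import Counter

# Hypothetical label strings for the three classes named in the abstract.
CLASSES = ("PER", "LOC", "ORG")


def f1_scores(gold, predicted):
    """Return {label: (precision, recall, F1)} plus a "micro" entry.

    gold and predicted are sets of (start, end, label) spans. Assumption:
    a predicted span is correct only if its boundaries and its label both
    match a gold span exactly.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in predicted:
        if span in gold:
            tp[span[2]] += 1  # exact match: true positive
        else:
            fp[span[2]] += 1  # spurious or mislabeled: false positive
    for span in gold - predicted:
        fn[span[2]] += 1      # missed gold entity: false negative

    def prf(t, p, n):
        precision = t / (t + p) if t + p else 0.0
        recall = t / (t + n) if t + n else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1

    scores = {c: prf(tp[c], fp[c], fn[c]) for c in CLASSES}
    scores["micro"] = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return scores
```

A domain-aware evaluation in this spirit would call such a function once per domain-specific test set and once on the mixed-domain set, then compare the resulting scores.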

Keywords
named entity recognition; Croatian language; text domain; domain dependence; evaluation

Projects
EC / FP7 / 288342 / X-LIKE - Cross-lingual Knowledge Extraction

Hrčak ID: 110027

URI
https://hrcak.srce.hr/110027

Visits: 647 *