Annotation Scheme and Evaluation: The Case of Offensive Language

Lewandowska-Tomaszczyk, Barbara; Žitnik, Slavko; Liebeskind, Chaya; Valunaite Oleskevicienė, Giedre; Bączkowska, Anna; Wilson, Paul A.; Trojszczak, Marcin; Brač, Ivana; Filipić, Lobel; Ostroški Anić, Ana; Dontcheva-Navratilova, Olga; Borowiak, Agnieszka; Despot, Kristina; Mitrović, Jelena

doi:10.31724/rihjj.49.1.8

Rasprave Instituta za hrvatski jezik, Vol. 49 No. 1, 2023.

Original scientific paper

Barbara Lewandowska-Tomaszczyk orcid.org/0000-0002-6836-3321 ; University of Applied Sciences in Konin, Poland *
Slavko Žitnik ; University of Ljubljana, Slovenia
Chaya Liebeskind ; Jerusalem Institute of Technology, Israel
Giedre Valunaite Oleskevicienė orcid.org/0000-0001-5688-2469 ; Mykolas Romeris University, Vilnius, Lithuania
Anna Bączkowska orcid.org/0000-0002-0147-2718 ; University of Gdansk, Poland
Paul A. Wilson ; University of Lodz, Poland
Marcin Trojszczak ; University of Applied Sciences in Konin, Poland
Ivana Brač orcid.org/0000-0002-3660-5285 ; Institute for the Croatian Language, Zagreb
Lobel Filipić ; Institute for the Croatian Language, Zagreb
Ana Ostroški Anić orcid.org/0000-0001-9999-0750 ; Institute for the Croatian Language, Zagreb
Olga Dontcheva-Navratilova orcid.org/0000-0002-0378-7975 ; Masaryk University, Brno, Czech Republic
Agnieszka Borowiak ; University of Humanities and Economics, Lodz, Poland
Kristina Despot orcid.org/0000-0001-9004-5103 ; Institute for the Croatian Language, Zagreb
Jelena Mitrović ; University of Passau, Germany; Institute for AI R&D of Serbia

* Corresponding author.

page 155-175

Abstract

The present paper presents and discusses aspects of the linguistic annotation of OFFENSIVE LANGUAGE, including the creation, annotation practice, curation, and evaluation of an OFFENSIVE LANGUAGE annotation taxonomy scheme that was first proposed in Lewandowska-Tomaszczyk et al. (2021). An extended offensive language ontology comprising 17 categories, structured in 4 hierarchical levels, has been shown to represent the encoding of the defined offensive language schema, trained with non-contextual word embeddings (Word2Vec and FastText), and was eventually juxtaposed with the data acquired through a pairwise training and testing analysis for existing categories in the HateBERT model (Lewandowska-Tomaszczyk et al. submitted). The study reports on the annotation practice in WG 4.1.1. Incivility in media and social media, in the context of COST Action CA 18209 European network for Web-centred linguistic data science (Nexus Linguarum), carried out with the INCEpTION tool (https://github.com/inception-project/inception), a semantic annotation platform that offers assistance in annotation. The results partly support the proposed ontology of explicit offense and positive implicitness types to provide more variance among widely recognized types of figurative language (e.g., metaphorical, metonymic, ironic). The use of the annotation system and the representation of linguistic data were also evaluated in a series of annotators' comments, by means of a questionnaire and an open discussion. The annotation results and the questionnaire showed low or medium inter-annotator agreement for some of the categories. Annotators found it more challenging to distinguish between category items than between aspect items, with the category items offensive, insulting and abusive being the most difficult in this respect. On the basis of these results, the need for taxonomic simplification measures has been recognized for further annotation practices.
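
To make the notion of inter-annotator agreement concrete, the following is a minimal sketch of Cohen's kappa for two annotators. The abstract does not state which agreement measure the study used, so the choice of Cohen's kappa is an assumption made purely for illustration. The function name `cohen_kappa` and the sample labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Illustrative only: the agreement measure used in the paper is not
    given in the abstract, so this metric is an assumption.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: for each label, product of the two annotators' label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels (not from the study) for the three category items
# the abstract reports as hardest to tell apart.
annotator_1 = ["offensive", "insulting", "abusive", "offensive", "insulting", "abusive"]
annotator_2 = ["offensive", "abusive", "abusive", "insulting", "insulting", "offensive"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

In this made-up example the annotators agree on half of the items, while chance alone would produce agreement on a third of them, so kappa comes out well below 1. Values in that range are commonly read as low agreement, the kind of outcome the abstract describes for some categories.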