Building Croatian Medical Dictionary from Medical Corpus

Kocijan, Kristina; Kurolt, Silvia; Mijić, Linda

doi:10.31724/rihjj.46.2.17

Rasprave Instituta za hrvatski jezik, Vol. 46 No. 2, 2020.

Preliminary communication

https://doi.org/10.31724/rihjj.46.2.17

Kristina Kocijan orcid.org/0000-0001-9467-5313 ; Faculty of Humanities and Social Sciences, University of Zagreb
Silvia Kurolt ; Faculty of Humanities and Social Sciences, University of Zagreb
Linda Mijić orcid.org/0000-0003-3246-7652 ; Department of Classical Philology, University of Zadar

Pages 765-782

Abstract

The overall objective of this project is to define the linguistic models, at the lexical and syntactic levels, that appear in the health domain, depending on the type of corpus. In the first phase of the project, the texts forming the medical corpus A, MedCorA (2,232 pharmaceutical instructions for medicaments available in Croatia), were prepared. The terminology found in this corpus was analyzed, and the semantic subdomains within the medical domain (anatomy, condition, microorganism, chemistry, etc.) were defined and added to the dictionary entries. These dictionary resources served as the foundation for the second phase, in which NooJ morphological grammars were built to annotate medical terminology in the corpus. The grammars recognize not only Croatian words but also Latinisms and Latin expressions written with Croatian case endings. The prepared resources are made available to the broader scientific community via Sketch Engine for further research in the field of medicine. They enable additional research and development of algorithms for, among other tasks, classification of medical documents, information retrieval from medical texts, and machine translation of medical documentation, taking into account quality and reliability as well as terminological variability.
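As a rough illustration of the idea described in the abstract (dictionary entries tagged with a semantic subdomain, plus recognition of Latin-derived terms carrying Croatian case endings), the following is a minimal Python sketch. It is not NooJ and not the authors' grammar: the lexicon entries, the tag names, the simplified ending list, and the function names `lookup` and `annotate` are all assumptions made for this example.

```python
import re

# Hypothetical mini-lexicon: lemma -> semantic subdomain tag.
# Entries and tag names are illustrative, not taken from the paper's dictionary.
LEXICON = {
    "herpes": "CONDITION",
    "virus": "MICROORGANISM",
    "bubreg": "ANATOMY",
}

# Assumed, heavily simplified set of Croatian masculine singular case endings.
CASE_ENDINGS = ("om", "a", "u", "e")


def lookup(token):
    """Return (lemma, tag) for a token, or None if it is not in the lexicon."""
    word = token.lower()
    if word in LEXICON:
        return word, LEXICON[word]
    # Try stripping each assumed case ending and looking up the remaining stem.
    for ending in CASE_ENDINGS:
        stem = word[: -len(ending)]
        if word.endswith(ending) and stem in LEXICON:
            return stem, LEXICON[stem]
    return None


def annotate(text):
    """Return (surface form, lemma, tag) for every lexicon term found in text."""
    hits = []
    for token in re.findall(r"\w+", text):
        match = lookup(token)
        if match:
            hits.append((token, *match))
    return hits
```

Under these assumptions, `annotate("infekcija virusom herpesa")` would tag the inflected forms `virusom` and `herpesa` with the lemmas `virus` and `herpes`. A real morphological grammar would model full inflectional paradigms rather than a flat list of suffixes.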

Keywords

language processing; semantic annotations; medical domain; NooJ; Croatian

Hrčak ID:

245468

URI

https://hrcak.srce.hr/245468

Publication date:

30.10.2020.
