Original scientific paper

https://doi.org/10.31724/rihjj.48.1.2

Naïve Terminological Annotation of Legal Texts in Slovak – Can it Be Useful?

Radovan Garabík (ORCID: orcid.org/0000-0003-1691-3157) ; Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences
Jana Levická ; Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences


page 27-44

Abstract

Correct automatic terminological annotation of texts in a corpus can sometimes be a challenging task, especially for moderately or heavily inflected languages with relatively free word order. We explore the possibility of simple annotation based on sequence matching of lemmatized texts to annotate a Slovak-language corpus with IATE terminological entries. For legal language, the accuracy is very good for multiword terms, while the accuracy for single-word terms can be increased by applying simple filters based on word length and by blacklisting the most frequent false positives.
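
The approach outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `annotate`, the greedy longest-match strategy, the `min_single_len` threshold, the term identifiers and the blacklist entries are all assumptions made for illustration. The sketch assumes the text has already been lemmatized, so that inflected forms match the lemmatized term list.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Term = Tuple[str, ...]          # a term as a tuple of lemmas
Span = Tuple[int, int, str]     # (start, end, term id), end exclusive


def annotate(
    lemmas: Sequence[str],
    terms: Dict[Term, str],
    min_single_len: int = 4,                  # assumed threshold, not from the paper
    blacklist: FrozenSet[str] = frozenset(),  # assumed list of frequent false positives
) -> List[Span]:
    """Greedy longest-match annotation of a lemmatized token sequence."""
    max_len = max((len(t) for t in terms), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(lemmas):
        matched = False
        # Try the longest candidate first so multiword terms win over their parts.
        for n in range(min(max_len, len(lemmas) - i), 0, -1):
            candidate = tuple(lemmas[i:i + n])
            if candidate not in terms:
                continue
            # Filters apply to single-word terms only.
            if n == 1 and (len(candidate[0]) < min_single_len
                           or candidate[0] in blacklist):
                continue
            spans.append((i, i + n, terms[candidate]))
            i += n
            matched = True
            break
        if not matched:
            i += 1
    return spans


# Hypothetical term list; the identifiers are placeholders, not real IATE IDs.
terms = {
    ("európsky", "parlament"): "T1",
    ("parlament",): "T2",
    ("akt",): "T3",
    ("právo",): "T4",
}
lemmas = ["európsky", "parlament", "prijať", "akt", "o", "právo"]
spans = annotate(lemmas, terms, min_single_len=4, blacklist=frozenset({"právo"}))
```

In this toy input only the multiword term is annotated: "akt" is dropped by the length filter and "právo" by the blacklist, which mirrors the kind of single-word filtering the abstract describes.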

Keywords

terminology; corpus; Slovak language; corpus annotation; IATE

Hrčak ID:

281027

URI

https://hrcak.srce.hr/281027

Publication date:

29.7.2022.
