Original scientific paper

https://doi.org/10.31724/rihjj.48.1.2

Naïve Terminological Annotation of Legal Texts in Slovak – Can it Be Useful?

Radovan Garabík (ORCID: orcid.org/0000-0003-1691-3157); Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences
Jana Levická; Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences


Full text: English, PDF (1,030 KB)

pp. 27-44


Abstract

Correct automatic terminological annotation of texts in a corpus can sometimes be a challenging task, especially for moderately or heavily inflected languages with relatively free word order. We explore the possibility of simple annotation based on sequence matching of lemmatized texts to annotate a Slovak-language corpus with IATE terminological entries. The accuracy of annotating legal language is very good for multiword terms, while the accuracy for single-word terms can be increased by applying simple filters based on word length and by blacklisting the most frequent false positives.
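The approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_single_len` threshold, the blacklist contents, and the toy lemma data are all assumptions made for illustration, and the longest-match-first strategy is one plausible reading of "sequence matching". Terms and text are both assumed to be already lemmatized.

```python
from typing import Dict, FrozenSet, List, Tuple

Term = Tuple[str, ...]  # a term as a sequence of lemmas


def build_index(terms: List[Term]) -> Dict[str, List[Term]]:
    """Group terms by their first lemma, longest terms first."""
    index: Dict[str, List[Term]] = {}
    for term in terms:
        index.setdefault(term[0], []).append(term)
    for candidates in index.values():
        candidates.sort(key=len, reverse=True)
    return index


def annotate(
    lemmas: List[str],
    index: Dict[str, List[Term]],
    min_single_len: int = 4,               # hypothetical threshold
    blacklist: FrozenSet[str] = frozenset(),  # hypothetical false positives
) -> List[Tuple[int, int, Term]]:
    """Return (start, end, term) spans, with end exclusive."""
    spans: List[Tuple[int, int, Term]] = []
    i = 0
    while i < len(lemmas):
        matched = None
        for term in index.get(lemmas[i], []):
            # Exact match of the term's lemma sequence at position i.
            if tuple(lemmas[i:i + len(term)]) != term:
                continue
            # Filters apply to single-word terms only.
            if len(term) == 1 and (
                len(term[0]) < min_single_len or term[0] in blacklist
            ):
                continue
            matched = term
            break
        if matched is not None:
            spans.append((i, i + len(matched), matched))
            i += len(matched)  # do not annotate inside a matched term
        else:
            i += 1
    return spans


# Toy data, not taken from the paper or from IATE.
terms = [("európsky", "parlament"), ("parlament",), ("akt",), ("právo",)]
lemmas = ["európsky", "parlament", "prijať", "akt", "o", "právo"]
index = build_index(terms)

filtered = annotate(lemmas, index, min_single_len=4,
                    blacklist=frozenset({"právo"}))
unfiltered = annotate(lemmas, index, min_single_len=1)
```

With the filters on, only the multiword term survives in this toy input; with them off, the short lemma `akt` and the blacklisted lemma `právo` are annotated as well, which is the kind of single-word noise the filters are meant to reduce.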

Keywords

terminology; corpus; Slovak language; corpus annotation; IATE

Hrčak ID:

281027

URI

https://hrcak.srce.hr/281027

Publication date:

29 July 2022

Data in other languages: Croatian
