Original scientific paper
https://doi.org/10.2478/crdj-2025-0008
Application of NLP Technologies to Low-Resource Croatian Dialects
Maja Polanec
University of Zagreb, Faculty of Electrical Engineering and Computing
Marina Bagić Babac
orcid.org/0000-0003-4979-2216
University of Zagreb, Faculty of Electrical Engineering and Computing
*
* Corresponding author.
Abstract
Natural language processing (NLP) systems tend to perform worse on texts written in low-resource dialects than on texts in the standard language. Dependency parsing is an essential component of NLP systems, so improving it could enhance overall system performance. This paper compares the performance of Slovenian and Croatian parsers on dependency parsing of the Kajkavian dialect, in order to gain insight into the Slovenian parser's potential for parsing Kajkavian. A dependency parsing dataset was created from parallel translations of the book „Mali kraljević“. Using this dataset, labels were projected from parsed standard Croatian text onto the Kajkavian dialect to obtain reference data for calculating the UAS and LAS metrics (unlabeled and labeled attachment scores) used to compare the two parsers, both of which were implemented with the open-source spaCy library. The Croatian parser achieved UAS and LAS scores of 0.47 and 0.30, respectively, lower than the Slovenian parser's 0.52 and 0.34. The results indicate that the Slovenian parser is more accurate on the Kajkavian dialect for this dataset. However, the dataset would need to be expanded before a general conclusion can be drawn.
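For readers unfamiliar with the two metrics, the following is a minimal sketch of how UAS and LAS are commonly computed from token-level reference and predicted arcs. It is not the authors' evaluation code: the function name `attachment_scores`, the `(head, label)` tuple representation, and the toy sentence are all assumptions made for illustration, and the example values are not taken from the paper's dataset.

```python
from typing import List, Tuple

# Hypothetical representation: one (head_index, dependency_label) pair per token.
# A head index of 0 marks the root of the sentence.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two aligned token sequences.

    UAS counts tokens whose predicted head matches the reference head.
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return head_hits / total, labeled_hits / total


# Toy four-token sentence (illustrative values only).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In the setting the abstract describes, the reference arcs would come from labels projected from the parsed standard Croatian text, and the predicted arcs from each parser's output on the Kajkavian text. LAS can never exceed UAS, which is consistent with the scores reported for both parsers.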
Keywords
Natural Language Processing (NLP); low-resource dialect; Croatian language; dependency parser
Hrčak ID:
341539
URI
Publication date:
20.12.2025.