Application of NLP Technologies to Low-Resource Croatian Dialects

Polanec, Maja; Bagić Babac, Marina

doi:10.2478/crdj-2025-0008

Croatian Regional Development Journal, Vol. 6 No. 2, 2025.

Original scientific paper

https://doi.org/10.2478/crdj-2025-0008

Maja Polanec ; University of Zagreb, Faculty of Electrical Engineering and Computing
Marina Bagić Babac orcid.org/0000-0003-4979-2216 ; University of Zagreb, Faculty of Electrical Engineering and Computing *

* Corresponding author.

Full text: English, PDF, 282 KB

pp. 13-23

Full text: Croatian, PDF, 289 KB

Abstract

Natural language processing (NLP) systems tend to perform worse on texts written in low-resource dialects than on texts in the standard language. Dependency parsing is an essential component of NLP systems, so improving it could enhance overall system performance. This paper compares the performance of Slovenian and Croatian parsers on dependency parsing of the Kajkavian dialect, to provide insight into the Slovenian parser's potential for parsing Kajkavian. A dependency parsing dataset was created using parallel translations of the book „Mali kraljević“. Labels were then projected from the parsed standard Croatian text onto the Kajkavian text, yielding the data used to calculate the unlabeled and labeled attachment scores (UAS and LAS) of the two parsers, both of which were implemented with the open-source spaCy library. The Croatian parser achieved a UAS of 0.47 and a LAS of 0.30, lower than the Slovenian parser's 0.52 and 0.34. The results indicate that the Slovenian parser is more accurate on the Kajkavian dialect; however, the dataset would need to be expanded before a general conclusion can be drawn.
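The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes each token is represented as a pair of head index and dependency label, and that the reference and predicted sequences are already token-aligned. The function name `attachment_scores` and the toy sentence are hypothetical and serve only as illustration; they are not taken from the paper or its dataset.

```python
from typing import List, Tuple

# Each token is (head_index, dependency_label); head index 0 denotes the root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], predicted: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one token-aligned sequence.

    UAS counts tokens whose predicted head matches the reference head.
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    labeled_hits = sum(g == p for g, p in zip(gold, predicted))
    n = len(gold)
    return head_hits / n, labeled_hits / n


# Toy four-token sentence, invented for illustration only.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In this toy case three of four heads match and two of four head-label pairs match. LAS can never exceed UAS, which is consistent with the scores reported for both parsers.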

Keywords

Natural Language Processing (NLP); low-resource dialect; Croatian language; dependency parser

Hrčak ID:

341539

URI

https://hrcak.srce.hr/341539

Publication date:

20 December 2025

Data in other languages: Croatian
