Original scientific paper
Building a Croatian language stemmer
Ivan Pandžić
orcid.org/0000-0002-7741-8996
Institut za hrvatski jezik i jezikoslovlje, Ulica Republike Austrije 16, HR-10000 Zagreb
Abstract
The paper presents two conservative Croatian language stemmers, k2 and k3. Both are based on k1, an aggressive Croatian language stemmer presented by Nikola Ljubešić in a 2007 paper. By introducing an expanded set of rules that use the derivational morphemes of nouns, verbs, and adjectives to determine word stems, we aimed to create a more efficient stemmer. To test whether the k2 and k3 stemmers are more efficient than k1, we calculated their precision, recall, and F1-score on a 9,775-token corpus and compared the results with the precision, recall, and F1-score of the k1 stemmer.
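The abstract does not list the k1, k2, or k3 rules, so the following is only a minimal sketch of the general idea: a rule-based suffix-stripping stemmer plus a precision/recall/F1 evaluation. It is not the author's implementation. The suffix list (`SUFFIXES`), the minimum stem length (`MIN_STEM`), the helper names `stem` and `pairwise_prf`, and the toy word list with its gold lemmas are all hypothetical. The evaluation also assumes a pairwise protocol: two word forms count as correctly conflated when they share both a stem and a gold lemma. The protocol actually used in the paper may differ.

```python
from itertools import combinations

# HYPOTHETICAL suffix list for illustration only; NOT the k1/k2/k3 rule set.
# Longest suffixes are tried first (sorted is stable, so ties keep list order).
SUFFIXES = sorted(
    ["ovima", "ama", "ima", "ovi", "om", "em", "a", "e", "i", "o", "u"],
    key=len,
    reverse=True,
)
MIN_STEM = 2  # assumed minimum number of characters a stem must keep


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def pairwise_prf(tokens, gold_lemma):
    """Assumed pairwise evaluation over distinct word forms.

    A pair is 'conflated' if the stems match and 'should be conflated'
    if the gold lemmas match.
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(set(tokens)), 2):
        same_stem = stem(a) == stem(b)
        same_lemma = gold_lemma[a] == gold_lemma[b]
        if same_stem and same_lemma:
            tp += 1  # correctly conflated
        elif same_stem:
            fp += 1  # over-stemming: conflated but different lemmas
        elif same_lemma:
            fn += 1  # under-stemming: same lemma but different stems
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (hypothetical): word form -> gold lemma.
GOLD = {
    "grad": "grad", "grada": "grad", "gradom": "grad", "gradovima": "grad",
    "žena": "žena", "ženom": "žena", "ženama": "žena",
    "rad": "rad", "radom": "rad", "radi": "raditi",
    "vrabac": "vrabac", "vrapca": "vrabac",
}
TOKENS = list(GOLD)

if __name__ == "__main__":
    p, r, f1 = pairwise_prf(TOKENS, GOLD)
    print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # prints P=0.833 R=0.909 F1=0.870
```

On this toy data the sketch wrongly conflates the verb form *radi* with the noun *rad* (over-stemming, which lowers precision) and fails to join *vrabac* and *vrapca* because of the b/p alternation (under-stemming, which lowers recall). A rule set that uses derivational morphemes, as the abstract describes, would target exactly these kinds of errors.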
Keywords
rule-based stemming; computational linguistics; natural language processing; Croatian language
Hrčak ID: 150047
Publication date: 29 December 2015