Original scientific paper
https://doi.org/10.31820/f.37.2.4
CroSloMet: A Structured Metaphor Dataset for Croatian and Slovene
Kristina Štrkalj Despot
orcid.org/0000-0001-9004-5103
Institute of Croatian Language, Zagreb
Ana Ostroški Anić
orcid.org/0000-0001-9999-0750
Institute of Croatian Language
Polona Gantar
orcid.org/0000-0001-5822-6414
University of Ljubljana
Mija Bon
University of Ljubljana
Matej Klemen
orcid.org/0000-0002-7852-2357
University of Ljubljana
Marko Robnik Šikonja
orcid.org/0000-0002-1232-3320
University of Ljubljana
Simon Krek
orcid.org/0000-0001-8965-6863
University of Ljubljana
Benedikt Perak
University of Rijeka
Jaka Čibej
orcid.org/0000-0002-3037-6848
University of Ljubljana
Abstract
Recent advancements in large language models (LLMs) have opened new avenues for processing figurative language, yet their performance in metaphor interpretation continues to fall short of human-level understanding. One limitation lies in the inadequacy of existing metaphor datasets, which often lack explicit connections to conceptual metaphors and are predominantly monolingual. In this paper, we present CroSloMet, a novel dataset of over 1,120 metaphorical and 1,120 literal sentences in Croatian and Slovene, grounded in the MetaNet.HR framework. Each example is annotated with the corresponding conceptual metaphor, linguistic multi-word expression (MWE), canonical forms, and literal usage, enabling both metaphor identification and explanation tasks. We present preliminary evaluations of the dataset through two experiments: metaphor classification using CroSloEngual BERT, which achieved 88.5% accuracy, and metaphor explanation generation with Llama 3 8B, where strict exact-match evaluation yielded low scores despite semantically valid outputs. To address this, we propose a multi-level validation framework combining manual annotation, natural language inference, semantic similarity, and LLM-based judgment. Our findings highlight the importance of capturing generality and specificity in metaphor mappings and call for more nuanced evaluation methods. CroSloMet provides a resource for advancing metaphor understanding in LLMs and contributes to cross-linguistic and cognitively informed metaphor research.
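The abstract reports that strict exact-match scoring undervalued semantically valid explanations. The minimal Python sketch below illustrates why. It assumes a hypothetical record layout (the `MetaphorExample` class and its field names are illustrative, not the published CroSloMet schema) and plain-string conceptual metaphor labels. A prediction that names a more general mapping fails exact match outright but still earns partial credit under token-level F1. Token F1 is only a simple stand-in here. It is not one of the validation methods the paper proposes (manual annotation, natural language inference, semantic similarity, LLM-based judgment), and the metaphor labels in the example are illustrative rather than taken from the dataset.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetaphorExample:
    """Hypothetical record layout; field names are illustrative, not the real schema."""
    sentence: str
    language: str                              # e.g. "hr" (Croatian) or "sl" (Slovene)
    is_metaphorical: bool                      # label for the classification task
    conceptual_metaphor: Optional[str] = None  # gold label for the explanation task
    mwe: Optional[str] = None                  # linguistic multi-word expression
    canonical_form: Optional[str] = None


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing labels."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, gold: str) -> bool:
    """Strict scoring: credit only if the normalized strings are identical."""
    return normalize(prediction) == normalize(gold)


def token_f1(prediction: str, gold: str) -> float:
    """Lenient scoring: harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative labels only: the prediction names a more general mapping than the gold one.
    gold = "ANGER IS A HEATED FLUID IN A CONTAINER"
    predicted = "ANGER IS HEAT"
    print(exact_match(predicted, gold))         # False
    print(round(token_f1(predicted, gold), 3))  # 0.364
```

Surface overlap of this kind cannot distinguish a valid but more general mapping from an unrelated one that happens to share words, so at most it motivates, rather than replaces, the semantic measures listed in the abstract.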
Keywords
metaphors; metaphor dataset; metaphor explanation; metaphor understanding; large language models
Hrčak ID: 342892
Publication date: 31.12.2025.