
Original scientific paper

https://doi.org/10.31820/f.37.2.4

CroSloMet: A Structured Metaphor Dataset for Croatian and Slovene

Kristina Štrkalj Despot (ORCID: orcid.org/0000-0001-9004-5103); Institute of Croatian Language, Zagreb
Ana Ostroški Anić (ORCID: orcid.org/0000-0001-9999-0750); Institute of Croatian Language
Polona Gantar (ORCID: orcid.org/0000-0001-5822-6414); University of Ljubljana
Mija Bon; University of Ljubljana
Matej Klemen (ORCID: orcid.org/0000-0002-7852-2357); University of Ljubljana
Marko Robnik Šikonja (ORCID: orcid.org/0000-0002-1232-3320); University of Ljubljana
Simon Krek (ORCID: orcid.org/0000-0001-8965-6863); University of Ljubljana
Benedikt Perak; University of Rijeka
Jaka Čibej (ORCID: orcid.org/0000-0002-3037-6848); University of Ljubljana


Full text: English, PDF, 636 KB

pp. 459-482




Abstract

Recent advancements in large language models (LLMs) have opened new avenues for processing figurative language, yet their performance in metaphor interpretation continues to fall short of human-level understanding. One limitation lies in the inadequacy of existing metaphor datasets, which often lack explicit connections to conceptual metaphors and are predominantly monolingual. In this paper, we present CroSloMet, a novel dataset of over 1,120 metaphorical and 1,120 literal sentences in Croatian and Slovene, grounded in the MetaNet.HR framework. Each example is annotated with the corresponding conceptual metaphor, linguistic multi-word expression (MWE), canonical forms, and literal usage, enabling both metaphor identification and explanation tasks. We present preliminary evaluations of the dataset through two experiments: metaphor classification using CroSloEngual BERT, achieving 88.5% accuracy, and metaphor explanation generation with Llama 3-8B, where strict exact-match evaluation yielded low scores despite semantically valid outputs. To address this mismatch, we propose a multi-level validation framework combining manual annotation, natural language inference, semantic similarity, and LLM-based judgment. Our findings highlight the importance of capturing generality and specificity in metaphor mappings and call for more nuanced evaluation methods. CroSloMet provides a resource for advancing metaphor understanding in LLMs and contributes to cross-linguistic and cognitively informed metaphor research.
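
The gap between strict exact-match scoring and semantically valid output can be illustrated with a minimal sketch. It is not the evaluation code used in the paper: the function names, the example labels, and the use of token-level Jaccard overlap as a crude stand-in for semantic similarity are all assumptions made for illustration only.

```python
def normalize(label: str) -> list[str]:
    """Lowercase a conceptual-metaphor label and split it into word tokens."""
    return label.lower().replace("-", " ").split()


def exact_match(gold: str, predicted: str) -> bool:
    """Strict evaluation: the normalized token sequences must be identical."""
    return normalize(gold) == normalize(predicted)


def token_jaccard(gold: str, predicted: str) -> float:
    """Lexical overlap of token sets (assumed proxy, not a real semantic metric)."""
    g, p = set(normalize(gold)), set(normalize(predicted))
    if not g and not p:
        return 1.0  # two empty labels are trivially identical
    return len(g & p) / len(g | p)


# Hypothetical gold label and model output, not taken from the dataset.
gold = "LIFE IS A JOURNEY"
predicted = "A CAREER IS A JOURNEY"

print(exact_match(gold, predicted))    # strict score rejects the prediction
print(token_jaccard(gold, predicted))  # lenient score gives partial credit
```

In this hypothetical pair the prediction names a more specific source-target mapping than the gold label, so exact match counts it as wrong while an overlap-based score does not. A lexical measure of this kind is only a rough proxy; the framework described in the abstract relies on manual annotation, natural language inference, semantic similarity, and LLM-based judgment instead.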

Keywords

metaphors; metaphor dataset; metaphor explanation; metaphor understanding; large language models

Hrčak ID:

342892

URI

https://hrcak.srce.hr/342892

Publication date:

31.12.2025.

