Original scientific paper

https://doi.org/10.1080/00051144.2021.1928437

Representing word meaning in context via lexical substitutes

Domagoj Alagić ; Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Jan Šnajder ; Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia


Full text: English, PDF, 1,326 KB

pp. 239-248

Downloads: 153


Abstract

Representing the meaning of individual words is crucial for most natural language processing (NLP) tasks. This is challenging, however, because word meaning often depends on context. Recent approaches to representing word meaning in context rely on lexical substitution (LS), where a word is represented by a set of meaning-preserving substitutes. While this representation has face validity, it is unclear to what extent it corresponds to the more established sense-based representation required for many NLP tasks. We present an empirical study that addresses this question by quantifying the correspondence between substitute-based and sense-based meaning representations. We compile a high-quality dataset annotated with lexical substitutes and sense labels from two well-established sense inventories, and conduct a correlation analysis using a number of substitute-based similarity measures. Furthermore, as recent work has demonstrated the efficacy of system-produced substitutes for word meaning representation, we compare human- and system-produced substitutes to determine the performance gap between the two. Lastly, we investigate to what extent the results translate to the fundamental semantic task of word sense induction (WSI). Our experiments show the validity of LS for representing word meaning in context and justify the use of system-produced substitutes for WSI.
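
To make the idea of a substitute-based similarity measure concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the measures used in the study; Jaccard overlap and cosine similarity over substitute counts are assumed here purely for illustration, and the substitute lists for the target word "bright" are invented examples.

```python
from collections import Counter
from math import sqrt


def jaccard(subs_a, subs_b):
    """Jaccard overlap between the substitute sets of two word occurrences."""
    set_a, set_b = set(subs_a), set(subs_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(subs_a, subs_b):
    """Cosine similarity between substitute count vectors (bag of substitutes)."""
    counts_a, counts_b = Counter(subs_a), Counter(subs_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical substitutes for three occurrences of the word "bright".
smart_1 = ["clever", "intelligent", "smart"]   # "a bright student"
smart_2 = ["intelligent", "smart", "gifted"]   # "a bright child"
shiny = ["luminous", "shining", "vivid"]       # "a bright light"

same_sense = jaccard(smart_1, smart_2)
different_sense = jaccard(smart_1, shiny)
```

In this toy setup, occurrences that share a sense also share substitutes and therefore score higher than occurrences with different senses. A correspondence study of the kind the abstract describes would correlate such scores with agreement between sense labels.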

Keywords

Natural language processing; machine learning; lexical substitution; word meaning in context; word sense induction

Hrčak ID:

269830

URI

https://hrcak.srce.hr/269830

Publication date:

4 June 2021

Visits: 457 *