A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings

Babić, Karlo; Guerra, Francesco; Martinčić-Ipšić, Sanda; Meštrović, Ana

doi:10.31341/jios.44.2.2

Journal of Information and Organizational Sciences, Vol. 44 No. 2, 2020.

Original scientific paper

Karlo Babić orcid.org/0000-0001-6343-0938 ; Department of Informatics, Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Rijeka, Croatia
Francesco Guerra ; Department of Engineering Enzo Ferrari, University of Modena and Reggio Emilia, Modena, Italy
Sanda Martinčić-Ipšić orcid.org/0000-0002-1900-5333 ; Department of Informatics, Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Rijeka, Croatia
Ana Meštrović ; Department of Informatics, Center for Artificial Intelligence and Cybersecurity, University of Rijeka, Rijeka, Croatia

page 231-246

Abstract

Measuring the semantic similarity of texts plays a vital role in many natural language processing tasks. In this paper, we describe a set of experiments carried out to evaluate and compare the performance of different approaches for measuring the semantic similarity of short texts. We compare four models based on word embeddings: two variants of Word2Vec (one trained on a specific dataset and a second that extends it with embeddings of word senses), FastText, and TF-IDF. Since these models provide word vectors, we experiment with various methods that calculate the semantic similarity of short texts from word vectors. More precisely, for each of these models, we test five methods for aggregating word embeddings into a text embedding. We introduce three of these methods as variations of two commonly used similarity measures: one is an extension of the centroid-based cosine similarity, and the other two are variations of the Okapi BM25 function. We evaluate all approaches on two publicly available datasets, SICK and Lee, in terms of the Pearson and Spearman correlation coefficients. The results indicate that the extended methods perform better than the original ones in most cases.
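
To make the evaluation setup described in the abstract concrete, the following is a minimal sketch of the plain centroid-based cosine similarity (the baseline, not the extended variant proposed in the paper) together with the Pearson and Spearman correlations used to score a method against human ratings. The embedding table `TOY_EMBEDDINGS`, the whitespace tokenization, and all function names are illustrative assumptions and are not taken from the paper.

```python
import math

# Hypothetical 2-dimensional word vectors, for illustration only.
TOY_EMBEDDINGS = {
    "dog": [1.0, 0.0],
    "puppy": [0.9, 0.1],
    "car": [0.0, 1.0],
    "runs": [0.5, 0.5],
}

def centroid(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def centroid_similarity(text_a, text_b, embeddings):
    """Similarity of two short texts as the cosine of their centroids."""
    # Assumption: lowercase whitespace tokenization is good enough here.
    ca = centroid(text_a.lower().split(), embeddings)
    cb = centroid(text_b.lower().split(), embeddings)
    if ca is None or cb is None:
        return 0.0
    return cosine(ca, cb)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))
```

In this setting, a method would be scored by computing `centroid_similarity` for every text pair in a dataset and passing the resulting scores, alongside the human similarity ratings, to `pearson` and `spearman`. Real experiments would replace the toy table with trained Word2Vec or FastText vectors.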

Hrčak ID: 247489

URI: https://hrcak.srce.hr/247489

Publication date: 9.12.2020.
