An Empirical Study on Document Similarity Comparison Evaluation Between Machine Learning Techniques and Human Experts

Jang, Won-Jung

doi:10.17559/TV-20231011001013

Technical gazette, Vol. 31 No. 5, 2024.

Original scientific paper

https://doi.org/10.17559/TV-20231011001013

Won-Jung Jang ; Catholic Kwandong University, 25601 #502, The Mary Hall, 24, Beomil-ro 576, Gangneung-si, Gangwon-do, South Korea *

* Corresponding author.

page 1668-1679

APA 6th Edition

Jang, W. (2024). An Empirical Study on Document Similarity Comparison Evaluation Between Machine Learning Techniques and Human Experts. Tehnički vjesnik, 31 (5), 1668-1679. https://doi.org/10.17559/TV-20231011001013

MLA 8th Edition

Jang, Won-Jung. "An Empirical Study on Document Similarity Comparison Evaluation Between Machine Learning Techniques and Human Experts." Tehnički vjesnik, vol. 31, no. 5, 2024, pp. 1668-1679. https://doi.org/10.17559/TV-20231011001013. Accessed 1 Jul. 2026.

Chicago 17th Edition

Jang, Won-Jung. "An Empirical Study on Document Similarity Comparison Evaluation Between Machine Learning Techniques and Human Experts." Tehnički vjesnik 31, no. 5 (2024): 1668-1679. https://doi.org/10.17559/TV-20231011001013

Harvard

Jang, W. (2024). 'An Empirical Study on Document Similarity Comparison Evaluation Between Machine Learning Techniques and Human Experts', Tehnički vjesnik, 31(5), pp. 1668-1679. https://doi.org/10.17559/TV-20231011001013

Vancouver

Jang W. An Empirical Study on Document Similarity Comparison Evaluation Between Machine Learning Techniques and Human Experts. Tehnički vjesnik [Internet]. 2024 [cited 2026 July 01];31(5):1668-1679. https://doi.org/10.17559/TV-20231011001013

IEEE

W. Jang, "An Empirical Study on Document Similarity Comparison Evaluation Between Machine Learning Techniques and Human Experts", Tehnički vjesnik, vol. 31, no. 5, pp. 1668-1679, 2024. [Online]. https://doi.org/10.17559/TV-20231011001013

Abstract

Current machine-learning training focuses solely on accuracy. This study examines the weights of other dimensions rather than measuring accuracy alone. By comparing the decision-making of machine learning models and humans across various fields, it examines how well an organization's vision is propagated to its lower levels. The results evaluated by humans and by machine learning models were also compared from multiple perspectives. To represent words numerically, the study used count-based models (Bag of Words, TF-IDF), artificial neural network (ANN) models (Word2Vec, GloVe), and a vision propagation measurement (VPMS) model that combines the two approaches. Each was used to calculate the similarity between documents, and the results were compared with the actual results measured by an expert group. The findings can serve as an evaluation metric for how effectively the vision of the upper organization is disseminated to lower-level organizations. They could also be used to develop algorithms such as customer segmentation for target marketing based on text data. The study makes two key contributions: (i) an extensive empirical comparison of document similarity analysis by different ML techniques versus human experts, and (ii) a new VPMS model that outperforms existing methods.
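As an illustration of the count-based approach named in the abstract, the following is a minimal sketch of TF-IDF weighting followed by cosine similarity between documents. It is not the paper's implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the sample statements are all assumptions made for this example, and the VPMS model itself is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    Uses relative term frequency and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, chosen here for illustration.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical statements: one top-level vision and two lower-level plans.
vision = "deliver trusted education to the local community"
dept_a = "deliver trusted education programs to the community"
dept_b = "reduce quarterly printing costs"

vec_vision, vec_a, vec_b = tfidf_vectors([vision, dept_a, dept_b])
print("vision vs dept_a:", cosine_similarity(vec_vision, vec_a))
print("vision vs dept_b:", cosine_similarity(vec_vision, vec_b))
```

In this sketch, a lower-level document that shares more vocabulary with the vision statement receives a higher score, and one with no shared terms scores zero. Scores of this kind are what a study like this one would set against the ratings of human experts.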

Keywords

ANN model; count-based model; document similarity; ensemble learning model; machine learning

Hrčak ID:

320403

URI

https://hrcak.srce.hr/320403

Publication date:

31.8.2024.
