Original scientific paper
https://doi.org/10.20532/cit.2022.1005566
Research on Keywords Variations in Linguistics Based on TF-IDF and N-gram
Yuyao Li
; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
*
Xueyi Wen
orcid.org/0009-0001-2771-819X
; The Institute of Corpus Studies and Applications, Shanghai International Studies University, Shanghai, China
Xingyu Liu
; Faculty of English Language and Culture, Guangdong University of Foreign Studies, Guangzhou, China
* Corresponding author.
Abstract
The rapid development of natural language processing (NLP) holds great promise for bridging the divide among languages. One of its innovative applications is the use of large-scale data to trace the historical trends of a discipline. However, since Saussure pioneered modern linguistics, relatively little research has comprehensively examined how the field's own research themes have changed. To trace the changes in linguistic research hotspots, we use a dataset of more than 30,000 linguistics-related publications, with their titles, from the Web of Science and apply NLP techniques to their keywords and publication years. We find that the co-occurrence relationships between keywords, N-grams, and the relationship of both with publication year can effectively reveal changes in linguistic research themes. This research is expected to provide further insights and new methods that can be applied in linguistics and related disciplines.
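As a rough illustration of the kind of analysis the abstract describes, the following minimal sketch scores keywords by TF-IDF with each publication year treated as one document, and extracts word N-grams from a title. It is not the authors' implementation: the toy records, the function names `tfidf_by_year` and `word_ngrams`, and the plain `log(N / df)` weighting are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy records (publication year, keywords); not data from the paper.
records = [
    (2000, ["syntax", "generative grammar"]),
    (2000, ["syntax", "phonology"]),
    (2020, ["corpus linguistics", "syntax"]),
    (2020, ["corpus linguistics", "deep learning"]),
]


def tfidf_by_year(records):
    """Score each keyword per year, treating a year as one document.

    Assumption: tf = relative frequency within the year and
    idf = log(number of years / years containing the keyword).
    Other TF-IDF variants (e.g. smoothed idf) would give different values.
    """
    counts = {}
    for year, keywords in records:
        counts.setdefault(year, Counter()).update(keywords)
    n_years = len(counts)
    # Document frequency: in how many years does each keyword occur?
    df = Counter(kw for year_counts in counts.values() for kw in year_counts)
    scores = {}
    for year, year_counts in counts.items():
        total = sum(year_counts.values())
        scores[year] = {
            kw: (n / total) * math.log(n_years / df[kw])
            for kw, n in year_counts.items()
        }
    return scores


def word_ngrams(text, n=2):
    """Return the lowercase word n-grams of a whitespace-tokenized string."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


scores = tfidf_by_year(records)
bigrams = word_ngrams("Corpus based discourse analysis")
```

In this toy data, a keyword that occurs in every year (here "syntax") receives a score of zero, while keywords concentrated in one year score higher, which is the property that makes the weighting useful for spotting period-specific themes.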
Keywords
keyword extraction, TF-IDF, N-Gram, Linear Discriminant Analysis (LDA)
Hrčak ID:
309215
Publication date:
28.9.2023.