Original scientific article

https://doi.org/10.20532/cit.2022.1005566

Research on Keywords Variations in Linguistics Based on TF-IDF and N-gram

Yuyao Li ; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China *
Xueyi Wen (ORCID: orcid.org/0009-0001-2771-819X) ; The Institute of Corpus Studies and Applications, Shanghai International Studies University, Shanghai, China
Xingyu Liu ; Faculty of English Language and Culture, Guangdong University of Foreign Studies, Guangzhou, China

* Corresponding author.


Full text: English (PDF, 692 KB)

pp. 193-204


Abstract

The rapid development of natural language processing (NLP) holds great promise for bridging the divide among languages. One of its main innovative applications is using large-scale data to explore the historical trends of a subject. However, since Saussure pioneered modern linguistics, relatively little research has examined how the field itself has changed in a way that comprehensively reveals its trends. To trace changes in linguistic research hotspots, we use a dataset of more than 30,000 linguistics-related publications, with their titles, from the Web of Science, and apply NLP techniques to their keywords and publication years. We find that keyword co-occurrence, N-grams, and their relationship with publication year effectively reveal changes in linguistic research themes. This research is intended to provide further insights and new methods applicable to linguistics and related disciplines.
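The abstract reports that keyword co-occurrence and the relationship between keywords and publication year can reveal shifts in research themes. The following is a minimal sketch of how such counts could be computed, not the authors' code. It assumes each record is a (year, keyword list) pair; `RECORDS`, `normalize`, `cooccurrence_counts` and `keyword_counts_by_year` are hypothetical names, and the records are toy values rather than the Web of Science dataset.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (publication year, author keywords).
# Illustrative only; these are not data from the study.
RECORDS = [
    (2010, ["corpus linguistics", "syntax", "frequency"]),
    (2010, ["syntax", "language acquisition"]),
    (2015, ["corpus linguistics", "machine learning"]),
    (2020, ["machine learning", "corpus linguistics", "syntax"]),
]


def normalize(keyword):
    """Lowercase and trim a keyword so spelling variants are counted together."""
    return keyword.strip().lower()


def cooccurrence_counts(records):
    """Count how often each unordered keyword pair appears in the same record."""
    pairs = Counter()
    for _, keywords in records:
        # Sorting gives each pair a canonical (a, b) order with a < b.
        unique = sorted({normalize(k) for k in keywords})
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


def keyword_counts_by_year(records):
    """Map each year to a Counter of how many records in that year use each keyword."""
    by_year = defaultdict(Counter)
    for year, keywords in records:
        by_year[year].update({normalize(k) for k in keywords})
    return by_year


if __name__ == "__main__":
    for pair, n in cooccurrence_counts(RECORDS).most_common(3):
        print(pair, n)
    for year, counts in sorted(keyword_counts_by_year(RECORDS).items()):
        print(year, counts.most_common(2))
```

Plotting a keyword's yearly count, or a pair's co-occurrence count per year, is one simple way to see a theme rising or fading over time.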

Keywords

keyword extraction, TF-IDF, N-Gram, Linear Discriminant Analysis (LDA)
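The keywords above name TF-IDF and N-grams. As a hedged illustration only (the paper's exact weighting, tokenization and preprocessing are not described on this page), the sketch below treats all titles from one year as a single document, extracts word bigrams, and scores them with unsmoothed TF-IDF, so phrases that are frequent in one year but rare in others rank highest. `TITLES_BY_YEAR` and all function names are hypothetical, and the titles are toy data.

```python
import math
from collections import Counter

# Hypothetical toy titles grouped by publication year (illustrative only).
TITLES_BY_YEAR = {
    2010: ["corpus linguistics and syntax", "syntax of questions"],
    2020: ["corpus linguistics with machine learning", "machine learning for syntax"],
}


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def year_documents(titles_by_year, n=2):
    """Treat each year as one document made of the n-grams of its titles.

    Titles are lowercased and split on whitespace; n-grams never span two titles.
    No stop-word removal is done, so phrases like "and syntax" are kept.
    """
    docs = {}
    for year, titles in titles_by_year.items():
        terms = []
        for title in titles:
            terms.extend(" ".join(g) for g in ngrams(title.lower().split(), n))
        docs[year] = terms
    return docs


def tf_idf(documents):
    """Score every term of every document with TF-IDF.

    TF = count of the term in the document / number of terms in the document.
    IDF = log(N / df), where N is the number of documents and df is the
    number of documents containing the term (no smoothing).
    """
    n_docs = len(documents)
    df = Counter()
    for terms in documents.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in documents.items():
        counts = Counter(terms)
        total = len(terms)
        scores[doc_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per document, best first."""
    return {
        doc_id: sorted(terms, key=terms.get, reverse=True)[:k]
        for doc_id, terms in scores.items()
    }


if __name__ == "__main__":
    scores = tf_idf(year_documents(TITLES_BY_YEAR, n=2))
    for year, terms in sorted(top_terms(scores).items()):
        print(year, terms)
```

Because IDF is log(N / df), a bigram that occurs in every year (here "corpus linguistics") scores 0, which is what makes TF-IDF useful for surfacing year-specific themes rather than perennial ones.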

Hrčak ID:

309215

URI

https://hrcak.srce.hr/309215

Publication date:

28 September 2023

Posjeta: 418 *