Technical gazette, Vol. 26 No. 3, 2019.
Original scientific paper
https://doi.org/10.17559/TV-20190420161815
Naive Bayesian Automatic Classification of Railway Service Complaint Text Based on Eigenvalue Extraction
Lifeng Li, School of Economics and Management, Beijing Jiaotong University, No. 3, Shangyuancun, Haidian District, Beijing, China
Wenxing Li, School of Economics and Management, Beijing Jiaotong University, No. 3, Shangyuancun, Haidian District, Beijing, China
Abstract
Railways have developed rapidly in China for several decades. Railway hardware has already reached a world-leading level, but service quality still has room for improvement. The railway management department receives a large number of passenger complaints every year and records them as text, which must then be classified and analyzed. Railway complaint text covers a wide range of business areas and event types, is highly colloquial, and contains noise and irrelevant information. As a result, applying traditional text categorization directly yields low classification accuracy. The key to the automatic classification of such text lies in eigenvalue extraction: the more accurate the extraction, the higher the accuracy of text classification. In this paper, the TF-IDF, TextRank and Word2vec algorithms are each used to extract text eigenvalues, and a railway complaint text classification method is built on a naive Bayesian classifier. A comparison of the three extraction algorithms shows that eigenvalue extraction based on the TF-IDF algorithm achieves the highest automatic text classification accuracy.
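The pipeline summarized in the abstract (TF-IDF extraction followed by a naive Bayesian classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the "eigenvalues" are per-term TF-IDF weights, that complaints are already segmented into word tokens, and that a multinomial naive Bayes with Laplace smoothing is an acceptable stand-in for the classifier. The English tokens, the category names `delay` and `ticketing`, and the helper names `fit_idf`, `tfidf` and `NaiveBayes` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """Learn a smoothed inverse document frequency for every training term."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed form (an assumption): keeps every weight strictly positive.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

def tfidf(doc, idf):
    """Map a tokenized document to {term: tf * idf}, ignoring unseen terms."""
    counts = Counter(t for t in doc if t in idf)
    return {t: (c / len(doc)) * idf[t] for t, c in counts.items()}

class NaiveBayes:
    """Multinomial naive Bayes over non-negative term weights."""

    def fit(self, vectors, labels, alpha=1.0):
        self.alpha = alpha
        self.vocab = {t for vec in vectors for t in vec}
        self.class_counts = Counter(labels)
        self.weights = defaultdict(Counter)
        for vec, label in zip(vectors, labels):
            self.weights[label].update(vec)  # accumulate weights per class
        self.totals = {c: sum(w.values()) for c, w in self.weights.items()}
        return self

    def predict(self, vec):
        n = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n)  # log prior
            denom = self.totals[c] + self.alpha * len(self.vocab)
            for t, w in vec.items():
                # Laplace-smoothed log likelihood, scaled by the term weight.
                score += w * math.log((self.weights[c][t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

# Toy, hypothetical training data: (tokens, category).
train = [
    (["train", "delayed", "two", "hours"], "delay"),
    (["train", "late", "delayed", "again"], "delay"),
    (["ticket", "refund", "not", "received"], "ticketing"),
    (["refund", "ticket", "window", "closed"], "ticketing"),
]
docs = [tokens for tokens, _ in train]
labels = [label for _, label in train]

idf = fit_idf(docs)
model = NaiveBayes().fit([tfidf(d, idf) for d in docs], labels)

print(model.predict(tfidf(["my", "train", "was", "delayed"], idf)))
print(model.predict(tfidf(["refund", "for", "my", "ticket"], idf)))
```

Under the same assumptions, replacing `tfidf` with a TextRank or Word2vec based extractor while keeping the classifier fixed would mirror the kind of comparison the abstract describes.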
Keywords
automatic classification; eigenvalue; naive Bayes; railway complaint text; TextRank; TF-IDF; Word2vec
Hrčak ID: 221004
Publication date: 12.6.2019.