Original scientific paper

https://doi.org/10.17559/TV-20190420161815

Naive Bayesian Automatic Classification of Railway Service Complaint Text Based on Eigenvalue Extraction

Lifeng Li ; School of Economics and Management, Beijing Jiaotong University, No. 3, Shangyuancun, Haidian District, Beijing, China
Wenxing Li ; School of Economics and Management, Beijing Jiaotong University, No. 3, Shangyuancun, Haidian District, Beijing, China


pages 778-785


Abstract

Railways in China have developed rapidly over several decades. Railway hardware has reached a world-leading level, but service quality still has room for improvement. The railway management department receives a large number of passenger complaints every year and records them as text, which needs to be classified and analyzed. Railway complaint text covers a wide range of business areas and many kinds of events, is heavily colloquial, and contains noise and irrelevant information. Applying traditional text categorization directly to such text yields low classification accuracy. The key to the automatic classification of such text lies in eigenvalue extraction: the more accurate the eigenvalue extraction, the higher the accuracy of text classification. In this paper, the TF-IDF, TextRank and Word2vec algorithms are selected to extract text eigenvalues, and a railway complaint text classification method is constructed with a naive Bayesian classifier. The three eigenvalue extraction algorithms are compared, and eigenvalue extraction based on the TF-IDF algorithm achieves the highest automatic text classification accuracy.
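The abstract does not give the authors' implementation, so the following is only a minimal sketch of the general idea: TF-IDF weights are computed per term and fed to a multinomial naive Bayes classifier. It assumes that "eigenvalue extraction" refers to text feature weighting. The toy English tokens, the category labels, the smoothing choices and all function names are hypothetical; real complaint text would first need Chinese word segmentation, which is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: pre-tokenized complaints with made-up category labels.
TRAIN = [
    (["train", "late", "delay", "hours"], "punctuality"),
    (["delay", "late", "departure", "train"], "punctuality"),
    (["staff", "rude", "ticket", "window"], "service"),
    (["rude", "attendant", "staff", "attitude"], "service"),
]

def idf_weights(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one document; terms unseen in training are ignored."""
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}

def train_nb(samples, idf, alpha=1.0):
    """Multinomial naive Bayes on TF-IDF weights with additive smoothing."""
    class_docs = Counter(label for _, label in samples)
    weight = defaultdict(lambda: defaultdict(float))
    for doc, label in samples:
        for t, w in tfidf(doc, idf).items():
            weight[label][t] += w
    vocab = list(idf)
    model = {}
    for label, n_docs in class_docs.items():
        denom = sum(weight[label].values()) + alpha * len(vocab)
        log_like = {t: math.log((weight[label][t] + alpha) / denom) for t in vocab}
        model[label] = (math.log(n_docs / len(samples)), log_like)
    return model

def predict(doc, model, idf):
    """Return the label with the highest log posterior score."""
    feats = tfidf(doc, idf)
    scores = {
        label: prior + sum(w * log_like[t] for t, w in feats.items())
        for label, (prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)

IDF = idf_weights([doc for doc, _ in TRAIN])
MODEL = train_nb(TRAIN, IDF)
print(predict(["train", "delay"], MODEL, IDF))
```

Swapping `tfidf` for a TextRank or Word2vec based feature function would be the natural way to compare the three extraction approaches under the same classifier, although how the paper performs that comparison is not stated here.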

Keywords

automatic classification; eigenvalue; naive Bayes; railway complaint text; TextRank; TF-IDF; Word2vec

Hrčak ID:

221004

URI

https://hrcak.srce.hr/221004

Publication date:

12.6.2019.