Journal: Technical Gazette

Naive Bayesian Automatic Classification of Railway Service Complaint Text Based on Eigenvalue Extraction

Abstract

Railways in China have developed rapidly over several decades. Railway hardware has reached a world-leading level, but service quality still has room for improvement. The railway management department receives a large number of passenger complaints every year and records them as text, which then needs to be classified and analyzed. Railway complaint text is characterized by broad business coverage, a wide variety of events, heavily colloquial language, and noisy or irrelevant information. Applying traditional text categorization directly to such text yields low classification accuracy. The key to classifying it automatically lies in eigenvalue extraction: the more accurate the extraction, the higher the classification accuracy. In this paper, the TF-IDF, TextRank, and Word2vec algorithms are each used to extract text eigenvalues, and a railway complaint text classification method is built on a naive Bayesian classifier. The three eigenvalue extraction algorithms are compared, and classification based on TF-IDF eigenvalue extraction achieves the highest automatic classification accuracy.
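The TF-IDF plus naive Bayes pipeline named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes that "eigenvalues" means per-term TF-IDF weights, that a smoothed IDF is used, and that the classifier is a multinomial naive Bayes with Laplace smoothing applied to those fractional weights. The function names, the toy English complaints, and the labels `catering` and `ticketing` are all hypothetical. Real complaint text in Chinese would first need word segmentation, which whitespace splitting stands in for here.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed inverse document frequency for each term in tokenized docs."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """TF-IDF weights for one document; terms absent from `idf` are ignored."""
    known = [t for t in tokens if t in idf]
    counts = Counter(known)
    return {t: (c / len(known)) * idf[t] for t, c in counts.items()}


class NaiveBayes:
    """Multinomial naive Bayes over non-negative feature weights (assumed variant)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, vectors, labels, vocab):
        n = len(labels)
        # Sum the TF-IDF weight of every term within each class.
        totals = defaultdict(lambda: defaultdict(float))
        for vec, label in zip(vectors, labels):
            for term, weight in vec.items():
                totals[label][term] += weight
        self.log_prior = {c: math.log(k / n) for c, k in Counter(labels).items()}
        self.log_like = {}
        for c in self.log_prior:
            denom = sum(totals[c].values()) + self.alpha * len(vocab)
            self.log_like[c] = {
                t: math.log((totals[c][t] + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vec):
        # Score each class as log prior plus weighted log likelihoods.
        scores = {
            c: self.log_prior[c] + sum(w * self.log_like[c][t] for t, w in vec.items())
            for c in self.log_prior
        }
        return max(scores, key=scores.get)


# Hypothetical toy training set; not data from the paper.
train = [
    ("food on the train was cold", "catering"),
    ("dining car meal was expensive", "catering"),
    ("ticket refund was delayed", "ticketing"),
    ("could not change my ticket online", "ticketing"),
]
docs = [text.split() for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
model = NaiveBayes().fit([tfidf(d, idf) for d in docs], labels, idf)


def classify(text):
    """Classify a whitespace-tokenized complaint with the fitted toy model."""
    return model.predict(tfidf(text.split(), idf))


print(classify("the meal was cold"))
print(classify("ticket refund online"))
```

To compare extractors as the abstract describes, `tfidf` could be swapped for a TextRank or Word2vec feature step while keeping the classifier fixed. Word2vec vectors can be negative, so they would need a different naive Bayes variant than the one assumed here.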