Journal: 《铁道学报》 (Journal of the China Railway Society)

Intelligent Fault Classification of Railway Signal Equipment Based on Imbalanced Text Data Mining

         

Abstract

To handle imbalanced fault text data from railway signal equipment, this paper proposes an intelligent fault classification model based on text mining. A TF-IDF model extracts features from the fault text of signaling equipment and converts it into vectors, and multiple classifiers are combined through Voting-based ensemble learning. The model first applies the SVM-SMOTE algorithm to randomly generate additional samples for the minority-class text vectors produced by TF-IDF. It then classifies the balanced data with base classifiers (logistic regression, naive Bayes, SVM, and others) and with ensemble classifiers (GBDT and random forest). Finally, taking into account the conditions under which each classifier performs well, it combines them by Voting. Experiments on railway signal equipment fault text data from one railway bureau, covering 2012 to 2016, show that the model significantly improves the accuracy, recall, and F-score of fault classification.
