Conference: 2017 International Conference on Engineering and Technology

Efficient method for feature selection in text classification

Abstract

In text classification, Chinese word segmentation turns each article into a large number of feature words, so the document vector can reach tens of thousands or even hundreds of thousands of dimensions. In theory, a larger set of feature words characterizes a document better, but many of those features contribute very little to classification. It is therefore necessary to select the words that actually help distinguish classes, and in doing so reduce the dimensionality of the computation. This paper studies traditional feature selection algorithms and, based on the shortcomings of the traditional chi-square test, presents an improved chi-square method that combines term frequency and inter-class concentration. Experiments show that the method improves on the traditional chi-square test.
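To make the idea concrete, the sketch below computes the standard chi-square statistic for a term and a class from a 2x2 document-count table, then applies a weight built from term frequency and class concentration. The abstract does not give the paper's formula, so the two weighting factors (`frequency` and `concentration`), the function names, and the way they are multiplied together are illustrative assumptions, not the authors' definition.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a term/class 2x2 table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def score_terms(docs, labels, target):
    """Return {term: (chi2, weighted)} for the `target` class.

    docs is a list of token lists (already segmented).
    `weighted` uses a HYPOTHETICAL multiplier, not the paper's formula.
    """
    n_docs = len(docs)
    n_in = sum(1 for y in labels if y == target)
    df_in, df_all = Counter(), Counter()  # document frequencies
    tf_in, tf_all = Counter(), Counter()  # raw occurrence counts
    for tokens, y in zip(docs, labels):
        counts = Counter(tokens)
        df_all.update(counts.keys())
        tf_all.update(counts)
        if y == target:
            df_in.update(counts.keys())
            tf_in.update(counts)

    scores = {}
    for term in df_all:
        a = df_in[term]
        b = df_all[term] - a
        c = n_in - a
        d = n_docs - n_in - b
        chi2 = chi_square(a, b, c, d)
        # Assumed factor 1: share of the term's occurrences inside the class.
        frequency = tf_in[term] / tf_all[term]
        # Assumed factor 2: share of the term's documents inside the class.
        concentration = df_in[term] / df_all[term]
        scores[term] = (chi2, chi2 * frequency * concentration)
    return scores


docs = [["a", "a", "b"], ["a", "c"], ["b", "c"], ["c", "d"]]
labels = ["x", "x", "y", "y"]
scores = score_terms(docs, labels, "x")
top = sorted(scores, key=lambda t: scores[t][1], reverse=True)
```

In this toy corpus the term `d` never appears in class `x`, yet its plain chi-square score is nonzero because the statistic does not distinguish positive from negative association. The assumed weight pushes such terms to zero, which is one plausible reading of why frequency and concentration are combined with the chi-square score.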
