Journal of Computers

An Improved Random Forest Classifier for Text Categorization


Abstract

This paper proposes an improved random forest algorithm for classifying text data. The algorithm is designed for very high-dimensional, multi-class data, of which text corpora are a well-known example. A novel feature weighting method and a tree selection method are developed and combined to make the random forest framework well suited to categorizing text documents with dozens of topics. With the new feature weighting method for subspace sampling and the tree selection method, the subspace size can be effectively reduced and classification performance improved without increasing the error bound. The proposed method is applied to six text data sets with diverse characteristics. The results demonstrate that the improved random forest outperforms popular text classification methods in terms of classification performance.
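
The abstract does not give the weighting formula or the tree selection criterion, so the following Python sketch only illustrates the general shape of the two ideas: drawing a feature subspace with probability proportional to per-feature weights, and keeping only the best-scoring trees. The function names `weighted_subspace` and `select_trees`, the example weights, and the per-tree scores are all hypothetical and are not taken from the paper.

```python
import random


def weighted_subspace(weights, size, rng):
    """Draw `size` distinct feature indices, favouring high-weight features.

    Assumption: `weights` are positive relevance scores, one per feature.
    How the paper computes them is not stated in the abstract.
    """
    candidates = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(min(size, len(candidates))):
        # random.choices samples with replacement, so draw one index at a
        # time and remove it to keep the subspace free of duplicates.
        pick = rng.choices(range(len(candidates)), weights=remaining, k=1)[0]
        chosen.append(candidates.pop(pick))
        remaining.pop(pick)
    return chosen


def select_trees(tree_scores, keep_fraction):
    """Return the indices of the top-scoring trees, in ascending index order.

    Assumption: each tree has one quality score (for example, accuracy on
    held-out data); the paper's actual criterion is not given here.
    """
    keep = max(1, int(len(tree_scores) * keep_fraction))
    ranked = sorted(range(len(tree_scores)),
                    key=lambda i: tree_scores[i], reverse=True)
    return sorted(ranked[:keep])


rng = random.Random(0)
feature_weights = [0.9, 0.05, 0.6, 0.01, 0.4]  # illustrative values only
subspace = weighted_subspace(feature_weights, 3, rng)
kept = select_trees([0.71, 0.55, 0.80, 0.62], keep_fraction=0.5)
```

Because informative features are drawn more often, a smaller subspace can still contain useful terms at each split, which is the intuition behind reducing the subspace size on sparse, high-dimensional text data.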
