Conference: Mexican International Conference on Artificial Intelligence

Feature Selection Based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification Using Naive Bayes

Abstract

Automatic text classification into predefined categories is an increasingly important task, given the vast number of electronic documents available on the Internet and on enterprise servers. Successful text classification relies heavily on dimensionality reduction, which aims to improve classification accuracy, make the classification process more expressive, and improve the computational efficiency of classification. In this paper, two feature selection algorithms are presented, one based on sampling and one on weighted sampling, both building on the C4.5 algorithm. The results demonstrate considerable improvements in classification accuracy (up to 10%) compared to traditional algorithms such as C4.5, Naive Bayes, and Support Vector Machines. Classification is performed with the Naive Bayes model in the reduced-dimensionality space. Experiments were carried out using data sets based on the Reuters-21578 collection.
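The abstract does not spell out either algorithm, so the following is only a minimal sketch of one plausible reading of the pipeline: draw several random samples of the training documents, grow a small gain-ratio decision tree on each sample, keep the terms the trees split on, and train Naive Bayes on those terms alone. Everything here is an assumption made for illustration: the function names (`gain_ratio`, `tree_terms`, `select_features`, `train_nb`, `predict_nb`), the parameters (`rounds`, `sample_size`, `depth`), the use of uniform sampling only (the weighted variant is not reproduced because the weights are not described), and the simplified unpruned tree that stands in for full C4.5. The toy documents are invented and are not taken from Reuters-21578.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(docs, labels, term):
    """C4.5-style gain ratio for the binary split 'term present / absent'."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    if not with_t or not without:
        return 0.0
    p = len(with_t) / len(labels)
    gain = entropy(labels) - p * entropy(with_t) - (1 - p) * entropy(without)
    split_info = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return gain / split_info

def tree_terms(docs, labels, vocab, depth=3):
    """Terms used as split nodes by a small, unpruned gain-ratio tree."""
    if depth == 0 or len(set(labels)) < 2 or not vocab:
        return set()
    best = max(sorted(vocab), key=lambda t: gain_ratio(docs, labels, t))
    if gain_ratio(docs, labels, best) <= 0.0:
        return set()
    used = {best}
    for present in (True, False):
        idx = [i for i, d in enumerate(docs) if (best in d) == present]
        used |= tree_terms([docs[i] for i in idx], [labels[i] for i in idx],
                           vocab - {best}, depth - 1)
    return used

def select_features(docs, labels, rounds=10, sample_size=None, seed=0):
    """Union of split terms over trees grown on uniform random samples."""
    rng = random.Random(seed)
    vocab = set().union(*docs)
    k = min(len(docs), sample_size or max(2, len(docs) // 2))
    selected = set()
    for _ in range(rounds):
        idx = rng.sample(range(len(docs)), k)
        selected |= tree_terms([docs[i] for i in idx],
                               [labels[i] for i in idx], vocab)
    return selected

def train_nb(docs, labels, features):
    """Bernoulli Naive Bayes over the selected terms, Laplace-smoothed."""
    classes = Counter(labels)
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d & features)
    prior = {c: math.log(n / len(labels)) for c, n in classes.items()}
    prob = {c: {t: (counts[c][t] + 1) / (classes[c] + 2) for t in features}
            for c in classes}
    return prior, prob

def predict_nb(model, doc):
    """Return the class with the highest log-posterior for one document."""
    prior, prob = model
    def score(c):
        return prior[c] + sum(math.log(p if t in doc else 1 - p)
                              for t, p in prob[c].items())
    return max(sorted(prior), key=score)

# Invented toy corpus: each document is a set of terms.
docs = [
    {"wheat", "crop", "export"}, {"wheat", "harvest", "price"},
    {"crop", "harvest", "farm"}, {"wheat", "farm", "export"},
    {"oil", "barrel", "price"}, {"oil", "opec", "export"},
    {"barrel", "opec", "crude"}, {"oil", "crude", "price"},
]
labels = ["grain"] * 4 + ["crude"] * 4

features = select_features(docs, labels, rounds=5, sample_size=6, seed=0)
model = train_nb(docs, labels, features)
query = {"wheat", "crop", "harvest", "farm", "export"}
prediction = predict_nb(model, query)
print(sorted(features), prediction)
```

In this sketch, the only terms that survive are those some tree actually split on, so the Naive Bayes model sees a much smaller vocabulary than the full term set. How closely this matches the paper's procedure cannot be judged from the abstract alone.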