International journal of data mining, modelling and management

Effective feature selection technique for text classification


Abstract

Text classification plays a vital role in organising the ever-growing volume of digital documents. The high dimensionality of the feature space is a major obstacle in text classification. Feature selection, an effective preprocessing technique, improves the computational efficiency and accuracy of a text classifier. In this paper, text classification is performed with Zipf's law-based feature selection and with linear SVM weights used for feature ranking. A hybrid feature selection method combining these two techniques is proposed. Nearest neighbour and SVM classifiers are chosen as text classifiers because of the good classification accuracy reported for them in many text classification tasks. Moreover, to investigate the effect of kernel type on text classification, both linear and non-linear SVM kernels are examined. Performance is evaluated by classification accuracy under ten-fold cross-validation. Experimental results on four benchmark corpora were encouraging: the hybrid feature selection method outperformed both the selection of medium-frequency features based on Zipf's law and feature selection by linear SVM alone.
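
The hybrid selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the frequency cutoffs, the order in which the two steps are combined, or the SVM solver. The sketch assumes a document-frequency band (`low`, `high_fraction`) as the Zipf's law-based filter, applies that filter first, and then ranks the surviving terms by the absolute weights of a linear SVM trained with a Pegasos-style subgradient method. All function names, parameters, and the toy corpus are hypothetical.

```python
from collections import Counter

def zipf_medium_frequency(docs, low=2, high_fraction=0.5):
    """Keep terms whose document frequency is neither very low nor very high.
    The cutoffs are illustrative assumptions, not values from the paper."""
    df = Counter(term for doc in docs for term in set(doc))
    max_df = high_fraction * len(docs)
    return sorted(t for t, c in df.items() if low <= c <= max_df)

def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.
    No bias term; examples are visited in a fixed order for reproducibility."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1:  # hinge loss is active for this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def hybrid_select(docs, labels, k, low=2, high_fraction=0.5):
    """Assumed combination: Zipf-style filter first, then keep the k terms
    with the largest absolute linear-SVM weight."""
    vocab = zipf_medium_frequency(docs, low, high_fraction)
    X = [[doc.count(t) for t in vocab] for doc in docs]  # term-frequency vectors
    w = train_linear_svm(X, labels)
    ranked = sorted(zip(vocab, w), key=lambda pair: -abs(pair[1]))
    return [term for term, _ in ranked[:k]]

# Toy two-class corpus (labels are +1 / -1); purely illustrative.
docs = [
    "the cheap pills offer zebra".split(),
    "the cheap offer now".split(),
    "the pills offer today".split(),
    "the meeting agenda notes".split(),
    "the meeting notes today".split(),
    "the agenda meeting now".split(),
]
labels = [1, 1, 1, -1, -1, -1]

print(zipf_medium_frequency(docs))
print(hybrid_select(docs, labels, k=2))
```

In this toy corpus the filter drops "the" (present in every document) and "zebra" (present in only one), and the SVM ranking then favours terms that separate the two classes over terms such as "now" and "today" that occur equally in both. Evaluating the selected features with nearest neighbour and SVM classifiers under ten-fold cross-validation, as the abstract describes, is outside the scope of this sketch.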
