Journal: Knowledge-Based Systems

Term-weighting learning via genetic programming for text classification

Abstract

This paper describes a novel approach to learning term-weighting schemes (TWSs) for text classification. In text mining, a TWS determines how documents are represented in a vector space model before a classifier is applied. Although standard TWSs (e.g., Boolean and term-frequency schemes) achieve acceptable performance, defining a TWS has traditionally been an art. Moreover, determining the best TWS for a particular problem remains difficult, and it is not yet clear whether schemes better than those currently available can be generated by combining known TWSs. We propose a genetic program that aims to learn effective TWSs that improve on the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units to produce discriminative TWSs. We report an extensive experimental study comprising data sets from thematic and non-thematic text classification as well as from image classification. The study supports the validity of the proposed method: TWSs learned with the genetic program outperform traditional schemes and other TWSs proposed in recent work. Further, we show that TWSs learned on a specific domain can be used effectively for other tasks. (C) 2015 Elsevier B.V. All rights reserved.
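
To make the idea of combining basic units concrete, here is a minimal sketch of how a TWS could be encoded as an expression tree and evaluated on a toy corpus. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not list the basic units or operators, so the terminals (`tf`, `idf`, `one`), the function set (`add`, `mul`, `log1p`, `sign`), the corpus, and the `CANDIDATE` tree are all hypothetical. The sketch covers only representation and evaluation; a genetic program would additionally search over such trees, scoring each by classification performance.

```python
import math
import operator

# Toy corpus (made-up data): each document is a list of tokens.
DOCS = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda", "pills"],
    ["meeting", "notes"],
]
VOCAB = sorted({t for d in DOCS for t in d})  # agenda, cheap, meeting, notes, pills

# Raw term counts: one row per document, one column per vocabulary term.
TF = [[doc.count(t) for t in VOCAB] for doc in DOCS]
# Inverse document frequency: log(N / df) per term.
IDF = [math.log(len(DOCS) / sum(1 for d in DOCS if t in d)) for t in VOCAB]

# Assumed basic units (terminals): map (document index i, term index j) to a number.
TERMINALS = {
    "tf": lambda i, j: float(TF[i][j]),
    "idf": lambda i, j: IDF[j],
    "one": lambda i, j: 1.0,
}

# Assumed function set used to combine the basic units.
FUNCTIONS = {
    "add": operator.add,
    "mul": operator.mul,
    "log1p": math.log1p,                      # unary: log(1 + x)
    "sign": lambda x: 1.0 if x > 0 else 0.0,  # unary: presence indicator
}

def evaluate(tree, i, j):
    """Evaluate an expression tree for document i and term j.

    A tree is either a terminal name (str) or a tuple (function name, *subtrees).
    """
    if isinstance(tree, str):
        return TERMINALS[tree](i, j)
    op, *args = tree
    return FUNCTIONS[op](*(evaluate(a, i, j) for a in args))

def weight_documents(tree):
    """Apply a TWS (expression tree) to every document/term pair."""
    return [[evaluate(tree, i, j) for j in range(len(VOCAB))]
            for i in range(len(DOCS))]

# Two standard schemes written as trees, plus one hypothetical combination.
BOOLEAN = ("sign", "tf")
TFIDF = ("mul", "tf", "idf")
CANDIDATE = ("mul", ("log1p", "tf"), ("add", "one", "idf"))

if __name__ == "__main__":
    for name, tree in [("boolean", BOOLEAN), ("tf-idf", TFIDF), ("candidate", CANDIDATE)]:
        print(name, weight_documents(tree)[0])
```

Writing the standard schemes as trees over the same terminals shows why a search over trees can, in principle, recover known TWSs as well as new combinations of them.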