Published in: Information Processing & Management

Text Classification From Unlabeled Documents With Bootstrapping And Feature Projection Techniques


Abstract

Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, an approach generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents to learn accurately. While unlabeled documents are plentiful and easy to collect, labeled documents are difficult to produce because the labeling must be done by humans. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier using bootstrapping and feature projection techniques. Experimental results show that the proposed method achieves reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.
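
The general idea of starting from title words and bootstrapping labels over unlabeled documents can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify the procedure, so the seeding rule, the per-word voting scheme (a loose stand-in for feature projection), and all function names below are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase whitespace split.
    return text.lower().split()


def seed_labels(docs, title_words):
    """Label a document only if it contains exactly one category's title word."""
    labels = {}
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        hits = [cat for cat, word in title_words.items() if word in tokens]
        if len(hits) == 1:
            labels[i] = hits[0]
    return labels


def word_votes(docs, labels):
    """For each word, the fraction of labeled documents containing it per category."""
    counts = defaultdict(Counter)
    for i, cat in labels.items():
        for word in set(tokenize(docs[i])):
            counts[word][cat] += 1
    return {
        word: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for word, cats in counts.items()
    }


def classify(doc, votes):
    """Sum each word's category votes; return None if no evidence or a tie."""
    score = Counter()
    for word in tokenize(doc):
        for cat, vote in votes.get(word, {}).items():
            score[cat] += vote
    top = score.most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return None
    return top[0][0]


def bootstrap(docs, title_words, rounds=5):
    """Grow the labeled set from title-word seeds until no new labels appear."""
    labels = seed_labels(docs, title_words)
    for _ in range(rounds):
        votes = word_votes(docs, labels)
        new = {}
        for i, doc in enumerate(docs):
            if i not in labels:
                cat = classify(doc, votes)
                if cat is not None:
                    new[i] = cat
        if not new:
            break
        labels.update(new)
    return labels, word_votes(docs, labels)
```

A real system would need proper preprocessing, a confidence threshold before accepting machine-assigned labels, and an evaluation against a supervised baseline, as the abstract describes.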
