Text Classification From Unlabeled Documents With Bootstrapping And Feature Projection Techniques

Youngjoong Ko; Jungyun Seo

Abstract

Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning; this is generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are plentiful and easily collected, labeled documents are difficult to generate because the labeling must be done by human developers. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier using bootstrapping and feature projection techniques. Experimental results showed that the proposed method achieved reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.
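The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of starting from title words and bootstrapping a classifier, not the authors' method. It assumes one title word per category, seeds provisional labels on documents that contain exactly one title word, and then repeatedly retrains and relabels. A plain multinomial naive Bayes classifier is swapped in for the paper's feature projection classifier. All function names (tokenize, seed_labels, train, predict, bootstrap) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def seed_labels(docs, title_words):
    """Label only the documents that contain exactly one category's title word."""
    labels = {}
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        hits = [cat for cat, word in title_words.items() if word in tokens]
        if len(hits) == 1:
            labels[i] = hits[0]
    return labels


def train(docs, labels):
    """Collect per-category word counts from the currently labeled documents."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for i, cat in labels.items():
        tokens = tokenize(docs[i])
        word_counts[cat].update(tokens)
        class_counts[cat] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(doc, model):
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    tokens = [t for t in tokenize(doc) if t in vocab]
    best_cat, best_score = None, -math.inf
    for cat in sorted(class_counts):
        denom = sum(word_counts[cat].values()) + len(vocab)
        score = math.log(class_counts[cat] / total_docs)
        for t in tokens:
            score += math.log((word_counts[cat][t] + 1) / denom)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


def bootstrap(docs, title_words, max_rounds=5):
    """Seed labels from title words, then retrain and relabel until stable."""
    labels = seed_labels(docs, title_words)
    if not labels:
        return labels
    for _ in range(max_rounds):
        model = train(docs, labels)
        new_labels = {i: predict(doc, model) for i, doc in enumerate(docs)}
        if new_labels == labels:
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    example_docs = [
        "the soccer match ended with a late goal",
        "the striker scored a goal in the match",
        "the election results surprised the parliament",
    ]
    print(bootstrap(example_docs, {"sports": "soccer", "politics": "election"}))
```

In this sketch the second example document never mentions a title word; it can only receive a label through the words it shares with the seeded documents, which is the point of the bootstrapping step.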

Bibliographic details

Source: Information Processing & Management, 2009, Issue 1, pp. 70-83 (14 pages)
Authors: Youngjoong Ko; Jungyun Seo
Original format: PDF
Language: English
Keywords: text classification; bootstrapping; feature projection; unlabeled data; text classifier
