Semi-supervised text categorization: Exploiting unlabeled data using ensemble learning algorithms

Mohammad Reza Keyvanpour; Maryam Bahojb Imani

Abstract

Text categorization is one of the fundamental tasks in text mining. Classical supervised methods need a large amount of labeled data to train a classifier. Because assigning labels to large amounts of data is costly and time-consuming, it is useful to exploit data sets without labels, and many semi-supervised learning methods have been studied recently. Among these methods, self-training is one of the important learning algorithms: a classifier trained on a small amount of labeled data classifies the unlabeled samples, and the most confidently classified samples are added to the training set. In this paper, dynamic weighting is applied alongside a majority-vote approach to divide the unlabeled data into reliable and unreliable classes. The reliable data are then added to the training set, and the remaining data, including the unreliable data, are classified again in an iterative process. We tested this method on the extracted features of ten common Reuters-21578 classes. The experimental results indicate that the proposed method is effective and improves classification performance.
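The self-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which base classifiers are used or how the dynamic weights are computed, so the toy `CentroidClassifier`, the choice of training-set accuracy as the per-round weight, the `threshold` parameter, and all function names are hypothetical.

```python
from collections import defaultdict


class CentroidClassifier:
    """Toy base learner (assumption): nearest class centroid under a given distance."""

    def __init__(self, distance):
        self.distance = distance
        self.centroids = {}

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, features):
        return min(self.centroids,
                   key=lambda c: self.distance(features, self.centroids[c]))


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def weighted_vote(models, weights, features):
    """Return the label with the largest weighted vote and its share of the total weight."""
    scores = defaultdict(float)
    for model, weight in zip(models, weights):
        scores[model.predict(features)] += weight
    label = max(scores, key=scores.get)
    total = sum(weights)
    return label, (scores[label] / total if total else 0.0)


def self_train(models, X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Grow the labeled set with reliably voted samples; return it plus the leftover pool."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break
        for model in models:
            model.fit(X_lab, y_lab)
        # Assumed dynamic weighting: each model's accuracy on the current labeled set,
        # recomputed every round as that set grows.
        weights = [
            sum(model.predict(x) == t for x, t in zip(X_lab, y_lab)) / len(y_lab)
            for model in models
        ]
        reliable, unreliable = [], []
        for x in pool:
            label, confidence = weighted_vote(models, weights, x)
            (reliable if confidence >= threshold else unreliable).append((x, label))
        if not reliable:
            break  # nothing new to learn from; stop iterating
        for x, label in reliable:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in unreliable]  # unreliable samples are retried next round
    return X_lab, y_lab, pool
```

Calling `self_train` with a list of `CentroidClassifier` instances (one per distance function), a few labeled feature vectors, and a pool of unlabeled ones returns the enlarged training set and whatever samples never reached the confidence threshold. In practice the base learners would be real text classifiers over extracted document features; the toy learner only keeps the sketch dependency-free.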

Bibliographic record

Source
Intelligent Data Analysis | 2013, Issue 3 | pp. 367-385 | 19 pages
Authors
Mohammad Reza Keyvanpour; Maryam Bahojb Imani
Author affiliation
Department of Computer Engineering, Alzahra University, Vanak Sq., Tehran, Iran (both authors)
Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords
Text categorization; semi-supervised learning; self-training; ensemble learning; dynamic weighting
