Incorporating large unlabeled data to enhance EM classification

Xintao Wu


Abstract

This paper investigates the problem of augmenting labeled data with unlabeled data to improve classification accuracy. This is significant for many applications, such as image classification, where obtaining classification labels is expensive while large numbers of unlabeled examples are easily available. We investigate an Expectation Maximization (EM) algorithm for learning from labeled and unlabeled data. Unlabeled data boosts learning accuracy because it provides information about the joint probability distribution. A theoretical argument shows that the more unlabeled examples are combined in learning, the more accurate the result. We then introduce the B-EM algorithm, which combines EM with the bootstrap method, to exploit the large unlabeled data while avoiding prohibitive I/O cost. Experimental results on both synthetic and real data sets show that the proposed approach performs satisfactorily.
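As a rough illustration of the idea in the abstract, the following Python sketch runs EM on a small labeled set plus bootstrap samples drawn from a large unlabeled pool. It is not the paper's B-EM algorithm: the diagonal Gaussian class model, the function names (`semi_supervised_em`, `bootstrap_em`), the sample size, and the averaging of parameters across rounds are all assumptions made for this example, since the abstract does not give those details.

```python
import numpy as np


def m_step(X, R, var_floor=1e-6):
    """Re-estimate class priors, means and diagonal variances from soft labels R."""
    Nk = R.sum(axis=0)                       # soft count per class, shape (K,)
    priors = Nk / Nk.sum()
    means = (R.T @ X) / Nk[:, None]          # shape (K, d)
    second = (R.T @ (X ** 2)) / Nk[:, None]
    variances = np.maximum(second - means ** 2, var_floor)
    return priors, means, variances


def log_joint(X, priors, means, variances):
    """Return log p(x, c) for every row of X and every class, shape (n, K)."""
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None]
                      + diff ** 2 / variances[None]).sum(axis=2)
    return log_lik + np.log(priors)[None, :]


def e_step(X, params):
    """Posterior class probabilities for each row of X under the current model."""
    lj = log_joint(X, *params)
    lj = lj - lj.max(axis=1, keepdims=True)  # stabilise before exponentiating
    R = np.exp(lj)
    return R / R.sum(axis=1, keepdims=True)


def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=30):
    """EM over labeled and unlabeled data (assumed diagonal Gaussian classes).

    Every class needs at least one labeled example, otherwise the
    initial M-step divides by zero.
    """
    R_lab = np.eye(n_classes)[y_lab]         # labeled rows keep fixed one-hot labels
    params = m_step(X_lab, R_lab)            # initialise from labeled data only
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        R_unl = e_step(X_unl, params)        # E-step on unlabeled rows only
        params = m_step(X_all, np.vstack([R_lab, R_unl]))
    return params


def bootstrap_em(X_lab, y_lab, X_unl, n_classes,
                 n_rounds=10, sample_size=200, seed=0):
    """Run EM on bootstrap samples of the unlabeled pool and average the fits.

    Averaging parameters across rounds is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X_unl), size=sample_size)  # with replacement
        runs.append(semi_supervised_em(X_lab, y_lab, X_unl[idx], n_classes))
    return tuple(np.mean([run[i] for run in runs], axis=0) for i in range(3))


def predict(X, params):
    """Assign each row of X to its most probable class."""
    return log_joint(X, *params).argmax(axis=1)
```

Drawing `sample_size` rows per round means each EM run touches only a small part of the unlabeled pool, which is one plausible reading of how a bootstrap could limit I/O. Calling `predict(X_new, params)` with the tuple returned by `bootstrap_em` assigns each row of `X_new` to its most probable class.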


Publication details

Source
Journal of Intelligent Information Systems | 2006, Issue 3 | pp. 211-226 | 16 pages
Author
Xintao Wu
Affiliation

CS Department, University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: computing technology, computer technology
Keywords
expectation maximization; bootstrap; classification

