Journal of Intelligent Information Systems

Incorporating large unlabeled data to enhance EM classification

Abstract

This paper investigates the problem of augmenting labeled data with unlabeled data to improve classification accuracy. This is significant for many applications, such as image classification, where obtaining classification labels is expensive while large numbers of unlabeled examples are readily available. We investigate an Expectation Maximization (EM) algorithm for learning from labeled and unlabeled data. Unlabeled data boosts learning accuracy because it provides information about the joint probability distribution. A theoretical argument shows that the more unlabeled examples are combined in learning, the more accurate the result. We then introduce the B-EM algorithm, which combines EM with the bootstrap method, to exploit large unlabeled data sets while avoiding prohibitive I/O cost. Experimental results on both synthetic and real data sets show that the proposed approach performs satisfactorily.
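The idea of running EM over labeled and unlabeled data together can be illustrated with a small sketch. This is not the paper's algorithm and does not implement B-EM or its bootstrap step, whose details the abstract does not give. It assumes a two-class problem with one-dimensional Gaussian class-conditional densities, and the names `fit_em`, `m_step`, `posterior`, and `make_data` are hypothetical helpers introduced only for this illustration. Labeled points keep their known class as a fixed one-hot weight, while unlabeled points receive soft class weights that are re-estimated on every E-step.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def m_step(xs, weights):
    """Re-estimate (prior, mean, variance) per class from soft class weights."""
    params = []
    for k in (0, 1):
        nk = sum(w[k] for w in weights)
        mean = sum(w[k] * x for x, w in zip(xs, weights)) / nk
        var = sum(w[k] * (x - mean) ** 2 for x, w in zip(xs, weights)) / nk
        # Floor the variance so a class never collapses onto a single point.
        params.append((nk / len(xs), mean, max(var, 1e-3)))
    return params


def posterior(x, params):
    """E-step for one point: P(class | x) under the current parameters."""
    joint = [prior * gaussian_pdf(x, mean, var) for prior, mean, var in params]
    total = sum(joint)
    if total == 0.0:  # numerical underflow far away from both classes
        return [0.5, 0.5]
    return [j / total for j in joint]


def fit_em(labeled, unlabeled, n_iter=50):
    """Assumed setup: labeled is a list of (x, y) with y in {0, 1} and at
    least one example per class; unlabeled is a list of x values."""
    xs_l = [x for x, _ in labeled]
    # Labeled points carry fixed one-hot weights for their known class.
    w_l = [[1.0 if y == k else 0.0 for k in (0, 1)] for _, y in labeled]
    params = m_step(xs_l, w_l)  # initialise from labeled data only
    for _ in range(n_iter):
        w_u = [posterior(x, params) for x in unlabeled]      # E-step
        params = m_step(xs_l + list(unlabeled), w_l + w_u)   # M-step
    return params


def make_data(seed, n_labeled_per_class=4, n_unlabeled=2000):
    """Synthetic data for the demo: classes centred at -2 and +2, unit variance."""
    rng = random.Random(seed)
    centers = (-2.0, 2.0)
    labeled = [(rng.gauss(centers[y], 1.0), y)
               for y in (0, 1) for _ in range(n_labeled_per_class)]
    unlabeled = [rng.gauss(centers[rng.randrange(2)], 1.0)
                 for _ in range(n_unlabeled)]
    return labeled, unlabeled


labeled, unlabeled = make_data(seed=0)
print("labeled only:       ", fit_em(labeled, []))
print("labeled + unlabeled:", fit_em(labeled, unlabeled))
```

Each printed entry is a `(prior, mean, variance)` triple per class. Passing an empty unlabeled list reduces the fit to plain maximum-likelihood estimation on the labeled points, which gives a baseline for comparison with the combined fit.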
