Journal: Knowledge and Information Systems

Adaptive semi-supervised learning on labeled and unlabeled data with different distributions

Abstract

Designing good classifiers from labeled samples whose distribution differs from that of the test samples is an important and challenging research problem in machine learning and its applications. This paper focuses on designing semi-supervised classifiers with high generalization ability by using unlabeled samples drawn from the same distribution as the test samples, and presents a semi-supervised learning method based on a hybrid discriminative and generative model. Although JESS-CM is one of the most successful semi-supervised classifier design frameworks based on a hybrid approach, it suffers from overfitting in the task setting considered in this paper. We propose an objective function that uses both labeled and unlabeled samples for the discriminative training of hybrid classifiers, and we expect this objective function to mitigate the overfitting problem. We show the effect of the objective function through theoretical analysis and empirical evaluation. Our experimental results for text classification on four typical benchmark test collections confirmed that, in our task setting, the proposed method outperformed the JESS-CM framework in most cases. We also confirmed experimentally that the proposed method was useful for obtaining better performance when classifying data samples into known or unknown classes, that is, classes that are or are not represented in the given labeled samples, respectively.
