Journal: Computer Science and Information Systems

Active Semi-supervised Framework with Data Editing

Abstract

Many active semi-supervised algorithms have been proposed to address the problem of insufficient training data. However, because labeled data are scarce, the self-labeled training data produced by semi-supervised learning may contain considerable noise. Such noise can snowball in subsequent learning rounds and hurt the generalization ability of the final hypothesis. The extremely small amount of labeled training data in sparsely labeled text classification aggravates this situation. If such noise could be identified and removed, the performance of active semi-supervised algorithms should improve. However, techniques for identifying and removing noise have seldom been explored in existing active semi-supervised algorithms. In this paper, we propose an active semi-supervised framework with data editing (ASSDE) to improve sparsely labeled text classification. A data editing technique is used to identify and remove noise introduced by semi-supervised labeling. We carry out data editing by fully exploiting the advantages of active learning, which is novel to the best of our knowledge. The fusion of active learning with data editing makes ASSDE more robust to the sparsity and distribution bias of the training data. It also simplifies the design of the semi-supervised learning component, which makes ASSDE more efficient. Extensive experiments on several real-world text data sets show encouraging results for the proposed framework on sparsely labeled text classification, compared with several state-of-the-art methods.
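The abstract does not spell out the editing rule or how the active learner is queried, so the following is only a minimal sketch of the general idea, not the ASSDE algorithm itself. It assumes a k-nearest-neighbor vote as both the self-labeling classifier and the noise detector, and it assumes that suspicious self-labels are sent to a human oracle rather than discarded. All names (`knn_vote`, `find_suspects`, `self_train_with_editing`, `oracle`) are hypothetical.

```python
import math
from collections import Counter


def knn_vote(x, labeled, k=3):
    """Return the majority label and agreement ratio among the k nearest labeled points."""
    nearest = sorted(labeled.items(), key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def find_suspects(trusted, auto, k=3):
    """Data editing (assumed rule): flag self-labeled points whose label
    contradicts the vote of their nearest neighbors."""
    everything = {**trusted, **auto}
    suspects = []
    for x, y in auto.items():
        others = {p: lab for p, lab in everything.items() if p != x}
        if knn_vote(x, others, k)[0] != y:
            suspects.append(x)
    return suspects


def self_train_with_editing(seed, unlabeled, oracle, k=3, batch=2, rounds=10):
    """Self-training loop in which flagged labels are corrected by an oracle.

    seed: dict mapping point -> trusted label
    unlabeled: list of points (tuples of floats)
    oracle: callable point -> true label, standing in for a human annotator
    """
    trusted, auto, pool = dict(seed), {}, list(unlabeled)
    queries = 0
    for _ in range(rounds):
        if not pool:
            break
        known = {**trusted, **auto}
        # Semi-supervised step: self-label the points with the highest neighbor agreement.
        pool.sort(key=lambda x: knn_vote(x, known, k)[1], reverse=True)
        for x in pool[:batch]:
            auto[x] = knn_vote(x, known, k)[0]
        pool = pool[batch:]
        # Editing + active step: query the oracle only for suspicious self-labels.
        for x in find_suspects(trusted, auto, k):
            trusted[x] = oracle(x)
            del auto[x]
            queries += 1
    return {**trusted, **auto}, queries


if __name__ == "__main__":
    seed = {(0, 0): "a", (1, 0): "a", (10, 10): "b", (10, 9): "b"}
    oracle = lambda p: "a" if p[0] < 5 else "b"
    labels, queries = self_train_with_editing(
        seed, [(0, 1), (1, 1), (9, 10), (9, 9)], oracle
    )
    print(labels, queries)
```

In this sketch the oracle is consulted only when the editing step flags a label, which is one plausible reading of combining active learning with data editing; a real text classifier would replace the toy 2-D points with document feature vectors.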