Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Handling Noisy Labels for Robustly Learning from Self-Training Data for Low-Resource Sequence Labeling



Abstract

In this paper, we address the problem of effectively self-training neural networks in a low-resource setting. Self-training is frequently used to automatically increase the amount of training data. However, in a low-resource scenario it is less effective, because the annotations created by self-labeling unlabeled data are unreliable. We propose to combine self-training with noise handling on the self-labeled data. Directly estimating noise on the combination of the clean training set and the self-labeled data can corrupt the clean data and hence performs worse. We therefore propose the Clean and Noisy Label Neural Network, which trains on clean and noisy self-labeled data simultaneously by explicitly modelling clean and noisy labels separately. In our experiments on Chunking and NER, this approach performs more robustly than the baselines. Complementary to this explicit approach, noise can also be handled implicitly with the help of an auxiliary learning task. Combined with such a complementary approach, our method is more beneficial than the other baseline methods, and together the two provide the best overall performance.
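The explicit noise handling described in the abstract can be pictured as one network with two output paths: a clean path that predicts labels directly, and a noisy path that channels those same predictions through a learned label-noise transition matrix before comparing them to the self-generated labels. Below is a minimal PyTorch sketch of that idea, assuming a noise-transition-matrix formulation; the class name CNLNN, the BiLSTM encoder, the dimensions, and the matrix initialization are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNLNN(nn.Module):
    """Sketch of a network with separate clean and noisy label outputs."""

    def __init__(self, vocab_size, num_labels, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.clean_head = nn.Linear(2 * hidden_dim, num_labels)
        # Learnable transition matrix (stored as logits): row i is the
        # distribution of observed noisy labels when the true label is i.
        # Initialized near the identity, i.e. self-labels are mostly correct.
        init = torch.full((num_labels, num_labels), 0.1 / (num_labels - 1))
        init.fill_diagonal_(0.9)
        self.noise_log = nn.Parameter(init.log())

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))       # (B, T, 2H)
        clean_probs = F.softmax(self.clean_head(h), dim=-1)  # (B, T, L)
        noise_mat = F.softmax(self.noise_log, dim=-1)        # rows sum to 1
        noisy_probs = clean_probs @ noise_mat          # clean prediction
        return clean_probs, noisy_probs                # passed through noise

def loss_fn(model, clean_batch, noisy_batch, eps=1e-8):
    """Train on clean and self-labeled data simultaneously: clean labels
    supervise the clean output, noisy labels supervise the noise-adapted
    output, so the noise estimate never corrupts the clean path."""
    x_c, y_c = clean_batch
    x_n, y_n = noisy_batch
    p_clean, _ = model(x_c)
    _, p_noisy = model(x_n)
    nll_clean = F.nll_loss(torch.log(p_clean + eps).transpose(1, 2), y_c)
    nll_noisy = F.nll_loss(torch.log(p_noisy + eps).transpose(1, 2), y_n)
    return nll_clean + nll_noisy

# Toy usage with random data (9 labels, e.g. BIO tags for NER).
model = CNLNN(vocab_size=5000, num_labels=9)
x = torch.randint(0, 5000, (4, 20))
y = torch.randint(0, 9, (4, 20))
loss_fn(model, (x, y), (x, y)).backward()
```

Because the transition matrix sits only on the noisy path, the noise estimate can adapt to the self-labeled data without distorting the gradient signal from the clean annotations, which is the robustness property the abstract emphasizes.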
