Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Handling Noisy Labels for Robustly Learning from Self-Training Data for Low-Resource Sequence Labeling



Abstract

In this paper, we address the problem of effectively self-training neural networks in a low-resource setting. Self-training is frequently used to automatically increase the amount of training data. However, in a low-resource scenario it is less effective, because the annotations created by self-labeling unlabeled data are unreliable. We propose to combine self-training with noise handling on the self-labeled data. Directly estimating noise on the combination of the clean training set and the self-labeled data can corrupt the clean data and hence performs worse. We therefore propose the Clean and Noisy Label Neural Network, which trains on clean and noisy self-labeled data simultaneously by explicitly modelling clean and noisy labels separately. In our experiments on Chunking and NER, this approach performs more robustly than the baselines. Complementary to this explicit approach, noise can also be handled implicitly with the help of an auxiliary learning task. Combined with such a complementary approach, our method is more beneficial than the other baseline methods, and together the two provide the best overall performance.
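The explicit noise handling described in the abstract can be pictured as one network with two output paths: a clean path that predicts labels directly, and a noisy path that channels those same predictions through a learned label-noise transition matrix before comparing them to the self-generated labels. Below is a minimal PyTorch sketch of that idea, assuming a noise-transition-matrix formulation; the class name CNLNN, the BiLSTM encoder, the dimensions, and the matrix initialization are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNLNN(nn.Module):
    """Sketch of a network with separate clean and noisy label outputs."""

    def __init__(self, vocab_size, num_labels, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.clean_head = nn.Linear(2 * hidden_dim, num_labels)
        # Learnable transition matrix (stored as logits): row i is the
        # distribution of observed noisy labels when the true label is i.
        # Initialized near the identity, i.e. self-labels are mostly correct.
        init = torch.full((num_labels, num_labels), 0.1 / (num_labels - 1))
        init.fill_diagonal_(0.9)
        self.noise_log = nn.Parameter(init.log())

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))       # (B, T, 2H)
        clean_probs = F.softmax(self.clean_head(h), dim=-1)  # (B, T, L)
        noise_mat = F.softmax(self.noise_log, dim=-1)        # rows sum to 1
        noisy_probs = clean_probs @ noise_mat          # clean prediction
        return clean_probs, noisy_probs                # passed through noise

def loss_fn(model, clean_batch, noisy_batch, eps=1e-8):
    """Train on clean and self-labeled data simultaneously: clean labels
    supervise the clean output, noisy labels supervise the noise-adapted
    output, so the noise estimate never corrupts the clean path."""
    x_c, y_c = clean_batch
    x_n, y_n = noisy_batch
    p_clean, _ = model(x_c)
    _, p_noisy = model(x_n)
    nll_clean = F.nll_loss(torch.log(p_clean + eps).transpose(1, 2), y_c)
    nll_noisy = F.nll_loss(torch.log(p_noisy + eps).transpose(1, 2), y_n)
    return nll_clean + nll_noisy

# Toy usage with random data (9 labels, e.g. BIO tags for NER).
model = CNLNN(vocab_size=5000, num_labels=9)
x = torch.randint(0, 5000, (4, 20))
y = torch.randint(0, 9, (4, 20))
loss_fn(model, (x, y), (x, y)).backward()
```

Because the transition matrix sits only on the noisy path, the noise estimate can adapt to the self-labeled data without distorting the gradient signal from the clean annotations, which is the robustness property the abstract emphasizes.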
