Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing

Few-Shot and Zero-Shot Learning for Historical Text Normalization

Abstract

Historical text normalization often relies on small training datasets. Recent work has shown that multi-task learning can lead to significant improvements by exploiting synergies with related datasets, but there has been no systematic study of different multi-task learning architectures. This paper evaluates 63 multi-task learning configurations for sequence-to-sequence-based historical text normalization across ten datasets from eight languages, using autoencoding, grapheme-to-phoneme mapping, and lemmatization as auxiliary tasks. We observe consistent, significant improvements across languages when training data for the target task is limited, but minimal or no improvements when training data is abundant. We also show that zero-shot learning outperforms the simple, but relatively strong, identity baseline.
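
As a concrete illustration of the setup the abstract describes, here is a minimal sketch (not the authors' implementation) of multi-task training for a character-level sequence-to-sequence normalizer, assuming PyTorch. The shared encoder-decoder, the per-task start symbols (SOS_NORM, SOS_AE, SOS_G2P, SOS_LEM), and the fake_batch data stub are all illustrative assumptions; the identity_accuracy function at the end sketches the identity baseline mentioned above, which simply predicts each historical form unchanged.

import random
import torch
import torch.nn as nn

PAD, SOS_NORM, SOS_AE, SOS_G2P, SOS_LEM = range(5)  # illustrative special symbols
VOCAB = 64  # hypothetical character vocabulary size

class Seq2Seq(nn.Module):
    """Character-level encoder-decoder shared across the main and auxiliary tasks."""
    def __init__(self, vocab=VOCAB, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim, padding_idx=PAD)
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.dec = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, src, tgt_in):
        _, h = self.enc(self.emb(src))        # encode source characters
        y, _ = self.dec(self.emb(tgt_in), h)  # teacher-forced decoding
        return self.out(y)

def fake_batch(sos, n=8, slen=10):
    """Stand-in for a real data loader: random (src, tgt_in, tgt_out) tensors.
    tgt_in begins with the task's start symbol, so one decoder serves all tasks."""
    src = torch.randint(5, VOCAB, (n, slen))
    tgt = torch.randint(5, VOCAB, (n, slen))
    tgt_in = torch.cat([torch.full((n, 1), sos), tgt[:, :-1]], dim=1)
    return src, tgt_in, tgt

model = Seq2Seq()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)
tasks = [SOS_NORM, SOS_AE, SOS_G2P, SOS_LEM]  # normalization + three auxiliaries

for it in range(100):
    sos = random.choice(tasks)  # interleave batches across tasks
    src, tgt_in, tgt_out = fake_batch(sos)
    loss = loss_fn(model(src, tgt_in).reshape(-1, VOCAB), tgt_out.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

def identity_accuracy(pairs):
    """Identity baseline: output each historical form unchanged and measure
    the fraction of tokens that already equal the modern form."""
    return sum(hist == modern for hist, modern in pairs) / len(pairs)

Conditioning one shared decoder on a task symbol is only one simple way to share parameters; the 63 configurations the paper evaluates cover a range of multi-task architectures, so this sketch should be read as an illustration of the general recipe rather than any specific configuration from the paper.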
