Venue: Annual Meeting of the Association for Computational Linguistics

Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation



Abstract

This work presents our ongoing research on unsupervised pretraining in neural machine translation (NMT). In our method, we initialize the weights of the encoder and decoder with two language models trained on monolingual data, and then fine-tune the model on parallel data using Elastic Weight Consolidation (EWC) to avoid forgetting the original language modeling tasks. We compare regularization by EWC with previous work that regularizes via language modeling objectives. The positive result is that using EWC on the decoder achieves BLEU scores similar to the previous work, while the model converges 2-3 times faster and does not require the original unlabeled training data during the fine-tuning stage. On the other hand, regularization with EWC is less effective when the original and new tasks are not closely related. We show that initializing the bidirectional NMT encoder with a left-to-right language model and forcing the model to remember the original left-to-right language modeling task limits the encoder's capacity to learn from the whole bidirectional context.
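The fine-tuning described above adds an EWC term that keeps each parameter close to its pretrained language-model value in proportion to that parameter's estimated importance on the original task. The following is only a minimal illustrative sketch (not the authors' implementation), written in PyTorch style; the names fisher, pretrained_params, and lam are hypothetical placeholders for a diagonal Fisher estimate, the stored language-model weights, and the regularization strength.

    import torch

    def ewc_penalty(model, fisher, pretrained_params, lam=0.1):
        """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

        fisher            -- dict: parameter name -> diagonal Fisher estimate
                             of importance on the original LM task (hypothetical)
        pretrained_params -- dict: parameter name -> weights after LM pretraining
        lam               -- regularization strength (hypothetical value)
        """
        penalty = torch.zeros((), device=next(model.parameters()).device)
        for name, param in model.named_parameters():
            if name in fisher:
                penalty = penalty + (fisher[name] * (param - pretrained_params[name]) ** 2).sum()
        return 0.5 * lam * penalty

    # During fine-tuning on parallel data, the total loss would combine the usual
    # NMT cross-entropy with this term, e.g.:
    #   loss = nmt_cross_entropy + ewc_penalty(model, fisher, lm_params)

In this setup the penalty replaces the need to keep optimizing the original language modeling objective on unlabeled data, which is consistent with the abstract's observation that the original monolingual training data is not needed during fine-tuning.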

