Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation

Abstract

Although teacher forcing has become the main training paradigm for neural machine translation, it usually makes predictions conditioned only on past information and hence lacks global planning for the future. To address this problem, we introduce another decoder, called the seer decoder, into the encoder-decoder framework during training; the seer decoder incorporates future information into target predictions. Meanwhile, we force the conventional decoder to simulate the behavior of the seer decoder via knowledge distillation. In this way, at test time the conventional decoder can perform like the seer decoder without the seer decoder being present. Experimental results on Chinese-English, English-German and English-Romanian translation tasks show that our method outperforms competitive baselines significantly and achieves greater improvements on the larger datasets. In addition, the experiments show that knowledge distillation transfers knowledge from the seer decoder to the conventional decoder better than adversarial learning or L2 regularization.
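
The abstract does not state the training objective, so the following is only a minimal sketch of how a distillation term of this kind is commonly formed for a single target position. It assumes the conventional decoder's loss is the usual cross-entropy on the gold token plus a weighted KL divergence from the seer decoder's output distribution. The function names, the KL direction, and the weight `lam` are illustrative assumptions and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold token under `probs`."""
    return -math.log(probs[gold_index])


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_step_loss(student_logits, seer_logits, gold_index, lam=0.5):
    """Loss for one target position (hypothetical formulation).

    student_logits: scores from the conventional decoder (past context only).
    seer_logits:    scores from the seer decoder (also sees future context).
    lam:            assumed weight on the distillation term.
    """
    student = softmax(student_logits)
    seer = softmax(seer_logits)
    nll = cross_entropy(student, gold_index)
    # Assumption: the seer distribution acts as the teacher, so the
    # divergence is measured from the seer to the conventional decoder.
    kd = kl_divergence(seer, student)
    return nll + lam * kd, nll, kd


# Toy example over a three-word vocabulary; the gold token is index 0.
loss, nll, kd = distillation_step_loss([2.0, 1.0, 0.1], [3.0, 0.5, 0.1], 0)
```

In this sketch the seer decoder appears only inside the loss, which is consistent with the abstract's statement that the conventional decoder runs on its own at test time.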