Annual Meeting of the Association for Computational Linguistics

Learning Deep Transformer Models for Machine Translation

Abstract

Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for the development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On WMT'16 English-German, NIST OpenMT'12 Chinese-English and larger WMT'18 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4~2.4 BLEU points. As another bonus, the deep model is 1.6X smaller in size and 3X faster in training than Transformer-Big.
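The two ingredients named in the abstract are commonly realized as pre-norm residual connections (layer normalization applied before each sub-layer) and a learned linear combination of the outputs of all earlier layers fed into the next layer. Below is a minimal PyTorch-style sketch of both ideas, not the authors' released code; the class names, the softmax weighting of earlier layers, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation):
# 1) pre-norm residual connections, widely used to make very deep Transformers trainable;
# 2) each layer reads a learned weighted sum of all previous layers' outputs
#    (the paper's combination is more general; softmax weights are a simplification here).
import torch
import torch.nn as nn


class PreNormLayer(nn.Module):
    """One encoder layer with pre-norm residual connections (sketch)."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                          batch_first=True)
        self.ffn_norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # LayerNorm is applied *before* each sub-layer; the residual path stays identity.
        h = self.attn_norm(x)
        h, _ = self.attn(h, h, h)
        x = x + self.dropout(h)
        h = self.ffn_norm(x)
        x = x + self.dropout(self.ffn(h))
        return x


class DeepEncoder(nn.Module):
    """Stack of pre-norm layers where layer l consumes a learned combination
    of the outputs of layers 0..l-1 (sketch of 'passing the combination of
    previous layers to the next')."""

    def __init__(self, num_layers=30, d_model=512):
        super().__init__()
        self.layers = nn.ModuleList(PreNormLayer(d_model) for _ in range(num_layers))
        # One learnable weight vector per layer over all earlier outputs.
        self.combine_weights = nn.ParameterList(
            nn.Parameter(torch.zeros(l + 1)) for l in range(num_layers)
        )
        self.final_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        outputs = [x]  # output of the embedding "layer 0"
        for l, layer in enumerate(self.layers):
            w = torch.softmax(self.combine_weights[l], dim=0)
            # Combine all previous layer outputs before feeding the next layer.
            combined = sum(w_i * o for w_i, o in zip(w, outputs))
            outputs.append(layer(combined))
        return self.final_norm(outputs[-1])


if __name__ == "__main__":
    enc = DeepEncoder(num_layers=6, d_model=64)
    y = enc(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

With pre-norm, the residual path is an identity map from the embeddings to the top of the stack, which is what lets the 30/25-layer encoders reported in the abstract train stably where post-norm stacks of the same depth tend to diverge.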
