Workshop on discourse in machine translation

Data Augmentation using Back-translation for Context-aware Neural Machine Translation



Abstract

A single sentence does not always convey all the information required to translate it into other languages: we sometimes need to add or specialize words that are omitted or ambiguous in the source language (e.g., zero pronouns in translating Japanese to English, or epicene pronouns in translating English to French). To translate such ambiguous sentences, we exploit the context around the source sentence, and have so far explored context-aware neural machine translation (NMT). However, large parallel corpora are not easily available for training accurate context-aware NMT models. In this study, we first obtain large-scale pseudo parallel corpora by back-translating target-side monolingual corpora, and then investigate their impact on the translation performance of context-aware NMT models. We evaluate NMT models trained with small parallel corpora and the large-scale pseudo parallel corpora on the IWSLT2017 English-Japanese and English-French datasets, and demonstrate the large impact of data augmentation for context-aware NMT models in terms of BLEU score and specialized test sets on ja→en and fr→en.
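The core augmentation step the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: `backtranslate` is a hypothetical stand-in for a trained target→source NMT model (in practice, a model trained on the small genuine parallel corpus), and the sentence data is invented for the example.

```python
def backtranslate(target_sentence: str) -> str:
    """Hypothetical target->source model; here a trivial placeholder.
    A real system would decode a source-language sentence with a trained NMT model."""
    return "<bt> " + target_sentence

def build_pseudo_parallel(monolingual_targets):
    """Pair each target-side monolingual sentence with its back-translation,
    yielding (pseudo-source, genuine-target) training pairs."""
    return [(backtranslate(t), t) for t in monolingual_targets]

def augment(genuine_pairs, monolingual_targets):
    """Concatenate the small genuine parallel corpus with the
    large-scale pseudo parallel corpus for NMT training."""
    return genuine_pairs + build_pseudo_parallel(monolingual_targets)

# Toy data: one genuine pair plus two target-side monolingual sentences.
genuine = [("watashi wa iku", "I will go")]
mono = ["She left early.", "They agreed."]
training_data = augment(genuine, mono)
# training_data holds 1 genuine pair followed by 2 pseudo pairs
```

The key design point is that the target side of every pseudo pair is genuine text, so the decoder is always trained on fluent target-language sentences; only the source side is synthetic.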
