Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Multi-Source Neural Machine Translation With Missing Data

Abstract

Machine translation is rife with ambiguities in word ordering and word choice, and even with the advent of machine-learning methods that learn to resolve this ambiguity based on statistics from large corpora, mistakes are frequent. Multi-source translation is an approach that attempts to resolve these ambiguities by exploiting multiple inputs (e.g., sentences in three different languages) to increase translation accuracy. These methods are trained on multilingual corpora, which include the multiple source languages and the target language, and then at test time use information from both source languages while generating the target. While there are many such multilingual corpora, such as multilingual translations of TED talks or European Parliament proceedings, in practice many of them are incomplete because of the difficulty of providing translations in all of the relevant languages. Existing studies on multi-source translation did not explicitly handle such situations, and thus are only applicable to complete corpora that have all of the languages of interest, severely limiting their practical applicability. In this article, we examine approaches for multi-source neural machine translation (NMT) that can learn from and translate such incomplete corpora. Specifically, we propose methods to deal with incomplete corpora at both training time and test time. For training time, we examine two methods: (1) a simple method that replaces missing source translations with a special NULL symbol, and (2) a data augmentation approach that fills in incomplete parts with source translations created from multi-source NMT. For test time, we examine methods that use multi-source translation even when only a single source is provided, by first translating into an additional auxiliary language using standard NMT, then using multi-source translation on the original source and this generated auxiliary-language sentence. Extensive experiments demonstrate that the proposed training-time and test-time methods both significantly improve translation performance.
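
The NULL-symbol method and the test-time auxiliary-language step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The token spelling `<NULL>`, the function names `fill_missing_sources` and `translate_with_auxiliary`, and the two translator callables are all hypothetical, introduced here only for illustration.

```python
from typing import Callable, Dict, List, Optional

# Assumed spelling of the special placeholder; the abstract only says "NULL symbol".
NULL_TOKEN = "<NULL>"


def fill_missing_sources(
    example: Dict[str, Optional[str]],
    source_langs: List[str],
    null_token: str = NULL_TOKEN,
) -> List[str]:
    """Return one sentence per source language, in a fixed order.

    A language that is absent, None, or an empty string is treated as
    missing and replaced by the placeholder token.
    """
    return [example.get(lang) or null_token for lang in source_langs]


def translate_with_auxiliary(
    source: str,
    single_source_nmt: Callable[[str], str],
    multi_source_nmt: Callable[[List[str]], str],
) -> str:
    """Test-time sketch: generate an auxiliary-language sentence with a
    standard (single-source) model, then feed both sentences to a
    multi-source model. Both callables are hypothetical stand-ins."""
    auxiliary = single_source_nmt(source)
    return multi_source_nmt([source, auxiliary])


# Toy incomplete corpus: the second example has only an English source.
corpus = [
    {"en": "Hello", "fr": "Bonjour", "es": None},
    {"en": "Thanks"},
]
filled = [fill_missing_sources(ex, ["en", "fr", "es"]) for ex in corpus]
```

With the toy corpus above, every example ends up with exactly three source slots, so a multi-source model with a fixed number of encoders could consume complete and incomplete examples alike.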
