Conference: Annual Meeting of the Association for Computational Linguistics

Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish


Abstract

We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags, which appear as additional factors in the training data. On the target side (Turkish), we perform only morphological analysis and disambiguation, but treat the complete complex morphological tag as a single factor instead of separating morphemes. We incrementally explore capturing various syntactic substructures as complex tags on the English side, and evaluate how our translations improve in BLEU scores. Our maximal set of source- and target-side transformations, coupled with some additional techniques, provides a 39% relative improvement, from a baseline of 17.08 to 23.78 BLEU, all averaged over 10 training and test sets. Now that the syntactic analysis on the English side is available, we also experiment with more long-distance constituent reordering to bring the English constituent order closer to that of Turkish, but find that these transformations do not provide any additional consistent, tangible gains when averaged over the 10 sets.
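As a rough illustration of the factored representation described in the abstract, the sketch below writes each token as a bundle of factors joined by `|`, the separator used in Moses-style factored corpora, and keeps the complex tag whole as one factor. The `Token` class, the function names, and the English tag string are hypothetical and chosen only for illustration; they are not taken from the paper. The last function simply rechecks the reported relative BLEU gain from the two scores quoted above.

```python
from dataclasses import dataclass
from typing import Iterable

FACTOR_SEP = "|"  # factor separator in Moses-style factored corpora


@dataclass
class Token:
    """One token with three factors (hypothetical layout for illustration)."""
    surface: str
    lemma: str
    tag: str  # complex tag kept whole, not split into separate morphemes


def to_factored(tokens: Iterable[Token]) -> str:
    """Render tokens as a factored line: surface|lemma|tag, space separated."""
    return " ".join(
        FACTOR_SEP.join((t.surface, t.lemma, t.tag)) for t in tokens
    )


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0


# Turkish side: the full morphological tag of "evlerimizde"
# ("in our houses") is carried as a single factor.
turkish = [Token("evlerimizde", "ev", "+Noun+A3pl+P1pl+Loc")]

# English side: the tag string below is an invented placeholder for a
# complex structural tag; the paper's actual tag inventory is not shown here.
english = [Token("houses", "house", "NNS_in_our")]

print(to_factored(turkish))
print(to_factored(english))
print(round(relative_improvement(17.08, 23.78), 1))
```

With the scores from the abstract, the relative gain works out to (23.78 - 17.08) / 17.08, or roughly 39.2%, which matches the reported 39%.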
