Conference: International conference on recent advances in natural language processing

Multi-Lingual Phrase-Based Statistical Machine Translation for Arabic-English

Abstract

In this paper, we implement a multi-lingual Statistical Machine Translation (SMT) system for Arabic-English translation. Arabic text can be categorized into standard Arabic (Modern Standard Arabic, MSA) and dialectal Arabic, and these two forms differ significantly. We compare several mono-lingual and multi-lingual hybrid SMT approaches. A mono-lingual system consistently yields better translation accuracy on one Arabic form and poor accuracy on the other. Multi-lingual SMT models trained on pooled parallel MSA/dialectal data achieve better accuracy. However, because the available parallel MSA data are much larger than the dialectal data, these multi-lingual models are biased toward MSA. In this work, we propose a multi-lingual combination of different mono-lingual systems using an Arabic form classifier. The output of the classifier directs the system to use the appropriate mono-lingual models (standard, dialectal, or mixture). Testing the different SMT systems shows that the proposed classifier-based SMT system outperforms both the mono-lingual and the data-pooled multi-lingual systems.
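
The classifier-based routing described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the classifier or the decoders, so `make_router`, `toy_classify`, the placeholder tokens `dia1`/`dia2`, and the tagging "translators" are all hypothetical stand-ins. Only the three labels (standard, dialectal, mixture) come from the text.

```python
from typing import Callable, Dict

# The three Arabic-form labels named in the abstract.
LABELS = ("standard", "dialectal", "mixture")

Translator = Callable[[str], str]
Classifier = Callable[[str], str]


def make_router(classify: Classifier, systems: Dict[str, Translator]) -> Translator:
    """Build a translator that sends each sentence to the system picked by the classifier."""
    missing = [label for label in LABELS if label not in systems]
    if missing:
        raise ValueError(f"no system registered for: {missing}")

    def translate(sentence: str) -> str:
        label = classify(sentence)
        if label not in systems:
            raise ValueError(f"unexpected label: {label!r}")
        return systems[label](sentence)

    return translate


def toy_classify(sentence: str, dialect_vocab=frozenset({"dia1", "dia2"})) -> str:
    """Hypothetical classifier: counts placeholder 'dialect' tokens (an assumption, not from the paper)."""
    tokens = sentence.split()
    hits = sum(token in dialect_vocab for token in tokens)
    if hits == 0:
        return "standard"
    if hits == len(tokens):
        return "dialectal"
    return "mixture"


# Hypothetical stand-ins for the mono-lingual SMT systems: each one only tags its input
# with the label of the model that would have been used.
systems: Dict[str, Translator] = {
    label: (lambda s, label=label: f"[{label}] {s}") for label in LABELS
}

router = make_router(toy_classify, systems)
print(router("msa1 msa2"))
print(router("dia1 dia2"))
print(router("dia1 msa1"))
```

In a real system, each entry in `systems` would wrap a phrase-based decoder trained on the corresponding data, and `toy_classify` would be replaced by a trained Arabic form classifier. The dispatch logic itself stays the same.
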
