Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Selecting Syntactic, Non-redundant Segments in Active Learning for Machine Translation

Abstract

Active learning is a framework for training statistical models efficiently by selecting informative examples from a pool of unlabeled data. Previous work has found this framework effective for machine translation (MT), making it possible to train better translation models with less effort, particularly when annotators translate short phrases instead of full sentences. However, previous methods for phrase-based active learning in MT do not consider whether the selected units are coherent and easy for human translators to translate, and they also tend to select redundant phrases with similar content. In this paper, we tackle these problems by proposing two new methods for selecting more syntactically coherent and less redundant segments in active learning for MT. Experiments using both simulation and extensive manual translation by professional translators find the proposed method effective: it achieves a greater BLEU gain for the same number of translated words and allows translators to be more confident in their translations.
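To make the redundancy problem concrete, below is a minimal Python sketch of frequency-based phrase selection for active learning in MT. It is not the method proposed in this paper. The function names (`select_phrases`, `ngrams`, `contains`), the n-gram length limit, and the rule that drops a phrase when a longer candidate containing it occurs equally often are all hypothetical choices made for illustration. The sketch addresses only redundancy. Checking syntactic coherence would additionally require a parser and is not shown.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

Phrase = Tuple[str, ...]


def ngrams(tokens: Sequence[str], max_n: int) -> Iterable[Phrase]:
    """Yield every n-gram of length 1..max_n in a tokenized sentence."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def contains(longer: Phrase, shorter: Phrase) -> bool:
    """Return True if `shorter` appears as a contiguous span of `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def select_phrases(pool: List[List[str]], known: Set[Phrase],
                   budget: int, max_n: int = 4) -> List[Phrase]:
    """Hypothetical selector: frequent, uncovered, non-redundant phrases."""
    # Count how often each phrase occurs in the unlabeled source-side pool.
    counts = Counter(p for sent in pool for p in ngrams(sent, max_n))
    # Keep only phrases not already covered by the translated (known) data.
    candidates = [p for p in counts if p not in known]
    # Redundancy filter (assumed heuristic): drop a phrase if a longer
    # candidate contains it and occurs equally often, since asking for both
    # would mostly make the translator re-translate the same content.
    non_redundant = [
        p for p in candidates
        if not any(len(q) > len(p) and counts[q] == counts[p] and contains(q, p)
                   for q in candidates)
    ]
    # Most frequent first; ties go to longer phrases, then lexicographic order.
    non_redundant.sort(key=lambda p: (-counts[p], -len(p), p))
    return non_redundant[:budget]
```

On a toy pool of "the red car" (twice) and "a red bus" with the unigrams "the" and "a" already known, `select_phrases(pool, known, budget=3, max_n=3)` returns `("red",)`, `("the", "red", "car")` and `("a", "red", "bus")`. Fragments such as "red car" are dropped because they only ever occur inside a longer selected phrase.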