Conference: Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Selecting Syntactic, Non-redundant Segments in Active Learning for Machine Translation

Abstract

Active learning is a framework that makes it possible to efficiently train statistical models by selecting informative examples from a pool of unlabeled data. Previous work has found this framework effective for machine translation (MT), making it possible to train better translation models with less effort, particularly when annotators translate short phrases instead of full sentences. However, previous methods for phrase-based active learning in MT fail to consider whether the selected units are coherent and easy for human translators to translate, and they also tend to select redundant phrases with similar content. In this paper, we tackle these problems by proposing two new methods for selecting more syntactically coherent and less redundant segments in active learning for MT. Experiments using both simulation and extensive manual translation by professional translators find the proposed methods effective: they achieve a greater gain in BLEU score for the same number of translated words and allow translators to be more confident in their translations.
