Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

Achieving Accurate Conclusions in Evaluation of Automatic Machine Translation Metrics



Abstract

Automatic Machine Translation metrics, such as Bleu, are widely used in empirical evaluation as a substitute for human assessment. The performance of a given metric is then measured by the strength of its correlation with human judgment. When a newly proposed metric achieves a stronger correlation than a baseline, it is important to take into account the uncertainty inherent in correlation point estimates before concluding that metric performance has genuinely improved. Confidence intervals for correlations with human judgment are rarely reported in metric evaluations, however, and when they have been reported, the most suitable methods have unfortunately not been applied. For example, incorrect assumptions about correlation sampling distributions made in past evaluations risk over-estimation of significant differences in metric performance. In this paper, we analyze each of the issues that can lead to inaccurate conclusions, before detailing a method that overcomes these previous challenges. Additionally, we propose a new method of translation sampling that, in contrast, achieves genuinely high conclusivity in evaluation of the relative performance of metrics.
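The abstract's two statistical concerns, the uncertainty of a correlation point estimate and the significance of a difference between two metrics' correlations, can be illustrated with a small sketch. This is not code from the paper; it shows two standard techniques often used in this setting (illustrative sample values are assumptions): a confidence interval for a Pearson correlation via Fisher's z-transformation, and the Williams test for two dependent correlations that share one variable (each metric correlated against the same human judgments).

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    # 95% confidence interval for a Pearson correlation r over n samples,
    # via Fisher's z-transformation; assumes bivariate normality and n > 3.
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def williams_t(r12, r13, r23, n):
    # Williams test statistic (t-distributed with n - 3 df) for the
    # difference between dependent correlations r12 and r13, which share
    # variable 1 (e.g. human judgment); r23 is the correlation between
    # the two competing metrics themselves.
    k = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    rbar = (r12 + r13) / 2
    num = (r12 - r13) * math.sqrt((n - 1) * (1 + r23))
    den = math.sqrt(2 * k * (n - 1) / (n - 3) + rbar**2 * (1 - r23) ** 3)
    return num / den

# Hypothetical example: a metric correlating r = 0.85 with human judgment
# over n = 100 translations still has a fairly wide confidence interval.
lo, hi = fisher_ci(0.85, 100)

# Hypothetical comparison: metric A (r = 0.90) vs. metric B (r = 0.80)
# against the same human scores, with the metrics correlating 0.70.
t = williams_t(0.90, 0.80, 0.70, 100)
```

The Williams test accounts for the correlation between the two metrics (`r23`): because competing metrics typically agree strongly with each other, treating the two correlations as independent samples, as some past evaluations have, overstates the variance of their difference and leads to the over-estimation of significance that the abstract warns about.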


