Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

Achieving Accurate Conclusions in Evaluation of Automatic Machine Translation Metrics



Abstract

Automatic Machine Translation metrics, such as Bleu, are widely used in empirical evaluation as a substitute for human assessment. The performance of a given metric is then measured by the strength of its correlation with human judgment. When a newly proposed metric achieves a stronger correlation than a baseline, it is important to take into account the uncertainty inherent in correlation point estimates before concluding that metric performance has genuinely improved. Confidence intervals for correlations with human judgment are rarely reported in metric evaluations, however, and when they have been reported, the most suitable methods have unfortunately not been applied. For example, incorrect assumptions about correlation sampling distributions made in past evaluations risk over-estimation of significant differences in metric performance. In this paper, we analyze each of the issues that can lead to inaccurate conclusions, before detailing a method that overcomes these previous challenges. Additionally, we propose a new method of translation sampling that, in contrast, achieves genuinely high conclusivity in evaluation of the relative performance of metrics.
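The abstract's two statistical concerns, the uncertainty of a correlation point estimate and the significance of a difference between two metrics' correlations, can be illustrated with a small sketch. This is not code from the paper; it shows two standard techniques often used in this setting (illustrative sample values are assumptions): a confidence interval for a Pearson correlation via Fisher's z-transformation, and the Williams test for two dependent correlations that share one variable (each metric correlated against the same human judgments).

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    # 95% confidence interval for a Pearson correlation r over n samples,
    # via Fisher's z-transformation; assumes bivariate normality and n > 3.
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def williams_t(r12, r13, r23, n):
    # Williams test statistic (t-distributed with n - 3 df) for the
    # difference between dependent correlations r12 and r13, which share
    # variable 1 (e.g. human judgment); r23 is the correlation between
    # the two competing metrics themselves.
    k = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    rbar = (r12 + r13) / 2
    num = (r12 - r13) * math.sqrt((n - 1) * (1 + r23))
    den = math.sqrt(2 * k * (n - 1) / (n - 3) + rbar**2 * (1 - r23) ** 3)
    return num / den

# Hypothetical example: a metric correlating r = 0.85 with human judgment
# over n = 100 translations still has a fairly wide confidence interval.
lo, hi = fisher_ci(0.85, 100)

# Hypothetical comparison: metric A (r = 0.90) vs. metric B (r = 0.80)
# against the same human scores, with the metrics correlating 0.70.
t = williams_t(0.90, 0.80, 0.70, 100)
```

The Williams test accounts for the correlation between the two metrics (`r23`): because competing metrics typically agree strongly with each other, treating the two correlations as independent samples, as some past evaluations have, overstates the variance of their difference and leads to the over-estimation of significance that the abstract warns about.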


