Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Deconstructing Nuggets: The Stability and Reliability of Complex Question Answering Evaluation



Abstract

A methodology based on “information nuggets” has recently emerged as the de facto standard by which answers to complex questions are evaluated. After several implementations in the TREC question answering tracks, the community has gained a better understanding of its many characteristics. This paper focuses on one particular aspect of the evaluation: the human assignment of nuggets to answer strings, which serves as the basis of the F-score computation. As a byproduct of the TREC 2006 ciQA task, identical answer strings were independently evaluated twice, which allowed us to assess the consistency of human judgments. Based on these results, we explored simulations of assessor behavior that provide a method to quantify scoring variations. Understanding these variations in turn lets researchers be more confident in their comparisons of systems.
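The F-score the abstract refers to is computed from the human nugget assignments: recall counts the vital nuggets an answer covers, while precision is a length penalty in which every matched nugget (vital or okay) grants a fixed character allowance. A minimal sketch of that computation, assuming the β = 3 weighting and 100-character allowance commonly used in the TREC QA tracks (function and parameter names are illustrative):

```python
def nugget_f_score(vital_matched: int, vital_total: int,
                   okay_matched: int, answer_length: int,
                   beta: float = 3.0) -> float:
    """Nugget F-score in the style of the TREC QA tracks.

    Recall counts only vital nuggets; precision is a length penalty:
    each matched nugget (vital or okay) earns a 100-character allowance.
    """
    if vital_total == 0:
        return 0.0
    recall = vital_matched / vital_total

    allowance = 100 * (vital_matched + okay_matched)
    if answer_length <= allowance:
        precision = 1.0
    else:
        precision = 1.0 - (answer_length - allowance) / answer_length

    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denom
```

With β = 3, recall dominates the score, so two assessors who disagree on whether a string matches a vital nugget can move the F-score substantially; this is exactly the kind of judgment variation the paper quantifies.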
