Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Deconstructing Nuggets: The Stability and Reliability of Complex Question Answering Evaluation



Abstract

A methodology based on “information nuggets” has recently emerged as the de facto standard by which answers to complex questions are evaluated. After several implementations in the TREC question answering tracks, the community has gained a better understanding of its many characteristics. This paper focuses on one particular aspect of the evaluation: the human assignment of nuggets to answer strings, which serves as the basis of the F-score computation. As a byproduct of the TREC 2006 ciQA task, identical answer strings were independently evaluated twice, which allowed us to assess the consistency of human judgments. Based on these results, we explored simulations of assessor behavior that provide a method to quantify scoring variations. Understanding these variations in turn lets researchers be more confident in their comparisons of systems.
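The F-score the abstract refers to is computed from the human nugget assignments: recall counts the vital nuggets an answer covers, while precision is a length penalty in which every matched nugget (vital or okay) grants a fixed character allowance. A minimal sketch of that computation, assuming the β = 3 weighting and 100-character allowance commonly used in the TREC QA tracks (function and parameter names are illustrative):

```python
def nugget_f_score(vital_matched: int, vital_total: int,
                   okay_matched: int, answer_length: int,
                   beta: float = 3.0) -> float:
    """Nugget F-score in the style of the TREC QA tracks.

    Recall counts only vital nuggets; precision is a length penalty:
    each matched nugget (vital or okay) earns a 100-character allowance.
    """
    if vital_total == 0:
        return 0.0
    recall = vital_matched / vital_total

    allowance = 100 * (vital_matched + okay_matched)
    if answer_length <= allowance:
        precision = 1.0
    else:
        precision = 1.0 - (answer_length - allowance) / answer_length

    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denom
```

With β = 3, recall dominates the score, so two assessors who disagree on whether a string matches a vital nugget can move the F-score substantially; this is exactly the kind of judgment variation the paper quantifies.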
