Malaysian Journal of Computer Science

Relevance Judgments Exclusive of Human Assessors in Large Scale Information Retrieval Evaluation Experimentation



Abstract

Inconsistent judgments by different human assessors compromise the reliability of the relevance judgments generated for large scale test collections. This study investigates an automated method that creates a comparable set of relevance judgments (pseudo relevance judgments), eliminating the human effort and the errors introduced when relevance judgments are created manually. Traditionally, the systems participating in TREC are measured with a chosen metric and ranked according to their performance scores. To generate these scores, the documents retrieved by each system for each topic are matched against a set of relevance judgments (usually produced by human assessors). In this study, the number of occurrences of each document per topic across the various runs is used instead, under the assumption that the more often a document is retrieved, the more likely it is to be relevant. The study proposes a method with a pool depth of 100 and a cutoff percentage of >35% that could provide an alternative way of generating consistent relevance judgments without the involvement of human assessors.
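The sketch below illustrates how such pseudo relevance judgments could be generated from the abstract's description: pool each run's top 100 documents per topic, count in how many runs each document occurs, and mark a document relevant when it appears in more than 35% of the runs. The function name pseudo_qrels, the input format (one topic-to-ranked-list mapping per run), and the parameter names are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def pseudo_qrels(runs, pool_depth=100, cutoff=0.35):
    """Derive pseudo relevance judgments from system runs (illustrative sketch).

    runs: one dict per system run, mapping topic_id -> ranked list of doc_ids.
    A document is judged relevant for a topic when it occurs in the top
    `pool_depth` results of more than `cutoff` (as a fraction) of the runs.
    """
    # Count, per topic, how many runs retrieve each document within the pool depth.
    counts = defaultdict(lambda: defaultdict(int))
    for run in runs:
        for topic_id, ranked_docs in run.items():
            for doc_id in ranked_docs[:pool_depth]:
                counts[topic_id][doc_id] += 1

    # Keep documents whose occurrence ratio exceeds the cutoff percentage.
    return {
        topic_id: {doc for doc, n in doc_counts.items() if n / len(runs) > cutoff}
        for topic_id, doc_counts in counts.items()
    }

# Toy example with three runs over one topic: d1 occurs in 3/3 runs,
# d2 in 2/3, and every other document in only 1/3 (below the 35% cutoff).
run_a = {"301": ["d1", "d2", "d5"]}
run_b = {"301": ["d2", "d1", "d6"]}
run_c = {"301": ["d1", "d7", "d8"]}
print(pseudo_qrels([run_a, run_b, run_c]))  # {'301': {'d1', 'd2'}} (set order may vary)
```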
