Journal: Information Retrieval

On Collection Size and Retrieval Effectiveness



Abstract

The relationship between collection size and retrieval effectiveness is particularly important in the context of Web search. We investigate it first analytically and then experimentally, using samples and subsets of test collections. Different retrieval systems vary in how the score assigned to an individual document in a sample collection relates to the score it receives in the full collection; we identify four cases. We apply signal detection (SD) theory to retrieval from samples, taking into account the four cases and using a variety of shapes for relevant and irrelevant distributions. We note that the SD model subsumes several earlier hypotheses about the causes of the decreased precision in samples. We also discuss other models which contribute to an understanding of the phenomenon, particularly relating to the effects of discreteness. Different models provide complementary insights. Extensive use is made of test data, some from official submissions to the TREC-6 VLC track and some new, to illustrate the effects and test hypotheses. We empirically confirm predictions, based on SD theory, that P@n should decline when moving to a sample collection and that average precision and R-precision should remain constant. SD theory suggests the use of recall-fallout plots as operating characteristic (OC) curves. We plot OC curves of this type for a real retrieval system and query set and show that curves for sample collections are similar but not identical to the curve for the full collection.
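The measures named in the abstract (P@n, average precision, R-precision, and recall-fallout points for an OC curve) can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' code: the function names, the toy ranking `FULL`, and the every-second-document sample are all assumptions made for the example. It also assumes that sampling leaves the relative order of the surviving documents unchanged, which is only one of the situations the paper distinguishes.

```python
from typing import List, Tuple

# A ranking is a list of 0/1 relevance flags, best-scored document first.
# Assumption: the list covers the whole (toy) collection, so it contains
# every relevant document.

def precision_at_n(ranking: List[int], n: int) -> float:
    """Fraction of the top n ranked documents that are relevant (P@n)."""
    return sum(ranking[:n]) / n

def r_precision(ranking: List[int]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    total_relevant = sum(ranking)
    if total_relevant == 0:
        return 0.0
    return sum(ranking[:total_relevant]) / total_relevant

def average_precision(ranking: List[int]) -> float:
    """Mean of the precision values at the rank of each relevant document."""
    total_relevant = sum(ranking)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranking, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant

def recall_fallout_curve(ranking: List[int]) -> List[Tuple[float, float]]:
    """(fallout, recall) after each rank; requires both classes to be present."""
    total_relevant = sum(ranking)
    total_irrelevant = len(ranking) - total_relevant
    relevant_seen = 0
    irrelevant_seen = 0
    points = []
    for is_relevant in ranking:
        if is_relevant:
            relevant_seen += 1
        else:
            irrelevant_seen += 1
        points.append((irrelevant_seen / total_irrelevant,
                       relevant_seen / total_relevant))
    return points

# Hypothetical toy ranking over a 20-document "full collection".
FULL = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical 50% sample: keep every second document, order preserved.
SAMPLE = FULL[::2]

if __name__ == "__main__":
    print("P@4 full:  ", precision_at_n(FULL, 4))
    print("P@4 sample:", precision_at_n(SAMPLE, 4))
```

In this toy ranking P@4 is lower in the sample than in the full collection, which is the direction the abstract predicts. A 20-document example is far too small to say anything about whether average precision or R-precision stays constant; that claim rests on the paper's own experiments.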
