Conference: People transforming information – information transforming people

Large-Scale Multiple Hypothesis Testing in Information Retrieval: Towards a new approach to Document Ranking

Abstract

Information retrieval (IR) may be considered an instance of a common modern statistical problem: a massive simultaneous hypothesis test. Such problems arise often in biostatistics, where plentiful data must be winnowed to name a small number of potentially "interesting" cases. For instance, DNA microarray analysis requires researchers to filter thousands of genes, searching for genes implicated in a particular condition. This paper describes a novel approach to IR that is based on the notion of simultaneous hypothesis testing. In this case the test is performed on each document and the null hypothesis is that the document is non-relevant. After a mathematical derivation of the proposed model, we test its performance on three standard data sets against the effectiveness of two baseline IR systems, a vector space model and a language modeling-based system. These preliminary experiments show that the hypothesis testing approach to IR is not only philosophically appealing, but that it also operates at the state of the art in effectiveness.
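
The abstract does not give the paper's test statistic or derivation, so the following is only a minimal sketch of the general idea of treating ranking as a simultaneous hypothesis test. It assumes each document already has a p-value for the null hypothesis "this document is non-relevant" and applies the standard Benjamini-Hochberg false discovery rate procedure, which is a swapped-in technique and not necessarily the one the authors use. The function name `rank_by_hypothesis_test`, the `fdr` parameter, and the toy p-values are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_hypothesis_test(
    p_values: Dict[str, float], fdr: float = 0.1
) -> List[Tuple[str, float, bool]]:
    """Rank documents by p-value and flag rejections under Benjamini-Hochberg.

    p_values maps a document id to an assumed p-value for the null
    hypothesis "this document is non-relevant". How such p-values would be
    computed is the paper's contribution and is not modeled here.
    """
    # Smaller p-value = stronger evidence against non-relevance = higher rank.
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)

    # Benjamini-Hochberg: find the largest rank k with p_(k) <= (k / m) * fdr.
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / m * fdr:
            cutoff = k

    # Every document at or above the cutoff rank has its null rejected.
    return [(doc, p, k <= cutoff) for k, (doc, p) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Toy p-values, invented purely for illustration.
    toy = {"d1": 0.001, "d2": 0.2, "d3": 0.012, "d4": 0.74, "d5": 0.03}
    for doc, p, rejected in rank_by_hypothesis_test(toy, fdr=0.1):
        label = "null rejected" if rejected else "null retained"
        print(f"{doc}\t{p:.3f}\t{label}")
```

In this sketch the sort order supplies the ranking, while the cutoff marks how many top-ranked documents can be called relevant at the chosen false discovery rate, which mirrors the microarray setting the abstract uses as its analogy.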
