A goodness of fit test approach in information retrieval

Kostas Fragos; Yannis Maistros

Abstract

In many probabilistic modeling approaches to Information Retrieval, we are interested in estimating how well a document model "fits" the user's information need (the query model). In statistics, meanwhile, goodness of fit tests are well-established techniques for assessing assumptions about the underlying distribution of a data set. Supposing that the query terms are randomly distributed across the documents of the collection, we want to know whether the query terms occur in a particular document more frequently than chance would predict. This can be quantified by goodness of fit tests. In this paper, we present a new document ranking technique based on Chi-square goodness of fit tests. Under the null hypothesis that, apart from chance occurrences, there is no association between the query terms q and the document d, we perform a Chi-square goodness of fit test to assess this hypothesis and calculate the corresponding Chi-square values. Our retrieval formula ranks the documents in the collection according to these calculated Chi-square values. The method was evaluated over the entire TREC test collection on disks 4 and 5, using the topics of the TREC-7 and TREC-8 conferences (50 topics each). It performs well, consistently outperforming the classical OKAPI term frequency weighting formula, but falls below the KL-divergence method from the language modeling approach. Despite this, we believe that the technique is an important non-parametric way of thinking about retrieval. It offers the possibility of trying simple alternative retrieval formulas within the framework of goodness-of-fit statistical tests, modeling the data in various ways by estimating or assigning an arbitrary theoretical distribution over terms.
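
The abstract does not give the retrieval formula itself, so the following is only a minimal sketch of how a Chi-square-style ranking could look, not the authors' method. It assumes that the expected count of a query term in a document is the document length times the term's relative frequency in the whole collection, and it signs each term's contribution so that under-represented terms lower the score. Both choices, and the function names `signed_chi_square` and `rank_documents`, are assumptions made for illustration.

```python
from collections import Counter


def signed_chi_square(query_terms, doc_tokens, collection_tf, collection_len):
    """Sum of signed (observed - expected)^2 / expected over the query terms.

    Assumption: expected count = document length * collection-wide
    relative frequency of the term (the null model of "chance" spread).
    Assumption: contributions are signed, so terms that occur less often
    than expected reduce the score.
    """
    if collection_len == 0:
        return 0.0
    doc_len = len(doc_tokens)
    observed = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        expected = doc_len * collection_tf.get(term, 0) / collection_len
        if expected == 0:
            continue  # term never occurs in the collection: no evidence
        diff = observed[term] - expected
        contribution = diff * diff / expected
        score += contribution if diff >= 0 else -contribution
    return score


def rank_documents(query_terms, docs):
    """Return document ids ordered from highest to lowest score.

    `docs` maps a document id to its list of tokens.
    """
    collection_tf = Counter()
    for tokens in docs.values():
        collection_tf.update(tokens)
    collection_len = sum(collection_tf.values())
    scores = {
        doc_id: signed_chi_square(query_terms, tokens, collection_tf, collection_len)
        for doc_id, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_documents(["chi", "square"], docs)` with a small dictionary of tokenized documents returns the ids ordered by how strongly the query terms are over-represented relative to the collection. An unsigned Chi-square statistic would also reward documents in which the query terms are rarer than expected, which is why this sketch signs the contributions.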


Bibliographic details

Source
Information Retrieval | 2006, Issue 3 | pp. 331-342 | 12 pages
Authors
Kostas Fragos; Yannis Maistros
Affiliation

Department of Electrical and Computer Engineers, National Technical University of Athens, Iroon Polytexneniou 9, 15780 Zografou, Greece

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Library science and librarianship
Keywords
goodness of fit tests; information retrieval


