18th ACM Conference on Information and Knowledge Management (CIKM), 2009

Retrieval Constraints and Word Frequency Distributions: A Log-Logistic Model for IR


Abstract

We first present an analytical view of heuristic retrieval constraints, which yields simple tests to determine whether a retrieval function satisfies them. We then review empirical findings on word frequency distributions and the central role that burstiness plays in this context. This leads us to propose a formal definition of burstiness that can be used to characterize probability distributions with respect to this phenomenon. We then introduce the family of information-based IR models, which naturally captures heuristic retrieval constraints when the underlying probability distribution is bursty, and propose a new IR model within this family based on the log-logistic distribution. Experiments on three different collections illustrate the good behavior of the log-logistic IR model: it significantly outperforms the Jelinek-Mercer and Dirichlet prior language models on all three collections, with both short and long queries, for both mean average precision (MAP) and precision at 10 documents. It also outperforms the InL2 DFR model on MAP and yields results on a par with it for precision at 10.
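The abstract does not state the scoring formula, so the following is only a minimal sketch of how a log-logistic, information-based score is commonly written, under stated assumptions: term frequencies are length-normalized as `tf * log(1 + c * avg_len / doc_len)`, the per-term parameter is `lam = doc_freq / n_docs`, and the survival function is `P(T >= t) = lam / (t + lam)`. The function names, the parameter `c`, and the toy statistics are hypothetical. `burst_ratio` illustrates one possible reading of burstiness (the conditional probability `P(T >= x + eps | T >= x)` grows with `x`); this reading is also an assumption, not a quotation from the paper.

```python
import math


def normalized_tf(tf, doc_len, avg_doc_len, c=1.0):
    """Length-normalized term frequency (assumed normalization, c is a free parameter)."""
    return tf * math.log(1.0 + c * avg_doc_len / doc_len)


def survival(t, lam):
    """Assumed log-logistic survival function P(T >= t) = lam / (t + lam)."""
    return lam / (t + lam)


def lgd_score(query_tf, doc_tf, doc_len, avg_doc_len, doc_freq, n_docs, c=1.0):
    """Sum, over query terms present in the document, of -qtf * log P(T >= t)."""
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        lam = doc_freq[term] / n_docs  # fraction of documents containing the term
        t = normalized_tf(tf, doc_len, avg_doc_len, c)
        score += -qtf * math.log(survival(t, lam))
    return score


def burst_ratio(x, eps, lam):
    """P(T >= x + eps | T >= x); increasing in x under the burstiness reading above."""
    return survival(x + eps, lam) / survival(x, lam)


if __name__ == "__main__":
    # Toy collection statistics, invented purely for illustration.
    query = {"retrieval": 1, "model": 1}
    doc_freq = {"retrieval": 50, "model": 400}
    doc = {"retrieval": 3, "model": 1}
    print(lgd_score(query, doc, doc_len=100, avg_doc_len=120.0,
                    doc_freq=doc_freq, n_docs=1000))
    print([round(burst_ratio(x, 1.0, 0.05), 3) for x in (1.0, 2.0, 5.0)])
```

Under these assumptions the score increases with term frequency, increases faster for rarer terms, and shrinks for longer documents, which is the kind of behavior the heuristic retrieval constraints are meant to test.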
