Journal: Information Processing & Management

Index ordering by query-independent measures



Abstract

Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find the documents with the highest scores. For particularly large collections this may be extremely time consuming. A solution to this problem is to search only a limited portion of the collection at query time, in order to speed up the retrieval process, while also limiting the loss in retrieval efficacy (in terms of accuracy of results). We achieve this by first identifying the most "important" documents within the collection and then sorting the documents within inverted file lists in order of this "importance". In this way we limit the amount of information to be searched at query time by eliminating documents of lesser importance, which not only makes the search more efficient but also limits the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings in the number of postings examined, without significant loss of effectiveness, when the ordering is based on several measures of importance used in isolation and in combination. Our results point to several ways in which the computational cost of searching large collections of documents can be significantly reduced.
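
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy documents, the `importance` scores, the term-frequency scoring, and the `fraction` cutoff are all assumptions made for illustration, since the abstract does not name the importance measures or the ranking function actually used. The sketch sorts each postings list by a query-independent importance score and, at query time, examines only the leading part of each list.

```python
from collections import defaultdict
from math import ceil


def build_index(docs, importance):
    """Build an inverted index whose postings are ordered by document importance.

    docs: {doc_id: list of terms}; importance: {doc_id: float} (hypothetical scores).
    """
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        counts = defaultdict(int)
        for term in terms:
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    # Query-independent ordering: most important documents come first.
    for postings in index.values():
        postings.sort(key=lambda p: (-importance[p[0]], p[0]))
    return dict(index)


def search(index, query_terms, fraction=1.0, k=10):
    """Score documents using only the first `fraction` of each postings list.

    Scoring is a plain sum of term frequencies (an assumption, chosen for brevity).
    Returns the top-k (doc_id, score) pairs and the number of postings examined.
    """
    scores = defaultdict(float)
    examined = 0
    for term in query_terms:
        postings = index.get(term, [])
        cutoff = ceil(len(postings) * fraction)
        for doc_id, tf in postings[:cutoff]:
            scores[doc_id] += tf
            examined += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k], examined


# Toy collection and made-up importance scores, for illustration only.
docs = {
    "d1": ["index", "ordering", "index"],
    "d2": ["index", "query"],
    "d3": ["query", "ordering", "query"],
    "d4": ["index"],
}
importance = {"d1": 0.9, "d2": 0.1, "d3": 0.7, "d4": 0.4}

index = build_index(docs, importance)
full_results, full_examined = search(index, ["index", "query"], fraction=1.0)
pruned_results, pruned_examined = search(index, ["index", "query"], fraction=0.5)
print(full_examined, pruned_examined)
print(pruned_results)
```

In this toy run the pruned search examines fewer postings than the full search and drops the least important document ("d2") from the results, which mirrors the trade-off the abstract describes between postings examined and retrieval accuracy. Measures used "in combination" could be modelled by computing `importance` as, for example, a weighted sum of several scores, but the weighting scheme is not given in the abstract.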
