Journal: Information Retrieval

Partial Collection Replication for Information Retrieval

Abstract

The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. This paper shows how to exploit locality by building, using, and searching partial replicas of text collections in a distributed IR system. In this work, a partial replica includes a subset of the documents from one or more larger collections and the corresponding inference network search mechanism. For each query, the distributed system determines whether a partial replica is a good match and, if so, searches it; otherwise it searches the original collection. We demonstrate the scenarios in which partial replication performs better than systems that use caches, which store only previous query and answer pairs. We first use logs from THOMAS and Excite to examine query locality using query similarity versus exact match. We show that searching replicas can improve locality (from 3 to 19%) over the exact match required by caching. Replicas increase locality because they satisfy queries that are distinct but return the same or very similar answers. We then present a novel inference network replica selection function. We vary its parameters and compare it to previous collection selection functions, demonstrating a configuration that directs most of the appropriate queries to replicas in a replica hierarchy. We then explore the performance of partial replication in a distributed IR system and compare it with caching and partitioning. Our validated simulator shows that the increases in locality due to replication make it preferable to caching alone, and that even a small increase of 4% in locality translates into a performance advantage. We also show a hybrid system with caches and replicas that performs better than either on its own.
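The routing idea in the abstract (exact-match cache first, then a partial replica if it looks like a good match, otherwise the original collection) can be illustrated with a minimal sketch. Everything named here is hypothetical: `Replica`, `Router`, `threshold`, and the term-coverage score are assumptions made for illustration, and the score is a simple stand-in, not the inference network replica selection function the paper proposes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def terms(text: str) -> frozenset[str]:
    """Distinct lowercased terms; a stand-in for real query parsing."""
    return frozenset(text.lower().split())


def search(docs: dict[str, str], query: str) -> list[str]:
    """Toy ranking: doc ids ordered by number of matching query terms."""
    q = terms(query)
    hits = [(len(q & terms(text)), doc_id) for doc_id, text in docs.items()]
    return [d for n, d in sorted(hits, key=lambda h: (-h[0], h[1])) if n > 0]


@dataclass
class Replica:
    """A partial replica: a subset of documents from a larger collection."""
    name: str
    docs: dict[str, str]  # doc id -> text

    def score(self, query: str) -> float:
        """Assumed selection score: fraction of query terms the replica covers."""
        q = terms(query)
        vocab = frozenset(t for text in self.docs.values() for t in terms(text))
        return len(q & vocab) / len(q) if q else 0.0


@dataclass
class Router:
    collection: dict[str, str]
    replicas: list[Replica]      # tried in order, e.g. smallest first
    threshold: float = 1.0       # assumed cut-off for "good match"
    cache: dict[str, list[str]] = field(default_factory=dict)

    def route(self, query: str) -> tuple[str, list[str]]:
        """Return (source, results) for a query."""
        if query in self.cache:              # caching needs an exact match
            return "cache", self.cache[query]
        for replica in self.replicas:        # replicas accept similar queries
            if replica.score(query) >= self.threshold:
                source, results = replica.name, search(replica.docs, query)
                break
        else:                                # no replica qualified
            source, results = "collection", search(self.collection, query)
        self.cache[query] = results
        return source, results


collection = {
    "d1": "budget bill for education funding",
    "d2": "education reform act",
    "d3": "highway safety bill",
}
replica = Replica("replica", {k: collection[k] for k in ("d1", "d2")})
router = Router(collection, [replica])

print(router.route("education bill"))   # served by the replica
print(router.route("bill education"))   # different string: cache miss, replica hit
print(router.route("education bill"))   # exact repeat: cache hit
print(router.route("highway safety"))   # not covered: falls back to the collection
```

The second query shows the locality argument in miniature: it is a distinct string, so an exact-match cache misses, but the replica still serves it. Note that the replica's answer for "education bill" omits `d3`, which the full collection would also return, so a real selection function has to weigh speed against that loss in accuracy.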