Partial Collection Replication for Information Retrieval

ZHIHONG LU; KATHRYN S. MCKINLEY

Abstract

The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. This paper shows how to exploit locality by building, using, and searching partial replicas of text collections in a distributed IR system. In this work, a partial replica includes a subset of the documents from one or more larger collections and the corresponding inference network search mechanism. For each query, the distributed system determines whether a partial replica is a good match and, if so, searches it; otherwise it searches the original collection. We demonstrate scenarios in which partial replication performs better than systems that use caches, which store only previous query and answer pairs. We first use logs from THOMAS and Excite to examine query locality using query similarity versus exact match. We show that searching replicas can improve locality (from 3 to 19%) over the exact match required by caching. Replicas increase locality because they satisfy queries that are distinct but return the same or very similar answers. We then present a novel inference network replica selection function. We vary its parameters and compare it to previous collection selection functions, demonstrating a configuration that directs most of the appropriate queries to replicas in a replica hierarchy. We then explore the performance of partial replication in a distributed IR system, comparing it with caching and partitioning. Our validated simulator shows that the increases in locality due to replication make it preferable to caching alone, and that even a small increase of 4% in locality translates into a performance advantage. We also show that a hybrid system with caches and replicas performs better than either on its own.
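The routing idea in the abstract (answer from a cache on an exact match, otherwise search a partial replica when it looks like a good match, otherwise fall back to the original collection) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `Collection`, `Router`, `replica_score`, and `threshold` are hypothetical, and the term-overlap score is a simple stand-in, not the inference network replica selection function the paper proposes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Collection:
    """Toy document store: document id -> set of lowercase terms (hypothetical)."""
    docs: dict[str, set[str]]

    def search(self, query: str) -> list[str]:
        terms = set(query.lower().split())
        # Score each document by the number of query terms it contains.
        scored = [(len(terms & words), doc_id) for doc_id, words in self.docs.items()]
        # Highest score first, ties broken by document id; drop non-matching documents.
        ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
        return [doc_id for score, doc_id in ranked if score > 0]


@dataclass
class Router:
    """Hybrid of an exact-match cache, a partial replica, and the original collection."""
    original: Collection
    replica: Collection
    threshold: float = 0.5  # hypothetical cut-off, not a value from the paper
    cache: dict[str, list[str]] = field(default_factory=dict)

    def replica_score(self, query: str) -> float:
        # Stand-in selection function: fraction of query terms present in the replica.
        terms = set(query.lower().split())
        if not terms:
            return 0.0
        vocabulary = set().union(*self.replica.docs.values())
        return len(terms & vocabulary) / len(terms)

    def route(self, query: str) -> tuple[str, list[str]]:
        # 1. The cache only helps when the query string matches exactly.
        if query in self.cache:
            return "cache", self.cache[query]
        # 2. Search the replica if it looks like a good match, else the original.
        if self.replica_score(query) >= self.threshold:
            source, hits = "replica", self.replica.search(query)
        else:
            source, hits = "original", self.original.search(query)
        self.cache[query] = hits
        return source, hits


original = Collection({
    "d1": {"tax", "reform", "bill"},
    "d2": {"health", "care", "bill"},
    "d3": {"space", "program", "funding"},
})
# The partial replica holds a subset of the original documents.
replica = Collection({doc_id: original.docs[doc_id] for doc_id in ("d1", "d2")})
router = Router(original, replica)

for q in ("tax bill", "bill tax", "tax bill", "space funding"):
    print(q, "->", router.route(q))
```

In this toy run, "tax bill" and "bill tax" are distinct strings, so the second misses the cache, yet both are served by the replica with the same answer. That is the kind of locality beyond exact match that the abstract attributes to replicas.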


Bibliographic record

Source
Information Retrieval, 2003, No. 2, pp. 159-198 (40 pages)
Authors
ZHIHONG LU; KATHRYN S. MCKINLEY
Author affiliation

AT&T Laboratories, 200 Laurel Avenue, Middletown, New Jersey 07748, USA

Original format: PDF
Language: English
Chinese Library Classification: Library science and librarianship
Keywords
partial replication; replica selection; distributed information retrieval architectures
