Distributed search based on self-indexed compressed text

Diego Arroyuelo; Veronica Gil-Costa; Senen Gonzalez; Mauricio Marin; Mauricio Oyarzun

Abstract

Query response times within a fraction of a second in Web search engines are feasible due to the use of indexing and caching techniques, which are devised for large text collections partitioned and replicated into a set of distributed-memory processors. This paper proposes an alternative query processing method for this setting, which is based on a combination of self-indexed compressed text and posting lists caching. We show that a text self-index (i.e., an index that compresses the text and is able to extract arbitrary parts of it) can be competitive with an inverted index if we consider the whole query process, which includes index decompression, ranking and snippet extraction time. The advantage is that within the space of the compressed document collection, one can carry out the posting lists generation, document ranking and snippet extraction. This significantly reduces the total number of processors involved in the solution of queries. Alternatively, for the same amount of hardware, the performance of the proposed strategy is better than that of the classical approach based on treating inverted indexes and corresponding documents as two separate entities in terms of processors and memory space.
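The idea of serving posting lists, ranking and snippets from a single structure can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a word-level wavelet tree (wavelet trees appear in the paper's keywords), stores its bitmaps as plain uncompressed prefix-count arrays, and ranks by raw term frequency only. All class and method names (WaveletTree, SelfIndex, postings, snippet, search) are hypothetical, chosen for illustration.

```python
from bisect import bisect_left, bisect_right


class WaveletTree:
    """Pointer-based wavelet tree over integer symbols (illustrative, uncompressed)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf: a single symbol, or nothing left to split
        mid = (lo + hi) // 2
        self.ones, self.zeros = [0], [0]  # prefix counts of 1-bits and 0-bits
        for s in seq:
            self.ones.append(self.ones[-1] + (s > mid))
            self.zeros.append(self.zeros[-1] + (s <= mid))
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def access(self, i):
        """Symbol stored at position i."""
        node = self
        while node.left is not None:
            if node.ones[i + 1] > node.ones[i]:  # bit i is 1: go right
                i, node = node.ones[i], node.right
            else:
                i, node = node.zeros[i], node.left
        return node.lo

    def rank(self, c, i):
        """Number of occurrences of symbol c in positions [0, i)."""
        if not self.lo <= c <= self.hi:
            return 0
        node = self
        while node.left is not None:
            if c > (node.lo + node.hi) // 2:
                i, node = node.ones[i], node.right
            else:
                i, node = node.zeros[i], node.left
        return i

    def select(self, c, j):
        """Position of the j-th (0-based) occurrence of c; assumes it exists."""
        if self.left is None:
            return j
        if c > (self.lo + self.hi) // 2:
            return bisect_left(self.ones, self.right.select(c, j) + 1) - 1
        return bisect_left(self.zeros, self.left.select(c, j) + 1) - 1


class SelfIndex:
    """Word-level self-index: the wavelet tree replaces both text and postings."""

    def __init__(self, docs):
        self.vocab, self.words = {}, []  # word -> id, id -> word
        seq, self.starts = [], []        # starts[d] = first position of doc d
        for doc in docs:
            self.starts.append(len(seq))
            for w in doc.lower().split():
                if w not in self.vocab:
                    self.vocab[w] = len(self.words)
                    self.words.append(w)
                seq.append(self.vocab[w])
        self.n = len(seq)
        self.wt = WaveletTree(seq)

    def _doc_range(self, pos):
        """(doc_id, start, end) of the document containing position pos."""
        d = bisect_right(self.starts, pos) - 1
        end = self.starts[d + 1] if d + 1 < len(self.starts) else self.n
        return d, self.starts[d], end

    def postings(self, word):
        """List of (doc_id, term_frequency, first_position), built on the fly."""
        c = self.vocab.get(word)
        out, j = [], 0
        total = 0 if c is None else self.wt.rank(c, self.n)
        while j < total:
            pos = self.wt.select(c, j)        # first occurrence in the next doc
            d, _, end = self._doc_range(pos)
            tf = self.wt.rank(c, end) - j     # occurrences inside that doc
            out.append((d, tf, pos))
            j += tf
        return out

    def snippet(self, pos, radius=1):
        """Text window around pos, clipped to its document."""
        _, start, end = self._doc_range(pos)
        window = range(max(start, pos - radius), min(end, pos + radius + 1))
        return " ".join(self.words[self.wt.access(i)] for i in window)

    def search(self, word, k=2, radius=1):
        """Top-k documents by raw term frequency, each with a snippet."""
        ranked = sorted(self.postings(word), key=lambda p: -p[1])[:k]
        return [(d, tf, self.snippet(pos, radius)) for d, tf, pos in ranked]


index = SelfIndex(["the cat sat on the mat", "the dog chased the cat", "a dog is a dog"])
print(index.search("dog"))  # [(2, 2, 'a dog is'), (1, 1, 'the dog chased')]
```

In this sketch no separate inverted index or document store exists: select and rank produce the posting list, and access recovers the snippet text from the same tree. A real deployment would presumably use compressed bitmaps and a proper relevance function, and would cache the generated posting lists as the abstract describes.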

Bibliographic record

Source
Information Processing & Management | 2012, No. 5 | pp. 819-827 | 9 pages
Authors
Diego Arroyuelo; Veronica Gil-Costa; Senen Gonzalez; Mauricio Marin; Mauricio Oyarzun
Author affiliations

Yahoo! Research Latin America, Santiago, Chile

Yahoo! Research Latin America, Santiago, Chile; CONICET, National University of San Luis, Argentina

Yahoo! Research Latin America, Santiago, Chile

Yahoo! Research Latin America, Santiago, Chile; Department of Informatics Engineering, University of Santiago of Chile, Chile

Department of Informatics Engineering, University of Santiago of Chile, Chile
Original format: PDF
Text language: English
Keywords
web search engines; wavelet trees; snippet extraction; self-indexed compressed text; query processing

Similar literature

1. Multi-Stream Word-Based Compression Algorithm for Compressed Text Search [J]. Ozturk Emir, Mesut Altan, Diri Banu. Arabian Journal for Science and Engineering, 2018, No. 12.
2. Augmenting Medical Decision Making With Text-Based Search of Teaching File Repositories and Medical Ontologies: Text-Based Search of Radiology Teaching Files [J]. Priya Deshpande, Alexander Rasin, Eli T Brown. International Journal of Knowledge Discovery in Bioinformatics, 2018, No. 2.
3. Partial index replicated and distributed scheme for full-text search on wireless broadcast [J]. Goel Vikas, Ahlawat Anil Kumar, Gupta M. N. Sadhana: Academy Proceedings in Engineering Science, 2015, No. 7.
4. A cooperative distributed text database management method unifying search and compression based on the Burrows-Wheeler transformation [C]. Kunihiko Sadakane, Hiroshi Imai. International Conference on Conceptual Modeling, 1999.
5. Transform based and search aware text compression schemes and compressed domain text retrieval [D]. Zhang, Nan. 2005.
6. Distributed Compressed Hyperspectral Sensing Imaging Based on Spectral Unmixing [O]. Zhongliang Wang, Hua Xiao. 2020.
7. Compressing Distributed Text in Parallel with (s, c)-Dense Codes [O]. Carolina Bonacic, Antonio Farina, Mauricio Marín. 2007.
