Journal: Information Retrieval

Compressing Inverted Files

Abstract

Research into inverted file compression has focused on compression ratio: how small the indexes can be. Compression ratio is important for fast interactive searching. It is taken as read that the smaller the index, the faster the search. The premise "smaller is better" may not be true. To build truly faster indexes it is often necessary to forfeit compression. For inverted lists consisting of only 128 occurrences, compression may only add overhead. Perhaps the inverted list could be stored in 128 bytes in place of 128 words, but it must still be stored on disk. If the minimum disk sector read size is 512 bytes and the word size is 4 bytes, then both the compressed and raw postings would require one disk seek and one disk sector read. A less efficient compression technique may increase the file size but decrease load/decompress time, thereby increasing throughput. Examined here are five compression techniques: Golomb, Elias gamma, Elias delta, Variable Byte Encoding, and Binary Interpolative Coding. The effects on file size, file seek time, and file read time are all measured, as is decompression time. A quantitative measure of throughput is developed and the performance of each method is determined.
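
The sector arithmetic in the abstract can be checked with a short Python sketch. It is illustrative only and not taken from the paper: the function names (`sectors_needed`, `vbyte_encode`, `vbyte_decode`) are hypothetical, and the variable-byte layout shown (7 payload bits per byte, low-order group first, high bit marking the last byte of each integer) is one common convention that may differ from the variant the authors measured. The 512-byte sector and 4-byte word are the figures given in the abstract.

```python
import math

SECTOR_BYTES = 512  # minimum disk read size (figure from the abstract)
WORD_BYTES = 4      # size of one uncompressed posting (figure from the abstract)


def sectors_needed(n_bytes, sector_bytes=SECTOR_BYTES):
    """Whole sectors that must be read to fetch n_bytes (at least one)."""
    return max(1, math.ceil(n_bytes / sector_bytes))


def vbyte_encode(numbers):
    """Variable-byte encode non-negative integers (assumed convention:
    low-order 7-bit group first, high bit set on the final byte)."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # continuation byte, high bit clear
            n >>= 7
        out.append(n | 0x80)      # terminating byte, high bit set
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    values, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:
            values.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return values


# A 128-posting list: raw words versus a best-case one byte per posting.
raw_bytes = 128 * WORD_BYTES                   # 512 bytes
compressed_bytes = len(vbyte_encode([1] * 128))  # 128 bytes
raw_sectors = sectors_needed(raw_bytes)
compressed_sectors = sectors_needed(compressed_bytes)
```

Under these assumptions both `raw_sectors` and `compressed_sectors` come out to one, which is the abstract's point: for a list this short, compression saves no disk reads and only adds decoding work.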