Information Processing & Management

Comparing inverted files and signature files for searching a large lexicon

Abstract

Signature files and inverted files are well-known index structures. In this paper we undertake a direct comparison of the two for searching for partially-specified queries in a large lexicon stored in main memory. Using n-grams to index lexicon terms, a bit-sliced signature file can be compressed to a smaller size than an inverted file if each n-gram sets only one bit in the term signature. With a signature width less than half the number of unique n-grams in the lexicon, the signature file method is about as fast as the inverted file method, and significantly smaller. Greater flexibility in memory usage and faster index generation time make signature files appropriate for searching large lexicons or other collections in an environment where memory is at a premium. (C) 2004 Elsevier Ltd. All rights reserved.
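
The two index structures named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the `$` boundary marker, the use of bigrams, the CRC-32 hash, and the restriction to `*` wildcards are all assumptions made for illustration, and no compression is attempted. It shows an n-gram inverted file next to a bit-sliced signature file in which each n-gram sets exactly one bit, with both used to answer a partially specified query. Because several n-grams can share a bit, the signature file may return false candidates, so both searches verify candidates against the pattern.

```python
import zlib
from fnmatch import fnmatchcase


def ngrams(text, n=2):
    """Return the set of character n-grams in text (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def term_ngrams(term, n=2):
    # Assumption: '$' marks term boundaries so prefixes and suffixes are indexed.
    return ngrams("$" + term + "$", n)


def query_ngrams(pattern, n=2):
    # Only n-grams lying inside the literal runs between '*' wildcards are usable.
    grams = set()
    for piece in ("$" + pattern + "$").split("*"):
        grams |= ngrams(piece, n)
    return grams


def build_inverted(lexicon, n=2):
    """Inverted file: n-gram -> set of term ids containing it."""
    index = {}
    for term_id, term in enumerate(lexicon):
        for gram in term_ngrams(term, n):
            index.setdefault(gram, set()).add(term_id)
    return index


def slot(gram, width):
    # Each n-gram sets exactly one bit; CRC-32 is an arbitrary deterministic hash.
    return zlib.crc32(gram.encode("utf-8")) % width


def build_bitslices(lexicon, width, n=2):
    """Bit-sliced signature file: slices[b] has bit i set if term i's signature sets bit b."""
    slices = [0] * width
    for term_id, term in enumerate(lexicon):
        for gram in term_ngrams(term, n):
            slices[slot(gram, width)] |= 1 << term_id
    return slices


def search_inverted(index, lexicon, pattern, n=2):
    candidates = set(range(len(lexicon)))
    for gram in query_ngrams(pattern, n):
        candidates &= index.get(gram, set())
    # Verify: sharing all query n-grams does not guarantee a match.
    return sorted(lexicon[i] for i in candidates if fnmatchcase(lexicon[i], pattern))


def search_signature(slices, lexicon, pattern, n=2):
    mask = (1 << len(lexicon)) - 1  # start with every term as a candidate
    for gram in query_ngrams(pattern, n):
        mask &= slices[slot(gram, len(slices))]  # AND only the slices the query needs
    candidates = [lexicon[i] for i in range(len(lexicon)) if (mask >> i) & 1]
    # Verify: hash collisions can leave false candidates, never false dismissals.
    return sorted(t for t in candidates if fnmatchcase(t, pattern))


lexicon = ["labor", "labour", "lab", "harbor", "label", "elaborate"]
print(search_inverted(build_inverted(lexicon), lexicon, "lab*r"))
print(search_signature(build_bitslices(lexicon, 8), lexicon, "lab*r"))
# both print ['labor', 'labour']
```

In this sketch the `width` argument is the signature width: a smaller value shrinks the index but lets more n-grams collide on the same slice, which leaves more false candidates to be filtered in the verification step.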