
An Alphabet-Friendly FM-Index



Abstract

We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index that scales well with the size of the input alphabet $\Sigma$. The size of the new index built on a string $T[1,n]$ is bounded by $nH_k(T) + O\big((n \log\log n)/\log_{|\Sigma|} n\big)$ bits, where $H_k(T)$ is the $k$-th order empirical entropy of $T$. This bound holds simultaneously for all $k \le \alpha \log_{|\Sigma|} n$, for any $0 < \alpha < 1$. Moreover, the index design does not depend on the parameter $k$, which plays a role only in the analysis of the space occupancy. Using our index, counting the occurrences of an arbitrary pattern $P[1,p]$ as a substring of $T$ takes $O(p \log |\Sigma|)$ time. Locating each pattern occurrence takes $O\big(\log |\Sigma| \cdot \log^2 n / \log\log n\big)$ time. Reporting a text substring of length $l$ takes $O\big((l + \log^2 n / \log\log n) \log |\Sigma|\big)$ time.
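The leading term of the space bound, $nH_k(T)$, is easier to interpret with a concrete computation. The sketch below computes $H_k(T) = \frac{1}{n}\sum_{w \in \Sigma^k} |w_T|\, H_0(w_T)$, assuming the common convention in which $w_T$ collects the symbols that *follow* each occurrence of the length-$k$ context $w$ (the abstract does not say which convention the paper uses; some FM-index literature uses preceding symbols instead). The function names `h0` and `hk` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Zeroth-order empirical entropy H_0(s), in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(s).values())


def hk(t, k):
    """k-th order empirical entropy H_k(t), in bits per symbol.

    Assumed convention: for each length-k context w, w_T is the sequence of
    symbols that *follow* the occurrences of w in t.
    H_k(t) = (1/n) * sum over contexts w of |w_T| * H_0(w_T).
    """
    n = len(t)
    if n == 0:
        return 0.0
    followers = defaultdict(list)  # context w -> symbols of w_T
    for i in range(n - k):
        followers[t[i:i + k]].append(t[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / n


if __name__ == "__main__":
    t = "mississippi"
    for k in range(3):
        # n * H_k(t) approximates the leading term of the index size in bits
        print(k, round(hk(t, k), 3), round(len(t) * hk(t, k), 1))
```

With $k = 0$ the single empty context covers every position, so `hk(t, 0)` reduces to `h0(t)`; larger $k$ can only lower the value, which is why a bound holding simultaneously for all small $k$ is useful.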

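The $O(p \log |\Sigma|)$ counting time comes from backward search: each of the $p$ steps issues rank queries on the Burrows-Wheeler transform, and a wavelet tree answers each rank query by walking one root-to-leaf path of depth $O(\log |\Sigma|)$. The sketch below is a plain FM-index with a single uncompressed wavelet tree over the whole BWT. It does not implement the compression-boosting partition, succinct bitvectors, or the sampling needed for locate and extract, so it illustrates the query time but not the entropy-bounded space. The names `WaveletTree`, `bwt`, and `FMIndex`, the naive rotation sort, and the `'\0'` sentinel (assumed absent from the text) are all illustrative assumptions.

```python
class WaveletTree:
    """Balanced wavelet tree over the integer alphabet [lo, hi].

    Each internal node stores prefix counts of its 1 bits (plain lists,
    not succinct bitvectors), so rank walks one path: O(log sigma) steps.
    """

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        self.ones = [0]  # ones[i] = number of symbols > mid in seq[0:i]
        for x in seq:
            self.ones.append(self.ones[-1] + (x > mid))
        self.left = WaveletTree([x for x in seq if x <= mid], lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, hi)

    def rank(self, c, i):
        """Number of occurrences of symbol c in seq[0:i]."""
        node = self
        while node.lo != node.hi:
            if node.left is None:  # empty subtree: c cannot occur here
                return 0
            mid = (node.lo + node.hi) // 2
            ones = node.ones[i]
            if c <= mid:
                i, node = i - ones, node.left  # count 0 bits, go left
            else:
                i, node = ones, node.right  # count 1 bits, go right
        return i


def bwt(text):
    """Naive BWT by sorting rotations of text + '\\0' (sentinel assumed unique)."""
    s = text + "\0"
    rows = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    return "".join(s[i - 1] for i in rows)


class FMIndex:
    def __init__(self, text):
        last = bwt(text)
        self.code = {ch: r for r, ch in enumerate(sorted(set(last)))}
        seq = [self.code[ch] for ch in last]
        self.wt = WaveletTree(seq, 0, len(self.code) - 1)
        counts = [0] * len(self.code)
        for x in seq:
            counts[x] += 1
        self.C, total = [], 0  # C[r] = number of symbols smaller than r
        for cnt in counts:
            self.C.append(total)
            total += cnt
        self.n = len(last)

    def count(self, pattern):
        """Backward search: p steps, each with two O(log sigma) rank queries."""
        sp, ep = 0, self.n  # half-open range [sp, ep) of sorted rows
        for ch in reversed(pattern):
            r = self.code.get(ch)
            if r is None:
                return 0
            sp = self.C[r] + self.wt.rank(r, sp)
            ep = self.C[r] + self.wt.rank(r, ep)
            if sp >= ep:
                return 0
        return ep - sp


if __name__ == "__main__":
    fm = FMIndex("mississippi")
    print(fm.count("ssi"), fm.count("issi"), fm.count("x"))
```

Each loop iteration maps the current row range through the last-to-first mapping, so the work per pattern symbol is two wavelet-tree rank queries, matching the $O(p \log |\Sigma|)$ counting time stated in the abstract.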