An Alphabet-Friendly FM-Index

Abstract

We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size of the input alphabet Σ. The size of the new index built on a string T[1, n] is bounded by nH_k(T) + O((n log log n) / log_|Σ| n) bits, where H_k(T) is the k-th order empirical entropy of T. The above bound holds simultaneously for all k ≤ α log_|Σ| n and 0 < α < 1. Moreover, the index design does not depend on the parameter k, which plays a role only in analysis of the space occupancy. Using our index, the counting of the occurrences of an arbitrary pattern P[1, p] as a substring of T takes O(p log |Σ|) time. Locating each pattern occurrence takes O(log |Σ| (log^2 n / log log n)) time. Reporting a text substring of length l takes O((l + log^2 n / log log n) log |Σ|) time.
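
The O(p log |Σ|) counting bound can be illustrated with a small, uncompressed sketch of backward search in which every rank query on the Burrows-Wheeler transform is answered by a wavelet tree. This is not the paper's construction: it assumes a single wavelet tree over the whole BWT (no compression boosting), plain prefix-sum arrays instead of succinct bit vectors, and a naive suffix sort. The names `WaveletTree` and `FMIndex` are hypothetical and chosen only for this example.

```python
class WaveletTree:
    """Balanced wavelet tree; rank() visits one node per level, i.e. O(log |alphabet|) nodes."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.prefix_ones = None          # stays None at leaves
        self.left = self.right = None
        if len(self.alphabet) <= 1:
            return                       # leaf: zero or one distinct symbol
        mid = len(self.alphabet) // 2
        right_syms = set(self.alphabet[mid:])
        # Bit 1 marks symbols routed to the right child, bit 0 to the left.
        # prefix_ones[i] = number of 1 bits among the first i positions.
        self.prefix_ones = [0]
        for s in seq:
            self.prefix_ones.append(self.prefix_ones[-1] + (s in right_syms))
        self.left = WaveletTree([s for s in seq if s not in right_syms],
                                self.alphabet[:mid])
        self.right = WaveletTree([s for s in seq if s in right_syms],
                                 self.alphabet[mid:])

    def rank(self, sym, i):
        """Return the number of occurrences of sym in seq[:i]."""
        node = self
        while node.prefix_ones is not None:
            mid = len(node.alphabet) // 2
            ones = node.prefix_ones[i]
            if sym >= node.alphabet[mid]:
                node, i = node.right, ones       # map position into right child
            else:
                node, i = node.left, i - ones    # map position into left child
        # At a leaf, i counts the leaf's single symbol; any other symbol occurs 0 times.
        return i if node.alphabet and node.alphabet[0] == sym else 0


class FMIndex:
    """Toy FM-index supporting count() only."""

    def __init__(self, text, sentinel="\0"):
        assert sentinel not in text
        t = text + sentinel
        # Naive suffix array construction, fine for a small illustration.
        sa = sorted(range(len(t)), key=lambda i: t[i:])
        bwt = [t[i - 1] for i in sa]
        self.n = len(t)
        # C[c] = number of symbols in t that are strictly smaller than c.
        self.C, total = {}, 0
        for sym in sorted(set(t)):
            self.C[sym] = total
            total += t.count(sym)
        self.wt = WaveletTree(bwt)

    def count(self, pattern):
        """Count occurrences of a non-empty pattern: two rank queries per symbol."""
        lo, hi = 0, self.n               # half-open range of suffix array rows
        for sym in reversed(pattern):
            if sym not in self.C:
                return 0
            lo = self.C[sym] + self.wt.rank(sym, lo)
            hi = self.C[sym] + self.wt.rank(sym, hi)
            if lo >= hi:
                return 0
        return hi - lo
```

For instance, `FMIndex("mississippi").count("ssi")` returns 2. Each of the p pattern symbols triggers two rank queries, and each query walks one root-to-leaf path of the wavelet tree, which is where the log |Σ| factor in the counting time comes from.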

Bibliographic record

Source
International Conference on String Processing and Information Retrieval (SPIRE 2004); October 5-8, 2004; Padova (IT) | 2004 | pp. 150-160 | 11 pages
Conference location: Padova (IT)
Authors
Paolo Ferragina; Giovanni Manzini; Veli Mäkinen; Gonzalo Navarro
Author affiliation

Dipartimento di Informatica, University of Pisa, Italy;

Original format: PDF
Language: English
CLC classification: Data backup and recovery
