Fast matching statistics in small space

Abstract

Computing the matching statistics of a string S with respect to a string T on an alphabet of size σ is a fundamental primitive for a number of large-scale string analysis applications, including the comparison of entire genomes, for which space is a pressing issue. This paper takes from theory to practice an existing algorithm that uses just O(|T| log σ) bits of space, and that computes a compact encoding of the matching statistics array in O(|S| log σ) time. The techniques used to speed up the algorithm are of general interest, since they optimize queries on the existence of a Weiner link from a node of the suffix tree, and parent operations after unsuccessful Weiner links. Thus, they can be applied to other matching statistics algorithms, as well as to any suffix tree traversal that relies on such calls. Some of our optimizations yield a matching statistics implementation that is up to three times faster than a plain version of the algorithm, depending on the similarity between S and T. In genomic datasets of practical significance we achieve speedups of up to 1.8, but our fastest implementations take on average twice the time of an existing code based on the LCP array. The key advantage is that our implementations need between one half and one fifth of the competitor's memory, and they approach comparable running times when S and T are very similar.
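
To make the central object concrete, here is a minimal sketch of what a matching statistics array contains, assuming the usual definition: MS[i] is the length of the longest prefix of S[i..] that occurs somewhere in T. This is not the small-space algorithm described in the abstract; it is a naive reference version that uses Python's built-in substring search instead of a suffix tree and Weiner links. The function name `matching_statistics` is hypothetical. The only property it borrows from the real algorithms is that MS[i+1] >= MS[i] - 1, so each position can resume from the previous match length minus one.

```python
def matching_statistics(s: str, t: str) -> list[int]:
    """Naive reference computation of matching statistics.

    ms[i] is the length of the longest prefix of s[i:] that occurs
    as a substring of t. Illustrative only: it relies on substring
    search, not on the suffix-tree machinery named in the abstract.
    """
    ms = []
    length = 0
    for i in range(len(s)):
        # Dropping the first character of the previous match leaves a
        # string that still occurs in t, so we can resume from there.
        length = max(length - 1, 0)
        # Extend the match one character at a time while it occurs in t.
        while i + length < len(s) and s[i:i + length + 1] in t:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("abcab", "cabx"))  # [2, 1, 3, 2, 1]
```

In the example, "ab" is the longest prefix of "abcab" found in "cabx", so MS[0] = 2, while the suffix "cab" starting at position 2 occurs in full, so MS[2] = 3. A sketch like this is mainly useful as a correctness oracle for small inputs when testing a faster implementation.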