【24h】

New Algorithms for Finding Monad Patterns in DNA Sequences

机译:在DNA序列中寻找Monad模式的新算法

获取原文
获取原文并翻译 | 示例

摘要

In this paper, we present two new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l,d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time-complexity of some of the best known algorithms to date is O(nt~2 l~d σ~d), where t is the number of input sequences, n is the length of each input sequence, and σ = |Σ| is the size of the alphabet. The first algorithm that we present in this paper takes O(n~2 t~2 l~(d/2)) time and O(ntl~(d/2) σ~(d/2)) space, and the second algorithm takes O(n~3 t~3 l~(d/2) σ~(d/2)) time using O(l~(d/2) σ~(d/2)) space. In practice, our algorithms have much better performance provided the d/l ratio is small. The second algorithm performs very well even for large values l and d as long as the d/l ratio is small.
机译:在本文中,我们提出了两种用于发现DNA序列中单峰模式的新算法。 Monad模式的形式为(l,d)-k,其中l是模式的长度,d是允许的最大不匹配数,k是在给定样本中重复模式的最小次数。迄今为止,一些最著名的算法的时间复杂度为O(nt〜2 l〜dσ〜d),其中t是输入序列的数量,n是每个输入序列的长度,并且σ= |Σ |是字母的大小。我们在本文中提出的第一个算法占用O(n〜2 t〜2 l〜(d / 2))时间和O(ntl〜(d / 2)σ〜(d / 2))空间,第二个算法占用该算法使用O(l〜(d / 2)σ〜(d / 2))空间花费O(n〜3 t〜3 l〜(d / 2)σ〜(d / 2))时间。实际上,只要d / l比很小,我们的算法就会有更好的性能。只要d / l之比很小,第二种算法即使对于较大的l和d也可以很好地执行。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号