Proceedings of the National Academy of Sciences of the United States of America

An Eulerian path approach to local multiple alignment for DNA sequences.



Abstract

Expensive computation in handling a large number of sequences limits the application of local multiple sequence alignment. We present an Eulerian path approach to local multiple alignment for DNA sequences. The computational time and memory usage of this approach are approximately linear in the total size of the sequences analyzed; hence, it can handle thousands of sequences or millions of letters simultaneously. When a de Bruijn graph is constructed, most of the conserved segments are amplified as heavy Eulerian paths in the graph, and the original patterns distributed across the sequences are recovered even if they do not exist in any single sequence. This approach can accurately detect unknown conserved regions, whether the patterns are short or long, conserved or degenerate. We further present a Poisson heuristic to estimate the significance of a local multiple alignment. The performance of our method is demonstrated by finding Alu repeats in the human genome. We compare the results with the Alus marked by RepeatMasker, and the two programs are in good agreement. Our method is robust under various conditions and superior to other methods in terms of efficiency and accuracy.
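
The core idea of the abstract, that conserved segments show up as heavy paths in a de Bruijn graph, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `build_de_bruijn` and `heavy_paths`, the parameters `k` and `min_weight`, and the greedy walk are all assumptions made for illustration, and the sketch ignores mismatches, cycles, and the Poisson significance estimate.

```python
from collections import Counter, defaultdict


def build_de_bruijn(sequences, k):
    """Build a weighted de Bruijn graph from DNA strings.

    Nodes are (k-1)-mers; each k-mer contributes one edge from its prefix
    to its suffix. The edge weight counts how often the k-mer occurs
    across all input sequences.
    """
    edges = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[kmer[:-1]][kmer[1:]] += 1
    return edges


def heavy_paths(edges, min_weight):
    """Spell out paths made only of edges with weight >= min_weight.

    Illustrative simplification: walks start at nodes with no heavy
    incoming edge and greedily follow the heaviest unused heavy edge,
    so purely cyclic heavy components are not reported.
    """
    heavy = {}
    for u, nbrs in edges.items():
        kept = {v: w for v, w in nbrs.items() if w >= min_weight}
        if kept:
            heavy[u] = kept
    has_incoming = {v for nbrs in heavy.values() for v in nbrs}
    starts = sorted(u for u in heavy if u not in has_incoming)

    paths = []
    for start in starts:
        node, spelled, used = start, start, set()
        while node in heavy:
            candidates = [(w, v) for v, w in heavy[node].items()
                          if (node, v) not in used]
            if not candidates:
                break
            _, nxt = max(candidates)
            used.add((node, nxt))
            spelled += nxt[-1]  # each step extends the path by one letter
            node = nxt
        paths.append(spelled)
    return paths


if __name__ == "__main__":
    # Three toy sequences sharing one conserved segment (made-up data).
    seqs = ["TTACGTGCAA", "GGACGTGCTT", "CCACGTGCGG"]
    graph = build_de_bruijn(seqs, k=4)
    print(heavy_paths(graph, min_weight=3))
```

Building the graph takes one pass over every k-mer, so under these assumptions the construction time grows roughly linearly with the total sequence length, which matches the scaling the abstract describes.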
