Journal: IEEE Transactions on Parallel and Distributed Systems

Retrieving Smith-Waterman Alignments with Optimizations for Megabase Biological Sequences Using GPU


Abstract

In genome projects, biological sequences are aligned thousands of times on a daily basis. The Smith-Waterman algorithm retrieves the optimal local alignment with quadratic time and space complexity. So far, aligning huge sequences, such as whole chromosomes, with the Smith-Waterman algorithm has been regarded as infeasible because of its huge computing and memory requirements. However, high-performance computing platforms such as GPUs are making it possible to obtain the optimal result for huge sequences in reasonable time. In this paper, we propose and evaluate CUDAlign 2.1, a parallel algorithm that uses a GPU to align huge sequences, executing the Smith-Waterman algorithm combined with Myers-Miller, with linear space complexity. To achieve this, we propose optimizations that significantly reduce the amount of data processed while enforcing full parallelism most of the time. Using the NVIDIA GTX 560 Ti board and comparing real DNA sequences that range from 162 KBP (thousand base pairs) to 59 MBP (million base pairs), we show that CUDAlign 2.1 is scalable. We also show that CUDAlign 2.1 is able to produce the optimal alignment between chimpanzee chromosome 22 (33 MBP) and human chromosome 21 (47 MBP) in 8.4 hours, and the optimal alignment between chimpanzee chromosome Y (24 MBP) and human chromosome Y (59 MBP) in 13.1 hours.
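To illustrate the memory issue the abstract refers to, the following is a minimal sketch of the Smith-Waterman recurrence that keeps only two rows of the dynamic-programming matrix, so memory grows linearly with the length of the second sequence. It is not CUDAlign's implementation: it runs on the CPU, uses a simple linear gap penalty, and returns only the best score and its end position, without the Myers-Miller step needed to recover the full alignment. The function name `sw_best_score` and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for illustration.

```python
def sw_best_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best_score, end_i, end_j) for the optimal local alignment.

    Hypothetical helper for illustration only; scoring values are assumed.
    Only two rows are stored, so memory is O(len(b)) instead of O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)          # row i-1 of the score matrix
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # row i, column 0 stays 0
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            h = max(0,
                    prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                    prev[j] + gap,      # gap in b
                    curr[j - 1] + gap)  # gap in a
            curr[j] = h
            if h > best:                # remember where the best alignment ends
                best, best_i, best_j = h, i, j
        prev = curr
    return best, best_i, best_j
```

Time remains quadratic in this sketch; only the space requirement changes. Recovering the alignment itself in linear space requires a divide-and-conquer traceback such as the Myers-Miller approach named in the abstract.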
