Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Parallel Methods for Finding k-Mismatch Shortest Unique Substrings Using GPU


Abstract

The k-mismatch shortest unique substring (SUS) query has been proposed and studied only recently, motivated by its applications in computational biology. For a given position of a string, the k-mismatch SUS query asks for a shortest substring that covers the position and has no duplicate (within a Hamming distance of k) elsewhere in the string. The challenge is to find the SUS for every position of a massively long string collectively, in a manner that is both time- and space-efficient. All known efforts and results have focused on optimizing the time and space efficiency of SUS computation in the sequential CPU model. In this work, we propose the first parallel approach for k-mismatch SUS queries, leveraging the massively multi-threaded architecture of the graphics processing unit (GPU). An experimental study performed on a mid-range GPU using real-world biological data shows that our proposal is consistently faster than the fastest CPU solution by a factor of at least 6 for exact SUS queries (k = 0) and at least 23 for approximate SUS queries over DNA sequences (k > 0), while maintaining nearly the same peak memory usage as the most memory-efficient sequential CPU proposal. Our work gives practitioners a faster tool for SUS finding on massively long strings, and indeed the first practical tool for approximate SUS computation, because the state-of-the-art sequential CPU method for approximate SUS queries takes quadratic time in every case and does not scale well even to modestly long strings.
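
To make the query definition in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the parallel GPU method the paper proposes, nor any of the sequential CPU methods it compares against; it only checks the definition directly and is far too slow for long strings. The function names (`hamming_within`, `is_k_unique`, `k_mismatch_sus`), the 0-based positions, and the choice to return the leftmost answer among shortest candidates are all assumptions made for illustration.

```python
def hamming_within(s, i, j, length, k):
    """Return True if s[i:i+length] and s[j:j+length] differ in at most k positions."""
    mismatches = 0
    for t in range(length):
        if s[i + t] != s[j + t]:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def is_k_unique(s, start, length, k):
    """Return True if no other substring of the same length, starting at a
    different position, lies within Hamming distance k of s[start:start+length]."""
    for other in range(len(s) - length + 1):
        if other != start and hamming_within(s, start, other, length, k):
            return False
    return True


def k_mismatch_sus(s, p, k):
    """Brute-force k-mismatch SUS for 0-based position p (illustrative only).

    Tries lengths in increasing order and, for each length, every start
    position whose substring covers p. Returns (start, length) of the
    first k-mismatch-unique substring found, i.e. a shortest one, with
    ties broken toward the leftmost start (an assumption of this sketch).
    """
    n = len(s)
    for length in range(1, n + 1):
        lo = max(0, p - length + 1)   # leftmost start that still covers p
        hi = min(p, n - length)       # rightmost start that fits in s
        for start in range(lo, hi + 1):
            if is_k_unique(s, start, length, k):
                return start, length
    return None  # not reached for a valid p: the whole string is always unique
```

For example, `k_mismatch_sus("abab", 1, 0)` examines `"b"`, then `"ab"`, then `"ba"`, and returns the start and length of `"ba"`, the first candidate with no exact duplicate elsewhere. Every uniqueness check scans the whole string, so answering the query for all positions this way costs far more than the methods discussed in the abstract.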
