Parallel Methods for Finding k-Mismatch Shortest Unique Substrings Using GPU

Schultz Daniel W.; Xu Bojian

Abstract

k-mismatch shortest unique substring (SUS) queries have been proposed and studied very recently because of their useful applications in computational biology. The k-mismatch SUS query for a given position of a string asks for a shortest substring that covers the position and has no duplicate (within a Hamming distance of k) elsewhere in the string. The challenge is to find the SUS for every position of a massively long string in a manner that is both time- and space-efficient. All known efforts and results have focused on improving the time and space efficiency of SUS computation in the sequential CPU model. In this work, we propose the first parallel approach for k-mismatch SUS queries, leveraging the massively multi-threaded architecture of the graphics processing unit (GPU). An experimental study performed on a mid-range GPU using real-world biological data shows that our proposal is consistently faster than the fastest CPU solution by a factor of at least 6 for exact SUS queries (k = 0) and at least 23 for approximate SUS queries over DNA sequences (k > 0), while maintaining nearly the same peak memory usage as the most memory-efficient sequential CPU proposal. Our work gives practitioners a faster tool for SUS finding on massively long strings and provides the first practical tool for approximate SUS computation, because the state-of-the-art sequential CPU method for approximate SUS queries has quadratic time cost in all cases and does not scale well even to modestly long strings.
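To make the query definition concrete, the following is a minimal brute-force sketch in Python. It is not the authors' GPU method and not any of the CPU algorithms mentioned in the abstract; it only restates the definition as code. The function names (`hamming`, `is_k_mismatch_unique`, `k_mismatch_sus`), the 0-based positions, and the choice to return the leftmost candidate among equally short ones are assumptions made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_k_mismatch_unique(s: str, start: int, length: int, k: int) -> bool:
    """True if s[start:start+length] has no other occurrence in s
    within Hamming distance k (illustrative helper, hypothetical name)."""
    target = s[start:start + length]
    for j in range(len(s) - length + 1):
        if j != start and hamming(target, s[j:j + length]) <= k:
            return False
    return True


def k_mismatch_sus(s: str, pos: int, k: int) -> tuple[int, int]:
    """Return (start, length) of a shortest k-mismatch unique substring
    covering the 0-based position pos. Ties go to the leftmost start."""
    n = len(s)
    if not 0 <= pos < n:
        raise ValueError("pos must be a valid index into s")
    for length in range(1, n + 1):
        # Only windows of this length that cover pos are candidates.
        first = max(0, pos - length + 1)
        last = min(pos, n - length)
        for start in range(first, last + 1):
            if is_k_mismatch_unique(s, start, length, k):
                return start, length
    # Unreachable for valid input: the whole string has no other
    # occurrence of the same length, so it is always unique.
    raise AssertionError("no unique substring found")
```

For example, `k_mismatch_sus("abab", 1, 0)` returns `(1, 2)`, the substring `"ba"`. Answering the query for every position this way takes far more than quadratic time, which is why a naive approach like this is only useful as a reference for checking faster implementations on short inputs.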


Bibliographic details

Source
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021, Issue 1, pp. 386-395 (10 pages)
Authors
Schultz Daniel W.; Xu Bojian
Author affiliations

Univ Tennessee Dept Elect Engn & Comp Sci Knoxville TN 37996 USA;

Eastern Washington Univ Dept Comp Sci Cheney WA 99004 USA;

Original format: PDF
Language: English
Keywords
Graphics processing units; Organisms; Proposals; Hamming distance; Central Processing Unit; Tools; String; pattern matching; shortest unique substring; parallel computing; GPU; CUDA;

