International Conference on Bioinformatics and Computational Biology

CentroidBLAST: Accelerating Sequence Search via Clustering

Abstract

BLAST, short for Basic Local Alignment Search Tool, searches for regions of local similarity between a query sequence and a large database of DNA or amino-acid sequences. It serves as a fundamental tool for many discovery processes in bioinformatics and computational biology, including inferring functional and evolutionary relationships between sequences, identifying members of gene families, and phylogenetic profiling. Consequently, researchers have spent decades making local alignment search tools such as BLAST more efficient with respect to both speed and accuracy. In this paper, we present our approach to more efficient sequence search, which we dub CentroidBLAST. CentroidBLAST first searches a representative fraction of the original database, where each representative serves as a "centroid" of a cluster of similar sequences. A centroid's cluster of sequences is then searched only if its representative sequence is a sufficiently similar match to the query sequence. This approach delivers as much as a 6.85-fold speed-up over NCBI BLAST. In addition, we analyze different aspects of CentroidBLAST, including execution time, biological significance of the resulting alignments, selection of the e-value cut-off, and the effect of database compression.
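
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-mer Jaccard `similarity` function is a toy stand-in for a BLAST alignment score, the greedy `build_clusters` routine is an assumed clustering method, and the names `centroid_search`, `centroid_cutoff`, and `hit_cutoff` are hypothetical. The abstract does not specify how clusters are built or which cut-offs are used.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets (toy stand-in for a BLAST score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def build_clusters(db: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Greedy clustering (an assumption, not the paper's method).

    Each sequence joins the first centroid it resembles; otherwise it
    becomes the centroid of a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    for seq_id, seq in db.items():
        for centroid_id in clusters:
            if similarity(seq, db[centroid_id]) >= threshold:
                clusters[centroid_id].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]  # a centroid is a member of its own cluster
    return clusters


def centroid_search(
    query: str,
    db: Dict[str, str],
    clusters: Dict[str, List[str]],
    centroid_cutoff: float,
    hit_cutoff: float,
) -> List[Tuple[str, float]]:
    """Stage 1: score the query against centroids only.
    Stage 2: score cluster members only where the centroid matched."""
    hits: List[Tuple[str, float]] = []
    for centroid_id, members in clusters.items():
        if similarity(query, db[centroid_id]) < centroid_cutoff:
            continue  # the whole cluster is skipped without scoring its members
        for seq_id in members:
            score = similarity(query, db[seq_id])
            if score >= hit_cutoff:
                hits.append((seq_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy database: two families of near-identical sequences.
db = {
    "a1": "ACGTACGTACGT",
    "a2": "ACGTACGTACGA",
    "b1": "TTTTGGGGCCCC",
    "b2": "TTTTGGGGCCCA",
}
clusters = build_clusters(db, threshold=0.5)
hits = centroid_search("ACGTACGTACGT", db, clusters,
                       centroid_cutoff=0.5, hit_cutoff=0.5)
print(clusters)
print(hits)
```

In this toy version, members of a skipped cluster are never scored. That is where any saving in work would come from, and it is also why the choice of cut-off matters: a sequence whose centroid falls below `centroid_cutoff` cannot be reported as a hit.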
