Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Efficient and Accurate OTU Clustering with GPU-Based Sequence Alignment and Dynamic Dendrogram Cutting



Abstract

De novo clustering is a popular technique to perform taxonomic profiling of a microbial community by grouping 16S rRNA amplicon reads into operational taxonomic units (OTUs). In this work, we introduce a new dendrogram-based OTU clustering pipeline called CRiSPy. The key idea used in CRiSPy to improve clustering accuracy is the application of an anomaly detection technique to obtain a dynamic distance cutoff instead of using the de facto value of 97 percent sequence similarity as in most existing OTU clustering pipelines. This technique works by detecting an abrupt change in the merging heights of a dendrogram. To produce the output dendrograms, CRiSPy employs the OTU hierarchical clustering approach that is computed on a genetic distance matrix derived from an all-against-all read comparison by pairwise sequence alignment. However, most existing dendrogram-based tools have difficulty processing datasets larger than 10,000 unique reads due to high computational complexity. We address this difficulty by developing two efficient algorithms for CRiSPy: a compute-efficient GPU-accelerated parallel algorithm for pairwise distance matrix computation and a memory-efficient hierarchical clustering algorithm. Our experiments on various datasets with distinct attributes show that CRiSPy is able to produce more accurate OTU groupings than most OTU clustering applications.
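The dynamic-cutoff idea can be illustrated with a small sketch. The abstract does not say which anomaly detection technique CRiSPy uses, so the example below substitutes the simplest possible stand-in: it takes the largest jump between consecutive merge heights as the "abrupt change" and cuts the dendrogram at the midpoint of that jump. The average-linkage rule, the function names `average_linkage` and `dynamic_cutoff`, and the toy distance matrix are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def average_linkage(dist, cutoff=float("inf")):
    """Naive average-linkage clustering on a symmetric distance matrix.

    Merging stops once the closest pair of clusters is farther apart than
    `cutoff`. Returns (merge_heights, clusters). This is a quadratic-memory
    toy, not the memory-efficient algorithm the abstract refers to.
    """
    clusters = [[i] for i in range(len(dist))]
    heights = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean pairwise distance.
        d, a, b = min(
            (
                sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                / (len(clusters[a]) * len(clusters[b])),
                a,
                b,
            )
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > cutoff:
            break
        heights.append(d)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return heights, clusters


def dynamic_cutoff(heights):
    """Assumed stand-in for anomaly detection: midpoint of the largest jump
    between consecutive merge heights."""
    _, k = max((heights[k + 1] - heights[k], k) for k in range(len(heights) - 1))
    return (heights[k] + heights[k + 1]) / 2


# Made-up genetic distances for five reads: {0, 1, 2} and {3, 4} are close
# within each group and far apart between groups.
dist = [
    [0.00, 0.01, 0.02, 0.10, 0.10],
    [0.01, 0.00, 0.02, 0.10, 0.10],
    [0.02, 0.02, 0.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.00, 0.01],
    [0.10, 0.10, 0.10, 0.01, 0.00],
]

heights, _ = average_linkage(dist)          # full dendrogram merge heights
cutoff = dynamic_cutoff(heights)            # data-driven distance cutoff
_, otus = average_linkage(dist, cutoff)     # clusters obtained at that cutoff
print(cutoff, sorted(sorted(c) for c in otus))
```

On this toy input the cutoff falls between the within-group and between-group distances, so the two groups come out as separate OTUs. A fixed 3 percent distance threshold (97 percent similarity) would give the same split here, but the dynamic cutoff adapts when the gap between merge heights lies elsewhere.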
