Journal: Applied Microbiology

Distribution-Based Clustering: Using Ecology To Refine the Operational Taxonomic Unit


Abstract

16S rRNA sequencing, commonly used to survey microbial communities, begins by grouping individual reads into operational taxonomic units (OTUs). There are two major challenges in calling OTUs: identifying bacterial population boundaries and differentiating true diversity from sequencing errors. Current approaches to identifying taxonomic groups or eliminating sequencing errors rely on sequence data alone, but both of these activities could be informed by the distribution of sequences across samples. Here, we show that using the distribution of sequences across samples can help identify population boundaries even in noisy sequence data. The logic underlying our approach is that bacteria in different populations will often be highly correlated in their abundance across different samples. Conversely, 16S rRNA sequences derived from the same population, whether slightly different copies in the same organism, variation of the 16S rRNA gene within a population, or sequences generated randomly in error, will have the same underlying distribution across sampled environments. We present a simple OTU-calling algorithm (distribution-based clustering) that uses both genetic distance and the distribution of sequences across samples and demonstrate that it is more accurate than other methods at grouping reads into OTUs in a mock community. Distribution-based clustering also performs well on environmental samples: it is sensitive enough to differentiate between OTUs that differ by a single base pair yet predicts fewer overall OTUs than most other methods. The program can decrease the total number of OTUs with redundant information and improve the power of many downstream analyses to describe biologically relevant trends.
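
The idea of combining genetic distance with the distribution of sequences across samples can be illustrated with a minimal sketch. This is not the authors' program. It assumes a greedy pass over sequences in order of decreasing abundance, Hamming distance on pre-aligned reads of equal length, and a Pearson chi-square statistic compared against a caller-supplied critical value. The function names, the thresholds, and the toy counts are all hypothetical and chosen only for illustration.

```python
def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def chi_square_stat(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x n_samples table of read counts."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each sequence needs at least one read")
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # a sample with no reads carries no information
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def distribution_based_clusters(counts, max_dist=1, critical=5.991):
    """Greedy sketch: a sequence joins an existing OTU only if it is
    genetically close to the OTU representative AND its distribution across
    samples is not distinguishable from the representative's distribution.

    counts   : dict mapping sequence -> list of read counts per sample
    max_dist : assumed genetic distance cutoff (Hamming mismatches)
    critical : assumed chi-square cutoff (5.991 is the 0.05 critical value
               for 2 degrees of freedom, i.e. three samples)
    """
    by_abundance = sorted(counts, key=lambda s: -sum(counts[s]))
    otus = {}  # representative sequence -> list of member sequences
    for seq in by_abundance:
        for rep in otus:
            close = hamming(seq, rep) <= max_dist
            if close and chi_square_stat(counts[rep], counts[seq]) < critical:
                otus[rep].append(seq)
                break
        else:
            otus[seq] = [seq]  # no compatible OTU found: start a new one
    return otus


# Toy data (hypothetical): three sequences observed in three samples.
toy_counts = {
    "ACGT": [100, 50, 10],  # abundant sequence
    "ACGA": [10, 5, 1],     # one mismatch, same distribution as ACGT
    "ACGC": [5, 50, 100],   # one mismatch, very different distribution
}
otus = distribution_based_clusters(toy_counts)
for rep, members in otus.items():
    print(rep, members)
```

In this toy input, the low-abundance variant that tracks the abundant sequence across samples is absorbed into its OTU, while the variant that differs by the same single base but has a different distribution is kept separate. A fixed sequence-identity cutoff alone would treat both variants the same way.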
