Journal of Bioinformatics and Computational Biology

SHORT PROKARYOTIC DNA FRAGMENT BINNING USING A HIERARCHICAL CLASSIFIER BASED ON LINEAR DISCRIMINANT ANALYSIS AND PRINCIPAL COMPONENT ANALYSIS


Abstract

Metagenomics is an emerging field in which the power of genomic analysis is applied to an entire microbial community, bypassing the need to isolate and culture individual microbial species. Assembling metagenomic DNA fragments closely resembles the overlap-layout-consensus procedure used to assemble isolated genomes, but it requires an additional binning step that sorts scaffolds, contigs and unassembled reads into taxonomic groups. In this paper, we employed n-mer oligonucleotide frequencies as features and developed a hierarchical classifier (PCAHIER) for binning short (≤ 1,000 bp) metagenomic fragments. Principal component analysis was used to reduce the high dimensionality of the feature space. The hierarchical classifier consists of four layers of local classifiers, each implemented with linear discriminant analysis. The four layers bin prokaryotic DNA fragments into superkingdoms, fragments of the same superkingdom into phyla, fragments of the same phylum into genera, and fragments of the same genus into species, respectively. We evaluated PCAHIER on our own simulated data sets as well as on the widely used simHC synthetic metagenome data set from the IMG/M system. Its effectiveness was demonstrated through comparisons against a non-hierarchical classifier and two existing binning algorithms (TETRA and PhyloPythia).
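The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function and class names (`nmer_frequencies`, `CentroidClassifier`, `HierarchicalBinner`) are hypothetical, the default `n=2` is an arbitrary choice because the abstract does not state the n-mer length, and a nearest-centroid rule stands in for the PCA plus linear discriminant analysis step so that the example needs only the standard library. What it does show is the overall structure: fixed-order n-mer frequency features, and one local classifier per taxonomic node applied rank by rank.

```python
from collections import Counter
from itertools import product


def nmer_frequencies(seq, n=2):
    """Relative frequency of each DNA n-mer in seq, in lexicographic order."""
    seq = seq.upper()
    nmers = ["".join(p) for p in product("ACGT", repeat=n)]
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Windows containing non-ACGT symbols (e.g. N) are left out of the total.
    total = sum(counts[m] for m in nmers)
    return [counts[m] / total if total else 0.0 for m in nmers]


class CentroidClassifier:
    """Stand-in local classifier (nearest centroid), NOT the paper's PCA + LDA."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # Per-label mean feature vector.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


class HierarchicalBinner:
    """One local classifier per taxonomy node, applied from the top rank down."""

    def __init__(self, n=2, make_classifier=CentroidClassifier):
        self.n = n
        self.make_classifier = make_classifier
        self.nodes = {}  # lineage prefix (tuple) -> fitted local classifier

    def fit(self, sequences, lineages):
        X = [nmer_frequencies(s, self.n) for s in sequences]
        for level in range(len(lineages[0])):
            by_parent = {}
            for x, lineage in zip(X, lineages):
                feats, labels = by_parent.setdefault(tuple(lineage[:level]), ([], []))
                feats.append(x)
                labels.append(lineage[level])
            # Each parent node learns to separate only its own children.
            for parent, (feats, labels) in by_parent.items():
                self.nodes[parent] = self.make_classifier().fit(feats, labels)
        return self

    def predict(self, sequence):
        x = nmer_frequencies(sequence, self.n)
        path = ()
        while path in self.nodes:  # descend until no classifier is left
            path += (self.nodes[path].predict(x),)
        return path
```

Each training lineage would be a tuple such as `(superkingdom, phylum, genus, species)`, so `predict` returns a four-element path. Any object with the same `fit`/`predict` shape can be passed as `make_classifier`, which is where a PCA-then-LDA classifier would be substituted for the centroid stand-in.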
