Journal of Bioinformatics and Computational Biology

A Comprehensive Whole Genome Bacterial Phylogeny Using Correlated Peptide Motifs Defined in a High Dimensional Vector Space



Abstract

As whole genome sequences continue to expand in number and complexity, effective methods for comparing and categorizing both genes and species represented within extremely large datasets are required. Methods introduced to date have generally utilized incomplete and likely insufficient subsets of the available data. We have developed an accurate and efficient method for producing robust gene and species phylogenies using very large whole genome protein datasets. This method relies on multidimensional protein vector definitions supplied by the singular value decomposition (SVD) of a large sparse data matrix in which each protein is uniquely represented as a vector of overlapping tetrapeptide frequencies. Quantitative pairwise estimates of species similarity were obtained by summing the protein vectors to form species vectors, then determining the cosines of the angles between species vectors. Evolutionary trees produced using this method confirmed many accepted prokaryotic relationships. However, several unconventional relationships were also noted. In addition, we demonstrate that many of the SVD-derived right basis vectors represent particular conserved protein families, while many of the corresponding left basis vectors describe conserved motifs within these families as sets of correlated peptides (copeps). This analysis represents the most detailed simultaneous comparison of prokaryotic genes and species available to date.
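The pipeline described in the abstract can be illustrated with a minimal sketch on toy data. This is not the authors' implementation: the sequences, the species labels, the rank `k`, the use of raw (unweighted) tetrapeptide counts, the scaling of protein vectors by singular values, and the dense `numpy.linalg.svd` call (a sparse solver would be needed at genome scale) are all assumptions made for illustration.

```python
import numpy as np

def tetrapeptide_matrix(proteins):
    """Build a (tetrapeptide x protein) matrix of overlapping 4-mer counts."""
    vocab = sorted({s[i:i + 4] for s in proteins for i in range(len(s) - 3)})
    row = {pep: r for r, pep in enumerate(vocab)}
    A = np.zeros((len(vocab), len(proteins)))
    for col, s in enumerate(proteins):
        for i in range(len(s) - 3):
            A[row[s[i:i + 4]], col] += 1
    return A, vocab

def protein_vectors(A, k):
    """Truncated SVD; each protein becomes a k-dimensional row vector.

    Scaling by the singular values is an assumption of this sketch.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (s[:k, None] * Vt[:k]).T

def species_cosines(pvecs, species_of):
    """Sum protein vectors per species, then take pairwise cosines."""
    names = sorted(set(species_of))
    S = np.array([
        pvecs[[i for i, sp in enumerate(species_of) if sp == name]].sum(axis=0)
        for name in names
    ])
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    return names, S @ S.T

# Hypothetical toy data: species A and B share most tetrapeptides, C shares none.
proteins = [
    "MKVLAAGIVGLK", "MSTNPKPQRKTK",   # species A
    "MKVLAAGIVALK", "MSTNPKPQRITK",   # species B
    "GGHWWYFFCCDE", "QQEEDDNNRRHH",   # species C
]
species_of = ["A", "A", "B", "B", "C", "C"]

A, vocab = tetrapeptide_matrix(proteins)
pvecs = protein_vectors(A, k=4)
names, cos = species_cosines(pvecs, species_of)
print(names)
print(np.round(cos, 3))
```

In this sketch the cosine matrix could be converted to distances (for example `1 - cos`) and passed to any standard tree-building routine. The columns of `U` play the role of the left basis vectors mentioned in the abstract, so the tetrapeptides with the largest loadings in a given column would be the candidate correlated peptides for that dimension.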
