Bioinformation (NIH literature collection)

A graph-based clustering method applied to protein sequences

Abstract

The number of amino acid sequences in protein databases such as Swiss-Prot, UniProt, PIR and others is increasing very rapidly, but structures are available in the Protein Data Bank for only some of these sequences. Thus, an important problem in genomics is automatically clustering homologous protein sequences when only sequence information is available. Here, we use graph-theoretic techniques to cluster amino acid sequences. A similarity graph is defined, and clusters in that graph correspond to connected subgraphs. Cluster analysis seeks to group amino acid sequences into subsets based on the distance or similarity score between pairs of sequences. Our goal is to find disjoint subsets, called clusters, such that two criteria are satisfied: homogeneity (sequences in the same cluster are highly similar to each other) and separation (sequences in different clusters have low similarity to each other). We tested our method on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. The results show that, for a given set of proteins, the number of clusters we obtained is close to the number of superfamilies in that set; there are fewer singletons; and the method correctly groups most remote homologs.
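The idea of clusters as connected subgraphs of a similarity graph can be illustrated with a minimal sketch. The abstract does not specify how the graph is built or partitioned, so everything below is an assumption made for illustration: the function names, the toy similarity scores, the fixed threshold of 0.5, and the use of plain connected components (via union-find) in place of the authors' actual procedure. Homogeneity and separation are approximated here as the mean within-cluster and mean between-cluster similarity.

```python
from itertools import combinations


def cluster_by_similarity(ids, similarity, threshold):
    """Return connected components of the thresholded similarity graph.

    `similarity` maps frozenset({a, b}) to a score; missing pairs count as 0.
    Hypothetical helper: the source abstract does not define this procedure.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # An edge joins two sequences whose score reaches the threshold.
    for a, b in combinations(ids, 2):
        if similarity.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, ties broken by member names.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


def homogeneity_and_separation(clusters, similarity):
    """Mean within-cluster and mean between-cluster similarity (assumed metrics)."""
    label = {s: k for k, members in enumerate(clusters) for s in members}
    within, between = [], []
    for a, b in combinations(sorted(label), 2):
        score = similarity.get(frozenset((a, b)), 0.0)
        (within if label[a] == label[b] else between).append(score)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(within), mean(between)


# Toy data: five hypothetical sequences with made-up pairwise scores.
ids = ["A", "B", "C", "D", "E"]
similarity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.3,
    frozenset(("D", "E")): 0.85,
    frozenset(("C", "D")): 0.1,
}

clusters = cluster_by_similarity(ids, similarity, threshold=0.5)
print(clusters)  # [['A', 'B', 'C'], ['D', 'E']]
print(homogeneity_and_separation(clusters, similarity))
```

In this toy case A and C end up in the same cluster even though their direct score is low, because both are linked through B. That transitive linking is one way a connected-subgraph approach can group remote homologs, though it can also merge unrelated groups if the threshold is set too low.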