
Growth of novel protein structural data


Abstract

Contrary to popular assumption, the rate of growth of structural data has slowed, and the Protein Data Bank (PDB) has not been growing exponentially since 1995. Reaching such a dramatic conclusion requires careful measurement of growth of novel structures, which can be achieved by clustering entry sequences, or by using a novel index to down-weight entries with a higher number of sequence neighbors. These measures agree, and growth rates are very similar for entire PDB files, clusters, and weighted chains. The overall sizes of Structural Classification of Proteins (SCOP) categories (number of families, superfamilies, and folds) appear to be directly proportional to the number of deposited PDB files. Using our weighted chain count, which is most correlated to the change in the size of each SCOP category in any time period, shows that the rate of increase of SCOP categories is actually slowing down. This enables the final size of each of these SCOP categories to be predicted without examining or comparing protein structures. In the last 3 years, structures solved by structural genomics (SG) initiatives, especially the United States National Institutes of Health Protein Structure Initiative, have begun to redress the slowing growth of the PDB. Structures solved by SG are 3.8 times less sequence-redundant than typical PDB structures. Since mid-2004, SG programs have contributed half the novel structures measured by weighted chain counts. Our analysis does not rely on visual inspection of coordinate sets: it is done automatically, providing an accurate, up-to-date measure of the growth of novel protein structural data.
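
To make the idea of down-weighting entries by their number of sequence neighbors concrete, here is a minimal sketch. The abstract does not give the formula for the index, so the weight 1 / (1 + n), where n is a chain's number of sequence neighbors, is an assumption chosen for illustration, not the authors' definition. The chain identifiers, the `neighbors` mapping, and the function name `weighted_chain_count` are likewise hypothetical.

```python
def weighted_chain_count(chains, neighbors):
    """Sum an illustrative novelty weight over a set of chains.

    ASSUMPTION: each chain contributes 1 / (1 + n), where n is its number
    of sequence neighbors. The paper's actual index may differ.
    """
    return sum(1.0 / (1 + len(neighbors.get(chain, ()))) for chain in chains)


# Hypothetical toy data: three mutually similar chains plus one unique chain.
neighbors = {
    "chain_1": {"chain_2", "chain_3"},
    "chain_2": {"chain_1", "chain_3"},
    "chain_3": {"chain_1", "chain_2"},
    "chain_4": set(),
}

raw_count = len(neighbors)
weighted_count = weighted_chain_count(neighbors.keys(), neighbors)
print(f"raw chains: {raw_count}, weighted chains: {weighted_count:.2f}")
```

In this toy case the three redundant chains together contribute about as much as the single unique chain, so the weighted count matches the number of sequence clusters (two). Under this assumed weighting, that is one way a cluster-based measure and a neighbor-weighted measure could agree.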
