Journal of Molecular Biology

Progress of structural genomics initiatives: an analysis of solved target structures.

Abstract

The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.
