Journal of Molecular Biology

Structural alphabets for protein structure classification: a comparison study.

Abstract

Finding structural similarities between proteins often helps reveal shared functionality that might not be detected from native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure by a sequence of motifs drawn from a structural alphabet. Structure comparison is then performed by comparing the corresponding motif sequences, or structural sequences. In this article, we measure the performance of such alphabets on the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the library fragment that best matches the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs, i.e., protein fragments four residues long. We show that the global structural sequences approximate the native structures of proteins well, with an average coordinate root mean square deviation (cRMSD) of 0.69 Å over 2225 test proteins. The approximation is best for all-α proteins and relatively poorer for all-β proteins. We then test four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library), combined with different classifiers, on their ability to classify proteins belonging to five distinct CATH folds. Unsurprisingly, the primary sequence alone performs poorly as a structure classifier.
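The local structural sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes Cα-only coordinates, overlapping four-residue windows with a stride of one residue, and scoring of each library fragment by cRMSD after optimal rigid superposition (the Kabsch algorithm). The names `kabsch_rmsd`, `local_structural_sequence`, and `library` are hypothetical; `library` maps each letter to a (4, 3) array of fragment coordinates, and the study's actual 20-fragment library is not reproduced here.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """cRMSD between two (n, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom deviations.
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def local_structural_sequence(ca: np.ndarray, library: dict[str, np.ndarray], k: int = 4) -> str:
    """Assign each overlapping k-residue window of a Cα trace its best-matching fragment letter.

    Assumption: windows overlap with stride 1, so a chain of n residues yields n - k + 1 letters.
    """
    letters = []
    for i in range(len(ca) - k + 1):
        window = ca[i:i + k]
        # Pick the letter whose fragment superimposes onto this window with the lowest cRMSD.
        best = min(library, key=lambda letter: kabsch_rmsd(library[letter], window))
        letters.append(best)
    return "".join(letters)
```

This sketch covers only the per-window local assignment. The global structural sequence is harder to build, because consecutive fragments must be chained into one complete backbone that matches the whole structure, so the choice of each letter depends on its neighbors.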
We show that adding either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence, but not yet well enough to be a viable alternative to more computationally intensive methods based on protein structure alignment.
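The classification step can be illustrated with one simple option: a one-nearest-neighbor rule over sequences, where a query is assigned the fold of the training sequence with the highest global (Needleman-Wunsch) alignment score. This is a hedged sketch, not one of the classifiers used in the study: the identity-based scores (`match`, `mismatch`, `gap`), the function names `global_alignment_score` and `nearest_neighbor_fold`, and any fold labels passed in are illustrative assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with linear gap penalties (score only, no traceback)."""
    # prev[j] holds the best score for aligning the first i-1 letters of a with the first j letters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                          prev[j] + gap,      # a[i-1] against a gap
                          curr[j - 1] + gap)  # b[j-1] against a gap
        prev = curr
    return prev[-1]


def nearest_neighbor_fold(query: str, training: list[tuple[str, str]]) -> str:
    """Return the fold label of the (sequence, label) training pair that aligns best with the query."""
    _, label = max(training, key=lambda item: global_alignment_score(query, item[0]))
    return label
```

Because the sketch only compares letters, the same code runs unchanged on native sequences, secondary-structure strings, or local and global structural sequences; a realistic setup would replace the identity scores with a substitution matrix suited to each alphabet.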