Journal: Journal of Biomolecular Structure and Dynamics

Accurate prediction of protein structural classes using functional domains and predicted secondary structure sequences



Abstract

Protein structural class prediction is a challenging problem in bioinformatics. Earlier methods based directly on amino acid (AA) sequence similarity have been shown to be insufficient for low-similarity protein data sets. To improve prediction accuracy for such proteins, several recently proposed methods explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experiments on several benchmark data sets show that integrating the new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, because these features capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed method has also been tested on partially disordered proteins, where it achieves reasonable prediction accuracy; this is a more difficult problem than structural class prediction on the commonly used benchmark data sets and, to the best of our knowledge, has not been attempted before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select the discriminating features that contribute to high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark data sets.
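As a rough illustration of how the two feature families described in the abstract might be combined into one vector per protein, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the exact features, so everything here is an assumption: the three-state secondary structure alphabet (H, E, C), the choice of per-state content and longest-segment features, the binary presence encoding of InterPro signatures, and the function names `ss_features`, `fd_features`, and `build_feature_vector`.

```python
from itertools import groupby

# Assumed three-state alphabet for a predicted secondary structure string:
# H = helix, E = strand, C = coil.
SS_STATES = "HEC"


def ss_features(ss):
    """Per-state content and normalized longest-segment length (illustrative choice)."""
    n = len(ss)
    if n == 0:
        raise ValueError("secondary structure string must not be empty")
    # Longest uninterrupted run of each state.
    longest = {state: 0 for state in SS_STATES}
    for state, run in groupby(ss):
        if state in longest:
            longest[state] = max(longest[state], len(list(run)))
    features = []
    for state in SS_STATES:
        features.append(ss.count(state) / n)  # fraction of residues in this state
        features.append(longest[state] / n)   # longest segment, scaled by length
    return features


def fd_features(signatures, vocabulary):
    """Binary presence vector over a fixed, ordered list of signature accessions."""
    present = set(signatures)
    return [1.0 if accession in present else 0.0 for accession in vocabulary]


def build_feature_vector(ss, signatures, vocabulary):
    """Concatenate secondary structure features with functional domain features."""
    return ss_features(ss) + fd_features(signatures, vocabulary)


if __name__ == "__main__":
    # Placeholder InterPro-style accessions, used for illustration only.
    vocab = ["IPR000001", "IPR000002", "IPR000003"]
    vector = build_feature_vector("CHHHHEECC", ["IPR000002"], vocab)
    print(len(vector), vector)
```

In a sketch like this, the functional domain part of the vector grows with the size of the signature vocabulary, which is one reason a feature selection step, as mentioned in the abstract, would be applied before training a classifier on the concatenated vectors.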
