Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

A Segmentation-Based Method to Extract Structural and Evolutionary Features for Protein Fold Recognition


Abstract

Protein fold recognition (PFR) is considered an important step towards solving the protein structure prediction problem. Despite the efforts made so far, finding an accurate and fast computational approach to PFR remains a challenging problem in bioinformatics and computational biology. In this study, we propose a segmentation-based feature extraction technique that captures local evolutionary information embedded in the position-specific scoring matrix (PSSM) and structural information embedded in the secondary structure of proteins predicted using SPINE-X. We also employ occurrence features to extract global discriminatory information from the PSSM and the SPINE-X output. By applying a support vector machine (SVM) to the extracted features, we improve protein fold prediction accuracy by 7.4 percent over the best results reported in the literature. We also report 73.8 percent prediction accuracy for a data set consisting of proteins with less than 25 percent sequence similarity, and 80.7 percent prediction accuracy for a data set of proteins belonging to 110 folds with less than 40 percent sequence similarity. Finally, we investigate the relation between the number of folds and the number of features used, and show that the number of features should be increased to obtain better protein fold prediction results when the number of folds is relatively large.
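The abstract does not give the exact feature definitions, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "segmentation" means cutting the sequence into contiguous, near-equal parts and averaging each PSSM column within each part, and that an "occurrence" feature is the fraction of residues in each predicted secondary-structure state. The function names, the number of segments, and the three-state alphabet `HEC` are all hypothetical choices made for illustration.

```python
from collections import Counter


def segment_bounds(length, n_segments):
    """Split range(length) into n_segments contiguous, near-equal parts."""
    return [(i * length // n_segments, (i + 1) * length // n_segments)
            for i in range(n_segments)]


def segmented_pssm_features(pssm, n_segments):
    """Per-segment mean of every PSSM column (local information).

    pssm is a list of rows, one row per residue, one column per amino acid.
    Assumption: averaging is used here; the paper may aggregate differently.
    """
    n_cols = len(pssm[0])
    features = []
    for start, end in segment_bounds(len(pssm), n_segments):
        rows = pssm[start:end]
        for col in range(n_cols):
            total = sum(row[col] for row in rows)
            # An empty segment (sequence shorter than n_segments) yields 0.0.
            features.append(total / len(rows) if rows else 0.0)
    return features


def occurrence_features(secondary_structure, states="HEC"):
    """Fraction of residues in each predicted state (global information)."""
    counts = Counter(secondary_structure)
    n = len(secondary_structure)
    return [counts[s] / n if n else 0.0 for s in states]


# Toy input: 4 residues and 2 columns instead of a real L x 20 PSSM.
toy_pssm = [[1, 0], [3, 2], [5, 4], [7, 6]]
feature_vector = (segmented_pssm_features(toy_pssm, n_segments=2)
                  + occurrence_features("HHEC"))
print(feature_vector)
```

Concatenating the local and global parts gives one fixed-length vector per protein regardless of sequence length, which is the form a classifier such as an SVM expects.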
