Journal of Theoretical Biology

Remote protein homology detection using recurrence quantification analysis and amino acid physicochemical properties.


Abstract

Remote homology detection refers to the detection of structural homology in evolutionarily related proteins with low sequence similarity. Supervised learning algorithms such as support vector machines (SVMs) are currently the most accurate methods. Most SVM-based methods have concentrated on developing new kernels that make better use of pairwise alignment scores or sequence profiles, and amino acids' physicochemical properties are generally not used in the feature representation of protein sequences. In this article, we present a remote homology detection method that incorporates two novel features: (1) a protein's primary sequence is represented using amino acids' physicochemical properties, and (2) the similarity between two proteins is measured using recurrence quantification analysis (RQA). An optimization scheme was developed to select the amino acid indices (up to 10 per protein family) that best characterize a given protein family. The selected amino acid indices may allow a better biological explanation of the protein family classification problem than alignment-based methods provide. An SVM-based classifier then operates on the space described by the RQA metrics. The classification scheme is named SVM-RQA. Experiments at the superfamily level of the SCOP1.53 dataset show that, without using alignment or sequence profile information, the features generated from amino acid indices produce results comparable to those obtained by published state-of-the-art SVM kernels. In the future, better prediction accuracy can be expected from combining alignment-based features with our amino acid property-based features. Supplementary information, including the raw dataset, the best-performing amino acid indices for each protein family, and the computed RQA metrics for all protein sequences, can be downloaded from http://ym151113.ym.edu.tw/svm-rqa.
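
The two ideas named in the abstract (encoding a sequence with an amino acid index, then summarizing a recurrence plot with RQA metrics) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the embedding dimension, delay, distance threshold, the RQA metrics used, or how two proteins are paired, so all of those are assumptions here. The Kyte-Doolittle hydropathy scale stands in for whichever indices the optimization scheme would select, a cross-recurrence matrix is assumed as the way to compare two proteins, and every function name is hypothetical.

```python
from math import sqrt

# Kyte-Doolittle hydropathy scale, used only as one example amino acid index
# (assumption: the paper selects its own indices per protein family).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq, index=KYTE_DOOLITTLE):
    """Map a protein sequence to a numeric series using one amino acid index."""
    return [index[aa] for aa in seq]


def embed(series, dim=3, delay=1):
    """Time-delay embedding: point i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]


def cross_recurrence(a, b, radius):
    """matrix[i][j] is True when embedded points a[i] and b[j] lie within radius."""
    return [
        [sqrt(sum((p - q) ** 2 for p, q in zip(u, v))) <= radius for v in b]
        for u in a
    ]


def rqa_metrics(matrix, min_line=2):
    """Recurrence rate and determinism of a (cross-)recurrence matrix."""
    if not matrix or not matrix[0]:
        return {"recurrence_rate": 0.0, "determinism": 0.0}
    rows, cols = len(matrix), len(matrix[0])
    total = sum(sum(row) for row in matrix)  # number of recurrent points
    in_lines = 0  # recurrent points lying on diagonal lines >= min_line
    for offset in range(-(rows - 1), cols):  # offset = j - i of each diagonal
        i = max(0, -offset)
        j = i + offset
        run = 0
        while i < rows and j < cols:
            if matrix[i][j]:
                run += 1
            else:
                if run >= min_line:
                    in_lines += run
                run = 0
            i += 1
            j += 1
        if run >= min_line:
            in_lines += run
    return {
        "recurrence_rate": total / (rows * cols),
        "determinism": in_lines / total if total else 0.0,
    }


# Toy usage with two made-up sequences (not taken from SCOP).
x = embed(encode("MKVLAAGIVGLLLAQ"))
y = embed(encode("MKILAAGLVGALLSQ"))
print(rqa_metrics(cross_recurrence(x, y, radius=2.0)))
```

In a pipeline like the one the abstract describes, metrics of this kind, computed for each selected amino acid index, would form the feature vector handed to the SVM classifier. How the features are actually assembled is not stated in the abstract.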