Journal of Biomolecular Structure and Dynamics

Reduced alphabet motif methodology for GPCR annotation.

Abstract

Identification and classification of G-protein coupled receptors (GPCRs) from protein sequences is an important computational challenge, given that experimental screening of thousands of ligands is expensive. There are two distinct but complementary approaches to GPCR classification: machine learning and sequence motif analysis. Machine learning methodologies typically suffer from class imbalance and a lack of multi-class classification. Many sequence motif methods, meanwhile, depend too heavily on the similarity of the primary sequence alignments. A motif discovery and application methodology that is not strongly dependent on primary sequence similarity, and that also overcomes these limitations of machine learning, is therefore desirable. We propose and evaluate the effectiveness of a simple methodology that uses a reduced protein functional alphabet representation, in which functionally similar residues share the same symbol. Regular expression motifs can then be obtained by ClustalW-based multiple sequence alignment, using an identity matrix. Since evolutionary matrices such as BLOSUM and PAM are not used, this method can be useful for any set of sequences, including those that do not share a common ancestry. Reduced alphabet motifs can accurately classify known GPCR proteins, and the results are comparable to PRINTS and PROSITE. For well-known GPCR proteins from SWISSPROT, there were no false negatives and only a few false positives. This methodology covers most currently known classes of GPCRs, even those with very few representative sequences. It also predicts more than one class for certain sequences, thus overcoming the limitation of machine learning methods. We also annotated 695 orphan receptors, of which 121 were identified as belonging to Family A. A simple JavaScript-based web interface has been developed to predict GPCR families and subfamilies (www.insilico-consulting.com/gpcrmotif.html).
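
The general idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the actual reduced alphabet or any motif, so the residue grouping in `REDUCED_GROUPS`, the two patterns in `TOY_MOTIFS`, and all function names below are assumptions chosen only for illustration. The alignment step itself (ClustalW with an identity matrix) is not reproduced; `motif_from_alignment` assumes the reduced sequences have already been aligned.

```python
import re

# Hypothetical grouping of amino acids by broad physicochemical role.
# The paper's real functional alphabet is not stated in the abstract.
REDUCED_GROUPS = {
    "h": "AVLIMC",  # hydrophobic / aliphatic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "b": "KR",      # basic
    "a": "DE",      # acidic
    "s": "GP",      # small / conformationally special
}
TO_REDUCED = {aa: sym for sym, aas in REDUCED_GROUPS.items() for aa in aas}


def reduce_sequence(seq, unknown="x"):
    """Map each residue to its group symbol; unknown letters become `unknown`."""
    return "".join(TO_REDUCED.get(aa, unknown) for aa in seq.upper())


def motif_from_alignment(aligned):
    """Build a regular expression from pre-aligned reduced sequences.

    Identical columns become a literal symbol, mixed columns become a
    character class, and columns containing a gap ('-') become an
    optional wildcard.
    """
    parts = []
    for column in zip(*aligned):
        symbols = sorted(set(column))
        if "-" in symbols:
            parts.append(".?")
        elif len(symbols) == 1:
            parts.append(symbols[0])
        else:
            parts.append("[" + "".join(symbols) + "]")
    return "".join(parts)


# Hypothetical motifs, one per made-up family label.
TOY_MOTIFS = {
    "toy_family_1": re.compile(r"abrh"),
    "toy_family_2": re.compile(r"sp.{1,3}bb"),
}


def classify(seq, motifs=TOY_MOTIFS):
    """Return every family whose motif occurs in the reduced sequence."""
    reduced = reduce_sequence(seq)
    return [name for name, pattern in motifs.items() if pattern.search(reduced)]
```

In this sketch, `reduce_sequence("DRY")` and `reduce_sequence("ERY")` both give `"abr"`, which shows how a single reduced-alphabet motif can cover sequence variants that differ at the residue level. Because `classify` returns every matching family rather than a single best label, a sequence such as `"GSLKKERFL"` is assigned to both toy families, mirroring the multi-class behaviour the abstract describes.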