
Predicting protein-binding RNA nucleotides with consideration of binding partners


Abstract

In recent years, several computational methods have been developed to predict RNA-binding sites in proteins. Most of these methods do not consider the interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. In contrast, the problem of predicting protein-binding sites in RNA has received little attention, mainly because it is much more difficult and yields lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. To improve the prediction accuracy and usefulness of that method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and a Matthews correlation coefficient (MCC) of 0.69 in 10-fold cross-validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, an NPV of 84.2% and an MCC of 0.48 in independent testing. For comparison, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In 10-fold cross-validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, an NPV of 92.2% and an MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, an NPV of 85.2% and an MCC of 0.45.
In both cross-validation and independent testing, the new model that used both RNA and protein sequences outperformed the model that used RNA sequence data alone on most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA that considers the binding partner of the RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure. (C) 2015 Elsevier Ireland Ltd. All rights reserved.
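
The five performance measures reported in the abstract are all derived from the confusion matrix of a binary classifier (binding versus non-binding nucleotides). The following is a minimal sketch of how they are conventionally computed, not the authors' evaluation code. The function name `binary_metrics`, the `BinaryMetrics` container, and the counts in the usage example are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math
from typing import NamedTuple


class BinaryMetrics(NamedTuple):
    """Standard measures for a binary classifier (hypothetical container)."""
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    mcc: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute sensitivity, specificity, PPV, NPV and MCC from counts.

    tp: binding nucleotides predicted as binding
    fp: non-binding nucleotides predicted as binding
    tn: non-binding nucleotides predicted as non-binding
    fn: binding nucleotides predicted as non-binding
    """
    def ratio(num: float, den: float) -> float:
        # Assumption: return 0.0 when a denominator is zero.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return BinaryMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
        mcc=ratio(tp * tn - fp * fn, mcc_den),
    )


if __name__ == "__main__":
    # Illustrative counts only; not taken from the paper.
    m = binary_metrics(tp=80, fp=30, tn=170, fn=20)
    print(f"sensitivity={m.sensitivity:.3f} specificity={m.specificity:.3f}")
    print(f"PPV={m.ppv:.3f} NPV={m.npv:.3f} MCC={m.mcc:.3f}")
```

Because binding nucleotides are typically the minority class, PPV and MCC tend to be more informative than sensitivity and specificity alone, which is consistent with the abstract reporting all five measures.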