Network-based auto-probit modeling for protein function prediction.

Abstract

Predicting the functional roles of proteins based on various genome-wide data, such as protein-protein association networks, has become a canonical problem in computational biology. Approaching this task as a binary classification problem, we develop a network-based extension of the spatial auto-probit model. In particular, we develop a hierarchical Bayesian probit-based framework for modeling binary network-indexed processes, with a latent multivariate conditional autoregressive Gaussian process. The latter allows protein-protein association network topologies, either binary or weighted, to be easily incorporated when modeling protein functional similarity. We use this framework to predict protein functions, for functions defined as terms in the Gene Ontology (GO) database, a popular, rigorous vocabulary for biological functionality. Furthermore, we show how a natural extension of this framework can be used to model and correct for the high percentage of false negative labels in training data derived from GO, a serious shortcoming endemic to biological databases of this type. The performance of our method is evaluated and compared with standard algorithms on weighted yeast protein-protein association networks, extracted from a recently developed integrative database called Search Tool for the Retrieval of INteracting Genes/proteins (STRING). Results show that our basic method is competitive with these other methods, and that the extended method, which incorporates the uncertainty in negative labels among the training data, can yield nontrivial improvements in predictive accuracy.
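The abstract describes the model only at a high level. The sketch below illustrates, under stated assumptions, how a weighted network can enter a conditional autoregressive (CAR) prior and how that prior becomes a probit probability for a single protein. It assumes a proper CAR prior with precision tau * (D - rho * W), where W is the weighted adjacency matrix and D is the diagonal matrix of its row sums, plus a single intercept mu and standard normal probit noise. The names `car_conditional`, `prob_positive`, and `prob_hidden_positive` are hypothetical, and the one-sided false-negative noise model in `prob_hidden_positive` is an illustrative assumption, not the authors' specification. The network and parameter values are arbitrary.

```python
import math
from typing import Dict, Tuple

# Hypothetical weighted network: node -> {neighbour: edge weight}.
# Weights are assumed symmetric and non-negative, with no self-loops.
Network = Dict[int, Dict[int, float]]


def normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def car_conditional(i: int, phi: Dict[int, float], net: Network,
                    rho: float, tau: float) -> Tuple[float, float]:
    """Mean and variance of latent phi_i given all other phi_j, assuming a
    proper CAR prior whose precision matrix is tau * (D - rho * W)."""
    neighbours = net[i]
    d_i = sum(neighbours.values())  # weighted degree (row sum of W)
    if d_i <= 0:
        raise ValueError(f"node {i} has no weighted neighbours")
    weighted_sum = sum(w * phi[j] for j, w in neighbours.items())
    mean = rho * weighted_sum / d_i  # -Q_ij / Q_ii weighting of neighbours
    var = 1.0 / (tau * d_i)          # 1 / Q_ii
    return mean, var


def prob_positive(i: int, phi: Dict[int, float], net: Network,
                  mu: float, rho: float, tau: float) -> float:
    """P(y_i = 1 | phi_{-i}) when y_i = 1{mu + phi_i + eps_i > 0} and
    eps_i ~ N(0, 1); integrating out phi_i gives a probit whose variance
    is inflated from 1 to 1 + var."""
    mean, var = car_conditional(i, phi, net, rho, tau)
    return normal_cdf((mu + mean) / math.sqrt(1.0 + var))


def prob_hidden_positive(p_pos: float, miss_rate: float) -> float:
    """P(true label = 1 | observed label = 0) under a hypothetical one-sided
    noise model: a true positive is recorded as negative with probability
    miss_rate, and a true negative is always recorded as negative.
    Assumes 0 <= p_pos < 1."""
    hidden = p_pos * miss_rate
    return hidden / (hidden + (1.0 - p_pos))


if __name__ == "__main__":
    net: Network = {0: {1: 0.9, 2: 0.4}, 1: {0: 0.9}, 2: {0: 0.4}}
    phi = {0: 0.0, 1: 1.2, 2: -0.5}  # current latent values, e.g. one sampler state
    p = prob_positive(0, phi, net, mu=-0.5, rho=0.9, tau=2.0)
    print(f"P(y_0 = 1 | neighbours) = {p:.3f}")
    print(f"P(observed negative is a hidden positive) = "
          f"{prob_hidden_positive(p, miss_rate=0.3):.3f}")
```

In a full hierarchical Bayesian fit, quantities like these would typically be evaluated inside an MCMC sampler (for example, Gibbs updates of each phi_i and of the latent probit variables) rather than once with fixed parameters. Strongly weighted neighbours with large positive phi_j pull protein i's probability of carrying the function upward, which is the mechanism by which network topology informs the prediction.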