Network-based information integration for protein function prediction.

Abstract

Protein function prediction is a fundamental problem in computational biology. For protein activities described by terms in databases such as the Gene Ontology (GO), this task is typically treated as a binary classification problem. With the rapid growth of available genome-wide protein information, integrating different protein datasets has become both a significant opportunity and a major focus for inferring function. This dissertation presents three novel approaches that integrate widely used protein information to classify proteins into functional categories. The first is a probabilistic method, Hierarchical Binomial-Neighborhood (HBN), which combines proteins' relational information from the protein-protein interaction (PPI) network with the GO hierarchical structure. Comparisons with analogous models, on terms from the biological process ontology and genes from the yeast genome, show substantial improvement, and further analysis shows that the improvement is uniformly consistent with respect to GO depth. Because gene interaction knowledge is still incomplete in most organisms, the second approach is an aggressively integrative probabilistic framework, Probabilistic Hierarchical Inferences for Protein Activity (PHIPA), which uses data more efficiently by combining the protein relational network, categorical motif and cellular localization information, and the GO hierarchy. We implement it on a network extracted from STRING (Search Tool for the Retrieval of Interacting Genes/Proteins), an integrative protein-protein association database. Because they rest on the nearest-neighbor, or "guilt-by-association", counting principle, both HBN and PHIPA use only local neighborhood information and are therefore built on local probabilistic models. In contrast, the third approach is a fully Bayesian, network-based auto-probit framework that encodes the functional similarity induced by the network topology. We show not only that the auto-probit model predicts as well as the "local" methods, but also that it can produce more potentially interesting protein predictions by taking advantage of GO annotation uncertainty, which is critical for using and improving the GO database yet has been ignored by most existing methodologies in this context.
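The "guilt-by-association" counting principle mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's HBN or PHIPA model: it only counts, for one GO term, how many of a protein's annotated network neighbors carry that term and thresholds the resulting fraction. The protein names, the toy network, the function names `neighbor_vote` and `predict`, and the 0.5 threshold are all hypothetical, chosen for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy PPI network (undirected, stored as adjacency sets).
network: Dict[str, Set[str]] = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}

# Hypothetical known annotations for a single GO term:
# True = annotated with the term, False = known not to have it.
# Proteins missing from this dict are unannotated.
labels: Dict[str, bool] = {"P2": True, "P3": True, "P4": False}


def neighbor_vote(
    network: Dict[str, Set[str]], labels: Dict[str, bool], protein: str
) -> Tuple[int, int]:
    """Return (k, n): k neighbors carrying the term among n annotated neighbors."""
    neighbors = network.get(protein, set())
    known = [labels[p] for p in neighbors if p in labels]
    return sum(known), len(known)


def predict(
    network: Dict[str, Set[str]],
    labels: Dict[str, bool],
    protein: str,
    threshold: float = 0.5,  # illustrative cutoff, not taken from the source
) -> Optional[bool]:
    """Predict the term if the annotated-neighbor fraction reaches the threshold.

    Returns None when the protein has no annotated neighbors, because a
    purely local method has no evidence to use in that case.
    """
    k, n = neighbor_vote(network, labels, protein)
    if n == 0:
        return None
    return k / n >= threshold


if __name__ == "__main__":
    for name in ("P1", "P5", "P6"):
        print(name, neighbor_vote(network, labels, name), predict(network, labels, name))
```

The `None` case shows the limitation the abstract attributes to local methods: a protein without annotated neighbors gets no prediction, which is one motivation for models that use the wider network topology.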

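Bibliographic record

  • Author

    Jiang, Xiaoyu

  • Author affiliation

    Boston University

  • Degree-granting institution Boston University
  • Subject Statistics; Biology Bioinformatics
  • Degree Ph.D.
  • Year 2009
  • Pages 108 p.
  • Total pages 108
  • Original format PDF
  • Text language English
  • Chinese Library Classification Statistics
  • Keywords

  • Date indexed 2022-08-17 11:38:24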
