Pacific Symposium on Biocomputing

GOTREES: PREDICTING GO ASSOCIATIONS FROM PROTEIN DOMAIN COMPOSITION USING DECISION TREES


Abstract

The Gene Ontology (GO) offers a comprehensive and standardized way to describe a protein's biological role. Proteins are annotated with GO terms based on direct or indirect experimental evidence. Term assignments are also inferred from homology and literature mining. Regardless of the type of evidence used, GO assignments are either manually curated or electronic. Unfortunately, manual curation cannot keep pace with the data available from publications and various large experimental datasets. Automated literature-based annotation methods have been developed to speed up annotation. However, they apply only to proteins that have been experimentally investigated or that have close homologs with sufficient and consistent annotation. One homology-based electronic method for GO annotation is provided by the InterPro database. The InterPro2GO/PFAM2GO mappings associate individual protein domains with GO terms and can thus be used to annotate less-studied proteins. However, protein classification via a single functional domain demands stringency to avoid a large number of false positives. This work broadens the basic approach. We model proteins via their entire functional domain content and train an individual decision tree classifier for each GO term using known protein assignments. We demonstrate that our approach is sensitive, specific and precise, as well as fairly robust to sparse data. Our method is more sensitive than InterPro2GO and suffers only some decrease in precision. Compared with InterPro2GO, we improved sensitivity by 22%, 27% and 50% for Molecular Function, Biological Process and Cellular Component GO terms, respectively.
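The core idea of the abstract, one decision tree per GO term trained on each protein's full domain composition, can be illustrated with a minimal sketch. This is not the authors' implementation: the ID3-style entropy splitting, the function names (`build_tree`, `train_per_term`, `annotate`) and the toy domain and GO identifiers are all assumptions chosen for illustration, and the data are invented.

```python
from collections import Counter
from math import log2

# Invented toy data: protein -> (domain IDs, GO terms). Placeholder IDs only.
PROTEINS = {
    "p1": ({"D_kinase", "D_sh2"}, {"GO:A"}),
    "p2": ({"D_kinase"}, {"GO:A"}),
    "p3": ({"D_kinase", "D_inhibitor"}, set()),
    "p4": ({"D_sh2"}, set()),
    "p5": ({"D_zinc"}, {"GO:B"}),
    "p6": ({"D_zinc", "D_sh2"}, {"GO:B"}),
}

def entropy(labels):
    """Shannon entropy (bits) of a list of boolean labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, features):
    """ID3-style tree over binary domain-presence features (an assumed
    splitting rule). rows is a list of (domain_set, bool_label).
    Returns a bool leaf or (domain, subtree_if_present, subtree_if_absent)."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def split_entropy(f):
        present = [y for d, y in rows if f in d]
        absent = [y for d, y in rows if f not in d]
        if not present or not absent:
            return float("inf")  # this domain does not separate the rows
        return (len(present) * entropy(present)
                + len(absent) * entropy(absent)) / len(rows)

    best = min(sorted(features), key=split_entropy)
    if split_entropy(best) == float("inf"):
        return majority
    rest = features - {best}
    present = [(d, y) for d, y in rows if best in d]
    absent = [(d, y) for d, y in rows if best not in d]
    return (best, build_tree(present, rest), build_tree(absent, rest))

def predict(tree, domains):
    """Walk the tree using the protein's domain set."""
    while isinstance(tree, tuple):
        feature, if_present, if_absent = tree
        tree = if_present if feature in domains else if_absent
    return tree

def train_per_term(proteins):
    """Train one binary tree per GO term (term present vs. absent)."""
    all_domains = set().union(*(d for d, _ in proteins.values()))
    all_terms = set().union(*(t for _, t in proteins.values()))
    return {
        term: build_tree([(d, term in t) for d, t in proteins.values()],
                         all_domains)
        for term in all_terms
    }

def annotate(trees, domains):
    """Return every GO term whose tree accepts this domain composition."""
    return {term for term, tree in trees.items() if predict(tree, domains)}

TREES = train_per_term(PROTEINS)
print(TREES["GO:A"])  # ('D_kinase', ('D_inhibitor', False, True), False)
print(annotate(TREES, {"D_kinase", "D_inhibitor"}))  # set()
```

In this toy data, a rule that maps the single domain `D_kinase` to `GO:A` would wrongly annotate `p3`. The tree for `GO:A` also tests for `D_inhibitor`, which shows how the full domain composition can reduce false positives without demanding a more stringent single-domain rule.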
