Pacific Symposium on Biocomputing (PSB); January 4-8, 2005; Hawaii, HI (US)

GOTREES: PREDICTING GO ASSOCIATIONS FROM PROTEIN DOMAIN COMPOSITION USING DECISION TREES

Abstract

The Gene Ontology (GO) offers a comprehensive and standardized way to describe a protein's biological role. Proteins are annotated with GO terms based on direct or indirect experimental evidence. Term assignments are also inferred from homology and literature mining. Regardless of the type of evidence used, GO assignments are either manually curated or electronic. Unfortunately, manual curation cannot keep pace with the data available from publications and various large experimental datasets. Automated literature-based annotation methods have been developed to speed up annotation. However, they apply only to proteins that have been experimentally investigated or that have close homologs with sufficient and consistent annotation. One of the homology-based electronic methods for GO annotation is provided by the InterPro database. The InterPro2GO/PFAM2GO mappings associate individual protein domains with GO terms and thus can be used to annotate less-studied proteins. However, protein classification via a single functional domain demands stringency to avoid a large number of false positives. This work broadens the basic approach. We model proteins via their entire functional domain content and train an individual decision tree classifier for each GO term using known protein assignments. We demonstrate that our approach is sensitive, specific and precise, as well as fairly robust to sparse data. We have found that our method is more sensitive than InterPro2GO and suffers only some decrease in precision. Compared with InterPro2GO, we have improved the sensitivity by 22%, 27% and 50% for Molecular Function, Biological Process and Cellular Component GO terms, respectively.
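
To make the idea of "one decision tree per GO term over the full domain content" concrete, here is a minimal sketch in pure Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which tree-induction algorithm, split criterion or pruning was used, so Gini impurity is assumed here, and the protein names, domain IDs (`DOM_A`, ...) and term IDs (`GO:X`, `GO:Y`) are hypothetical toy data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def build_tree(rows, labels, domains):
    """Grow a binary tree over domain presence/absence (assumed Gini splits).

    rows: list of domain-ID sets, one per protein; labels: parallel 0/1 list.
    A leaf is a bool; an internal node is (domain, if_present, if_absent).
    """
    majority = sum(labels) * 2 > len(labels)
    if len(set(labels)) <= 1:
        return majority
    best_gain, best_domain = 1e-12, None
    for d in sorted(domains):
        present = [y for r, y in zip(rows, labels) if d in r]
        absent = [y for r, y in zip(rows, labels) if d not in r]
        if not present or not absent:
            continue  # this domain does not separate any proteins
        weighted = (len(present) * gini(present)
                    + len(absent) * gini(absent)) / len(labels)
        gain = gini(labels) - weighted
        if gain > best_gain:
            best_gain, best_domain = gain, d
    if best_domain is None:
        return majority  # no informative split left

    rest = domains - {best_domain}

    def side(has_domain):
        keep = [(r, y) for r, y in zip(rows, labels)
                if (best_domain in r) == has_domain]
        return build_tree([r for r, _ in keep], [y for _, y in keep], rest)

    return (best_domain, side(True), side(False))


def train_per_term(proteins, annotations):
    """Train one tree per GO term from known protein assignments."""
    all_domains = set().union(*proteins.values())
    all_terms = set().union(*annotations.values())
    names = sorted(proteins)
    rows = [proteins[p] for p in names]
    return {
        term: build_tree(rows, [int(term in annotations[p]) for p in names],
                         all_domains)
        for term in sorted(all_terms)
    }


def predict(tree, domain_set):
    """Walk the tree using the query protein's domain set."""
    while isinstance(tree, tuple):
        domain, if_present, if_absent = tree
        tree = if_present if domain in domain_set else if_absent
    return tree


# Hypothetical toy data: GO:X requires DOM_A and DOM_B together; GO:Y follows DOM_C.
proteins = {
    "p1": {"DOM_A", "DOM_B"},
    "p2": {"DOM_A", "DOM_B", "DOM_C"},
    "p3": {"DOM_A"},
    "p4": {"DOM_B"},
    "p5": {"DOM_C"},
    "p6": {"DOM_A", "DOM_C"},
}
annotations = {
    "p1": {"GO:X"},
    "p2": {"GO:X", "GO:Y"},
    "p3": set(),
    "p4": set(),
    "p5": {"GO:Y"},
    "p6": {"GO:Y"},
}

trees = train_per_term(proteins, annotations)
```

On this toy data, the tree for `GO:X` tests `DOM_B` and then `DOM_A`, so a protein carrying only one of the two domains is not annotated. A single-domain mapping in the style of InterPro2GO would have to either attach `GO:X` to `DOM_A` (a false positive on `p3` and `p6`) or omit the association entirely, which is the trade-off the abstract describes.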