Source: Scientific Reports (U.S. National Institutes of Health literature collection)

Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework


Abstract

Protein functional similarity based on gene ontology (GO) annotations serves as a powerful tool when comparing proteins on a functional level in applications such as protein-protein interaction prediction, gene prioritization, and disease gene discovery. Functional similarity (FS) is usually quantified by combining the GO hierarchy with an annotation corpus that links genes and gene products to GO terms. One large group of algorithms involves calculation of GO term semantic similarity (SS) between all the terms annotating the two proteins, followed by a second step, described as a “mixing strategy”, which involves combining the SS values to yield the final FS value. Due to the variability of protein annotation caused, for example, by annotation bias, this value cannot be reliably compared on an absolute scale. We therefore introduce a similarity z-score that takes into account the FS background distribution of each protein. For a selection of popular SS measures and mixing strategies, we demonstrate a moderate accuracy improvement when using z-scores in a benchmark that aims to separate orthologous cases from random gene pairs, and we discuss in this context the impact of annotation corpus choice. The approach has been implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
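
The two-step scheme described in the abstract (term-level SS, then a mixing strategy, then a z-score against each protein's background) can be illustrated with a minimal Python sketch. This is not the Frela implementation, and the abstract does not give the exact formulas. The toy SS values, the protein annotations, the three mixing functions, and the choice to average the two per-protein z-scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical toy SS values between GO terms (symmetric). Not real data;
# an actual SS measure would be derived from the GO hierarchy and a corpus.
TOY_SS = {
    frozenset(("t1", "t2")): 0.5,
    frozenset(("t2", "t3")): 0.5,
    frozenset(("t1", "t3")): 0.25,
}


def term_ss(a, b):
    """Semantic similarity of two GO terms (toy lookup, 1.0 for identity)."""
    if a == b:
        return 1.0
    return TOY_SS.get(frozenset((a, b)), 0.0)


def mix_max(terms_a, terms_b):
    """Mixing strategy: maximum SS over all term pairs."""
    return max(term_ss(a, b) for a in terms_a for b in terms_b)


def mix_avg(terms_a, terms_b):
    """Mixing strategy: average SS over all term pairs."""
    values = [term_ss(a, b) for a in terms_a for b in terms_b]
    return sum(values) / len(values)


def mix_bma(terms_a, terms_b):
    """Mixing strategy: best-match average (mean of row and column maxima)."""
    row_best = [max(term_ss(a, b) for b in terms_b) for a in terms_a]
    col_best = [max(term_ss(a, b) for a in terms_a) for b in terms_b]
    return (mean(row_best) + mean(col_best)) / 2


def z_score(fs_value, background):
    """Standardize an FS value against a background FS distribution."""
    sigma = pstdev(background)
    if sigma == 0:
        return 0.0  # assumption: a flat background carries no signal
    return (fs_value - mean(background)) / sigma


def fs_z(query, target, annotations, mix=mix_bma):
    """Average of the two per-protein z-scores (an assumed combination)."""
    fs = mix(annotations[query], annotations[target])
    scores = []
    for protein in (query, target):
        # Background: FS of this protein against every other protein.
        background = [
            mix(annotations[protein], annotations[other])
            for other in annotations
            if other != protein
        ]
        scores.append(z_score(fs, background))
    return mean(scores)


# Hypothetical proteins and their GO term annotations.
ANNOTATIONS = {
    "A": ["t1", "t2"],
    "B": ["t2", "t3"],
    "C": ["t4"],
    "D": ["t1"],
}
```

Standardizing against each protein's own background is what lets a raw FS value from a heavily annotated protein be compared with one from a sparsely annotated protein; in practice the background would be drawn from a full annotation corpus, not from four toy entries.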
