
Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements



Abstract

PSI-BLAST is an iterative program to search a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been annotated by human experts, and a popular measure of retrieval accuracy (ROC) that can be normalized to take on values between 0 (worst) and 1 (best). The modifications we consider novel improve the ROC score from 0.758 ± 0.005 to 0.895 ± 0.003. This does not include the benefits from four modifications we included in the ‘baseline’ version, even though they were not implemented in PSI-BLAST version 2.0. The improvement in accuracy was confirmed on a small second test set. This test involved analyzing three protein families with curated lists of true positives from the non-redundant protein database. The modification that accounts for the majority of the improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence’s amino acid composition. The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST.
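
The abstract does not spell out how its normalized ROC score is computed. The sketch below assumes the common truncated form, often written ROC_n: for each of the first n false positives in a ranked hit list, count the true positives ranked above it, sum those counts, and divide by n times the total number of true positives. The function name `roc_n`, its parameters, and the handling of lists with fewer than n false positives are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def roc_n(ranked_labels: Sequence[bool], total_true: int, n: int) -> float:
    """Truncated ROC score for a ranked hit list (best hit first).

    ranked_labels[i] is True for a true positive, False for a false positive.
    total_true is the number of true positives that exist for the query set.
    n is the number of false positives at which the curve is truncated.
    """
    if total_true <= 0 or n <= 0:
        raise ValueError("total_true and n must be positive")

    true_seen = 0   # true positives ranked above the current position
    false_seen = 0  # false positives encountered so far
    area = 0        # sum of true_seen at each of the first n false positives

    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumption: if the list holds fewer than n false positives, the missing
    # ones are treated as ranked below every retrieved hit.
    area += (n - false_seen) * true_seen

    # Normalize so that 0 is the worst score and 1 is the best.
    return area / (n * total_true)


if __name__ == "__main__":
    hits = [True, True, False, True, False]
    print(roc_n(hits, total_true=3, n=2))
```

A perfect ranking, with every true positive ahead of every false positive, scores 1 under this definition, and a ranking that opens with n false positives scores 0, which matches the 0 (worst) to 1 (best) range described in the abstract.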
