BMC Genomics

HIPPI: highly accurate protein family classification with ensembles of HMMs


Abstract

Background: Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics.

Results: We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy.

Conclusion: HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile hidden Markov models can better represent multiple sequence alignments than a single profile hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp.
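The abstract does not say how the scores from the individual HMMs in an ensemble are combined into a family assignment. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each family's ensemble has already produced one bit score per HMM for the query (for example from HMMER, run elsewhere), and that the query is assigned to the family containing the single best-scoring HMM. The function name, the `min_score` threshold, and the family labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def classify_by_ensemble(
    scores: Dict[str, List[float]],
    min_score: float = 0.0,
) -> Optional[Tuple[str, float]]:
    """Pick the family whose ensemble contains the top-scoring HMM.

    `scores` maps a family label to the bit scores of the query against
    each HMM in that family's ensemble (assumed to be precomputed).
    Returns (family, best_score), or None if no score reaches `min_score`.
    """
    best: Optional[Tuple[str, float]] = None
    for family, hmm_scores in scores.items():
        if not hmm_scores:
            continue  # an ensemble with no reported hits cannot win
        top = max(hmm_scores)  # assumption: a family is scored by its best HMM
        if top >= min_score and (best is None or top > best[1]):
            best = (family, top)
    return best


# Hypothetical scores for one query against three family ensembles.
example_scores = {
    "family_A": [12.5, 48.1, 30.0],
    "family_B": [51.3, 9.7],
    "family_C": [],
}
result = classify_by_ensemble(example_scores)
print(result)
```

Taking the maximum over the ensemble is only one possible design choice. Other aggregations, such as averaging or weighting the per-HMM scores, would fit the same interface.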
