Journal of Bioinformatics and Computational Biology

GENETACK: FRAMESHIFT IDENTIFICATION IN PROTEIN-CODING SEQUENCES BY THE VITERBI ALGORITHM



Abstract

We describe a new program for ab initio frameshift detection in protein-coding nucleotide sequences. The task is to distinguish same-strand overlapping ORFs that occur in a sequence due to the presence of a frameshifted gene from same-strand overlapping ORFs that encompass true overlapping or adjacent genes. The GeneTack program uses a hidden Markov model (HMM) of a genomic sequence with possibly frameshifted protein-coding regions. The Viterbi algorithm finds the maximum likelihood path, which discriminates between true adjacent genes and adjacent protein-coding regions that only appear to be separate entities because of frameshifts. Therefore, the program can identify spurious predictions made by a conventional gene-finding program misled by a frameshift. We tested GeneTack, as well as two earlier programs, FrameD and FSFind, on 17 prokaryotic genomes with frameshifts introduced randomly into known genes. We observed that the average frameshift prediction accuracy of GeneTack, in terms of (Sn + Sp)/2 values, was higher by a significant margin than the accuracy of the other two programs. In addition, the average accuracy of GeneTack compares favorably with that of the FSFind-BLAST program, which uses a protein database search to verify predicted frameshifts, even though GeneTack does not use external evidence. GeneTack is freely available at http://topaz.gatech.edu/GeneTack/.
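To make the idea of Viterbi decoding over reading frames concrete, here is a minimal toy sketch. It is not GeneTack's actual model: the three hidden states (codon phases 0, 1, 2), the emission probabilities, the preferred nucleotide per phase, and the shift probability `p_shift` are all hypothetical values chosen only for illustration. The hidden state normally advances by one phase per nucleotide; a repeated or skipped phase on the most likely path is read as a candidate frameshift.

```python
import math

PHASES = (0, 1, 2)
# Hypothetical toy emissions: each codon phase strongly prefers one nucleotide.
PREFERRED = {0: "A", 1: "C", 2: "G"}


def log_emit(phase, nt):
    """Log-probability of nucleotide nt at a codon phase (toy values)."""
    return math.log(0.85 if PREFERRED[phase] == nt else 0.05)


def log_trans(prev, cur, p_shift=0.01):
    """Normal step advances the phase by 1 (mod 3); anything else is a shift."""
    if (cur - prev) % 3 == 1:
        return math.log(1.0 - 2.0 * p_shift)
    return math.log(p_shift)


def viterbi(seq):
    """Return the most likely phase path for seq under the toy model."""
    if not seq:
        return []
    # Uniform start distribution over the three phases.
    score = {s: math.log(1.0 / 3.0) + log_emit(s, seq[0]) for s in PHASES}
    back = []
    for nt in seq[1:]:
        new_score, ptr = {}, {}
        for cur in PHASES:
            best_prev = max(PHASES, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = (score[best_prev] + log_trans(best_prev, cur)
                              + log_emit(cur, nt))
            ptr[cur] = best_prev
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(PHASES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def find_frameshifts(path):
    """Indices where the phase does not advance by exactly one (mod 3)."""
    return [i for i in range(1, len(path))
            if (path[i] - path[i - 1]) % 3 != 1]


if __name__ == "__main__":
    for seq in ("ACGACGACG", "ACGACCGACG", "ACGAGACG"):
        path = viterbi(seq)
        print(seq, path, find_frameshifts(path))
```

In this sketch an inserted nucleotide shows up as a repeated phase and a deleted one as a skipped phase. A realistic model of the kind the abstract describes would also need states for non-coding sequence and gene boundaries, and parameters estimated from genomic data rather than fixed by hand.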
