Journal of Molecular Biology

Statistical alignment: computational properties, homology testing and goodness-of-fit.

Abstract

The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model.

Firstly, we show how to accelerate the statistical alignment algorithms by several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity-based alignment, to obtain good initial guesses of the evolutionary parameters, and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino, and Felsenstein can be simplified. Two proteins, each about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes the method practical for actual data analysis.

Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins.

Finally, we describe a goodness-of-fit test that allows testing the proposed insertion-deletion (indel) process inherent to this model, and find that real sequences (here globins) probably experience indels longer than one residue, contrary to what is assumed by the model.

Copyright 2000 Academic Press.