Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Profile Hidden Markov Models Are Not Identifiable

Abstract

Profile Hidden Markov Models (HMMs) are graphical models that can be used to produce finite-length sequences from a distribution. Although they were introduced to bioinformatics only 25 years ago (by Haussler et al., Hawaii International Conference on System Sciences, 1993), they are arguably the most commonly used statistical model in the field, with applications including protein structure and function prediction, classification of novel proteins into existing protein families and superfamilies, metagenomics, and multiple sequence alignment. The standard use of profile HMMs in bioinformatics has two steps: first, a profile HMM is built for a collection of molecular sequences (which may not be in a multiple sequence alignment); then the profile HMM is used in some subsequent analysis of new molecular sequences. The construction of the profile is thus itself a statistical estimation problem, since any given set of sequences might fit more than one model well. Hence, a basic question about profile HMMs is whether they are statistically identifiable, which means that no two distinct profile HMMs produce the same distribution on finite-length sequences. Statistical identifiability is a fundamental property of any statistical model, and yet it is not known whether profile HMMs are statistically identifiable. In this paper, we report preliminary results towards characterizing the statistical identifiability of profile HMMs in one of the standard forms used in bioinformatics.
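
The notion of identifiability used in the abstract can be written compactly. The following is one common formalization, offered as a sketch and not taken from the paper: the symbols $\Sigma$ (the sequence alphabet), $\Theta$ (the set of admissible parameter settings, i.e. transition and emission probabilities, possibly ranging over several model sizes) and $P_\theta$ (the distribution a model induces on finite-length sequences) are notation assumed here for illustration.

```latex
% Assumed notation (not from the paper):
%   \Sigma      alphabet of sequence symbols
%   \Sigma^{*}  set of all finite-length sequences over \Sigma
%   \Theta      set of admissible profile HMM parameter settings
%   P_\theta    distribution on \Sigma^{*} induced by the model \theta
\[
  \text{The family } \{P_\theta\}_{\theta \in \Theta}
  \text{ is identifiable}
  \iff
  \Bigl(
    \forall\, \theta, \theta' \in \Theta :\;
    P_\theta(s) = P_{\theta'}(s) \ \text{for all } s \in \Sigma^{*}
    \;\Longrightarrow\;
    \theta = \theta'
  \Bigr).
\]
% Equivalently, the map \theta \mapsto P_\theta is injective.
% Non-identifiability is shown by a single pair \theta \neq \theta'
% with P_\theta = P_{\theta'}.
```

Under this reading, a single pair of distinct models that assign the same probability to every finite-length sequence is enough to show that the family is not identifiable.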