...
首页> 外文期刊>IEEE/ACM transactions on computational biology and bioinformatics >Comparing Genomes with Duplications: A Computational Complexity Point of View
【24h】

Comparing Genomes with Duplications: A Computational Complexity Point of View

机译:比较基因组与重复:计算复杂性的观点

获取原文
获取原文并翻译 | 示例
           

摘要

In this paper, we are interested in the computational complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes or genomic markers, a problem that happens frequently when comparing whole nuclear genomes. Recently, several methods ( [1], [2]) have been proposed that are based on two steps to compute a given (dis)similarity measure M between two genomes G_1 and G_2: first, one establishes a oneto- one correspondence between genes of G_1 and genes of G_2 ; second, once this correspondence is established, it defines explicitly a permutation and it is then possible to quantify their similarity using classical measures defined for permutations, like the number of breakpoints. Hence these methods rely on two elements: a way to establish a one-to-one correspondence between genes of a pair of genomes, and a (dis)similarity measure for permutations. The problem is then, given a (dis)similarity measure for permutations, to compute a correspondence that defines an optimal permutation for this measure. We are interested here in two models to compute a one-to-one correspondence: the exemplar model, where all but one copy are deleted in both genomes for each gene family, and the matching model, that computes a maximal correspondence for each gene family. We show that for these two models, and for three (dis)similarity measures on permutations, namely the number of common intervals, the maximum adjacency disruption (MAD) number and the summed adjacency disruption (SAD) number, the problem of computing an optimal correspondence is NP-complete, and even APXhard for the MAD number and SAD number.
机译:在本文中,我们对两个基因组包含重复的基因或基因组标记时计算(非)相似性度量的计算复杂性感兴趣,这是在比较整个核基因组时经常发生的问题。最近,已经提出了几种方法[[1],[2]),该方法基于两个步骤来计算两个基因组G_1和G_2之间的给定(不相似)度量M:首先,一个方法在基因之间建立一对一的对应关系。 G_1和G_2的基因;其次,一旦建立了这种对应关系,就可以明确定义一个排列,然后可以使用为排列定义的经典度量(如断点数)来量化它们的相似性。因此,这些方法依赖于两个要素:一种在一对基因组的基因之间建立一对一对应关系的方法,以及一种用于排列的(不相似)度量。然后,给定问题的(非)相似性度量问题,以计算为该度量定义最佳置换的对应关系。我们在这里感兴趣的是两个模型来计算一对一的对应关系:示例模型,其中每个基因家族的两个基因组中都删除了一个拷贝,但所有拷贝都被删除;以及匹配模型,该模型计算了每个基因家族的最大对应关系。我们表明,对于这两个模型,以及对于排列的三个(不相似)度量,即公共区间数,最大邻接破坏(MAD)数和总邻接破坏(SAD)数,计算最优值的问题对应关系是NP完整的,甚至是MAX编号和SAD编号的APXhard。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号