BMC Bioinformatics

Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming

Abstract

Background: A chromosome structure is a highly simplified model of a genome. It records only information about the chromosomes: whether each one is linear or circular, the order of the genes on it, and the DNA strand encoding each gene. Gene lengths, nucleotide composition, and intergenic regions are ignored. Although highly incomplete, such a structure is useful in many settings, e.g., to reconstruct phylogenies and evolutionary events, or to identify gene synteny, regulatory elements, and promoters (by considering highly conserved elements).

Three problems are considered; all of them assume unequal gene content and the presence of gene paralogs.

The distance problem is to determine the minimum number of operations required to transform one chromosome structure into another, together with the transformation itself, including the identification of paralogs in the two structures. We use the DCJ (double cut and join) model, one of the most studied combinatorial rearrangement models. The model includes double, sesqui-, and single operations, as well as deletion and insertion of a chromosome region; the single operations are cut and join.

In the reconstruction problem, a phylogenetic tree with chromosome structures at its leaves is given. The task is to assign structures to the inner nodes of the tree so as to minimize the sum of the distances between the terminal structures of each edge, and to identify mutual paralogs across a fairly large set of structures.

A linear-time algorithm is known for the distance problem without paralogs, whereas the presence of paralogs makes the problem NP-hard. If paralogs are allowed but insertions and deletions are not (and special constraints are imposed), a reduction of the distance problem to integer linear programming is known. The reconstruction problem appears to be NP-hard even in the absence of paralogs.

The contig problem is to find the optimal arrangement of each given set of contigs, which again includes the mutual identification of paralogs.
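To make the easy special case concrete (equal gene content, no paralogs, no insertions or deletions), here is a minimal Python sketch of the standard DCJ adjacency-graph formula d = N − (C + I/2), where N is the number of genes, C the number of cycles, and I the number of odd paths in the adjacency graph of the two genomes. It is not the authors' ILP reduction. The genome encoding (lists of signed integers plus a circular flag) and the function names are hypothetical choices made for illustration. Union-find is used for brevity, which makes the sketch essentially linear rather than strictly linear.

```python
from collections import defaultdict

# Hypothetical encoding: a genome is a list of (genes, circular) pairs, where genes
# is a non-empty list of signed integers (+g / -g = gene g on the forward / reverse strand).


def extremities(signed_gene):
    """Return the (left, right) extremities of a gene as read along the chromosome."""
    g = abs(signed_gene)
    tail, head = (g, "t"), (g, "h")
    return (tail, head) if signed_gene > 0 else (head, tail)


def adjacency_vertices(genome):
    """List adjacencies (2 extremities) and telomeres (1 extremity) of a genome."""
    vertices = []
    for genes, circular in genome:
        ends = [extremities(s) for s in genes]
        # Consecutive genes a, b give the adjacency {right end of a, left end of b}.
        for (_, right), (left, _) in zip(ends, ends[1:]):
            vertices.append(frozenset((right, left)))
        if circular:
            # The last gene is joined back to the first one.
            vertices.append(frozenset((ends[-1][1], ends[0][0])))
        else:
            # A linear chromosome ends in two telomeres.
            vertices.append(frozenset((ends[0][0],)))
            vertices.append(frozenset((ends[-1][1],)))
    return vertices


def dcj_distance(a, b):
    """DCJ distance N - (C + I/2) for genomes with equal gene content and no paralogs."""
    va, vb = adjacency_vertices(a), adjacency_vertices(b)
    offset = len(va)  # vertices of b are numbered after those of a
    where_a = {x: i for i, v in enumerate(va) for x in v}
    where_b = {x: offset + i for i, v in enumerate(vb) for x in v}
    # Reject unequal gene content and repeated extremities (i.e. paralogs).
    if (set(where_a) != set(where_b)
            or sum(map(len, va)) != len(where_a)
            or sum(map(len, vb)) != len(where_b)):
        raise ValueError("sketch assumes equal gene content and no paralogs")

    # Each extremity x is an edge between the vertex of a and the vertex of b
    # containing x; union-find groups vertices into connected components.
    parent = list(range(len(va) + len(vb)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for x in where_a:
        ra, rb = find(where_a[x]), find(where_b[x])
        if ra != rb:
            parent[ra] = rb

    edges = defaultdict(int)          # number of edges per component
    has_telomere = defaultdict(bool)  # True for paths, False for cycles
    for x in where_a:
        edges[find(where_a[x])] += 1
    for i, v in enumerate(va + vb):
        if len(v) == 1:
            has_telomere[find(i)] = True

    cycles = sum(1 for r in edges if not has_telomere[r])
    odd_paths = sum(1 for r in edges if has_telomere[r] and edges[r] % 2 == 1)
    n_genes = len(where_a) // 2
    # The number of odd paths is always even, so integer division is exact.
    return n_genes - (cycles + odd_paths // 2)


if __name__ == "__main__":
    linear = [([1, 2, 3], False)]
    print(dcj_distance(linear, linear))                        # 0
    print(dcj_distance(linear, [([1, -2, 3], False)]))         # 1 (one inversion)
    print(dcj_distance([([1, 2], False)], [([1, 2], True)]))   # 1 (one join circularizes)
```

Once paralogs are allowed, the sketch can no longer place each extremity in a single vertex per genome. One must first decide which copies correspond to each other, and that matching is what makes the general distance problem NP-hard and motivates integer linear programming formulations.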