首页> 外文学位 >Efficient Combinatorial Algorithms for Problems in Sequence and Self Assembly.
【24h】

Efficient Combinatorial Algorithms for Problems in Sequence and Self Assembly.

机译:序列和自组装问题的有效组合算法。

获取原文
获取原文并翻译 | 示例

摘要

In this thesis we present efficient algorithms for various combinatorial problems arising in sequence assembly and self assembly . Sequence assembly is a major phase in uncovering the genomic sequence of an organism. Sequence assembly has several underlying combinatorial problems on bi-directed de Bruijn graphs. Existing algorithms to build and operate on these graphs cannot scale with ever increasing volume of sequence assembly data. In this thesis we close this gap by providing efficient algorithms build and operate on bi-directed de Bruijn graphs. We first show how a bi-directed de Bruijn graph can be constructed optimally in theta(n) time in contrast to the existing [ZB08] theta(n log( G)) algorithm, here n is the input size and G is the size of genome. This algorithm is also I/O optimal and requires theta( logn/M logM/B ) I/Os to build the graph, here M is the main memory size and B is the block size. Secondly we show that we can solve the Chinese Postman walk Problem on a bi-directed graph without reducing it to bi-directed flow problem. This bi-directed flow based algorithm [MGMB07] to solve the CPP on a bi-directed graph G( V,E) takes O(|E|2 log2(V)) time. We show that we can improve this algorithm to theta(p(|V| + | E|) log(|V|) + (dmaxp) 3), here p = max{|{nu|din(nu) - dout(nu) > 0}|, |{nu|din(nu) - dout(nu) 0}|} and dmax = max{|din(nu) - dout(nu)|}. This algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes p is much less than the nodes in the bi-directed graph.;On the other hand self assembly systems have numerous critical applications in medicine, circuit design. Theoretical modeling of self assembly is very useful before performing self assembly experiments. Algorithmic self assembly studies the efficiency of self assembly systems on an abstract two dimensional (2D) tile assembly model (TAM). The theory behind TAM is based on Wang's tiling technique, TAM has the power to simulate a turing machine. Algorithms with an optimal tile complexity of (theta( logN loglogN )) were proposed earlier to uniquely self assemble an N x N square (with a temperature of alpha = 2) on TAM. However efficient algorithms (tile set constructions) to assemble arbitrary shapes on TAM are not known and have remained open. In this thesis we try to bridge this gap by presenting algorithms which can self assemble some regular polygons with a tile complexity of theta(log(N)), here N is the area of the underlying polygon. In a deterministic self assembly model such as TAM, it has been proven that the tile complexity lower bound to self assembly any shape is theta( logNloglog N ) (inferred from the Kolmogrov complexity), here N is the area of the underlying shape. However designing even theta( logN loglogN ) unique tiles specific to a shape which needs to be self assembled is still an intensive task. Creating a copy of a tile is much simpler than creating a unique tile. With this constraint in mind probabilistic tile assembly models (PTAM) were introduced---these models are also referred as concentration programming models or randomized self assembly models. These systems have O(1) tile complexity and the concentration of each of the tiles can be varied to produce the desired shape. Existing algorithms [KS08] [Dot09] on PTAM suffer from large underlying constant, this is because all these algorithms adopt sub-tiles which perform binary arithmetic. In contrast to the existing algorithms, in this thesis we show that its possible to self assemble rectilinear shapes on PTAM without using any sub-tiles performing binary arithmetic; We introduce a technique called staircase sampling which can self assemble squares, rectangles and rectangles with constant aspect ratio with high probability (i.e. O(1 - 1/nalpha), for any fixed alpha > 0), here n is the dimension of the shape which needs to be self assembled.
机译:本文针对序列组装和自组装中出现的各种组合问题,提出了有效的算法。序列组装是揭示生物体基因组序列的主要阶段。序列组装在双向de Bruijn图上具有几个潜在的组合问题。现有的用于在这些图上构建和操作的算法无法随着序列装配数据量的不断增长而扩展。在本文中,我们通过提供高效的算法并在双向de Bruijn图上运行来弥补这一差距。我们首先展示与现有的[ZB08] theta(n log(G))算法相比,如何在theta(n)时间内最佳构造双向de Bruijn图,其中n是输入大小,G是大小基因组。该算法也是I / O最佳的,并且需要theta(logn / M logM / B)I / O来构建图形,这里M是主内存大小,B是块大小。其次,我们表明可以在双向图上解决中国邮递员步行问题,而不必将其简化为双向流问题。用于解决双向图G(V,E)上CPP的这种基于双向流的算法[MGMB07]花费O(| E | 2 log2(V))时间。我们证明了我们可以将该算法改进为theta(p(| V | + | E |)log(| V |)+(dmaxp)3),这里p = max {| {nu | din(nu)-dout( nu)> 0} |,| {nu | din(nu)-dout(nu)<0} |}和dmax = max {| din(nu)-dout(nu)|}。当不平衡节点的数量p远小于双向图中的节点时,该算法的渐近性优于双向流算法。另一方面,自组装系统在医学,电路设计中具有许多关键应用。在执行自组装实验之前,自组装的理论建模非常有用。算法自组装研究了抽象二维(2D)瓷砖组装模型(TAM)上自组装系统的效率。 TAM背后的理论基于Wang的切片技术,TAM可以模拟图灵机。较早提出了具有最佳图块复杂度(theta(logN loglogN))的算法,以在TAM上唯一地自组装N x N正方形(温度为alpha = 2)。但是,尚不知道在TAM上组装任意形状的有效算法(平铺集构造),并且仍处于开放状态。在本文中,我们尝试通过提出一些算法来弥合这种差距,该算法可以自组装一些规则多边形,其瓦片复杂度为theta(log(N)),其中N是基础多边形的面积。在确定性自组装模型(如TAM)中,已证明,对任何形状进行自组装的下层复杂度下限为theta(logNloglog N)(由Kolmogrov复杂度推断),此处N为基础形状的面积。然而,即使是特定于需要自行组装的形状的theta(logN loglogN)唯一瓷砖的设计仍然是一项艰巨的任务。创建图块的副本比创建唯一图块简单得多。考虑到这一限制,引入了概率砖块装配模型(PTAM),这些模型也称为浓度编程模型或随机自组装模型。这些系统具有O(1)个图块复杂度,并且每个图块的浓度都可以变化以产生所需的形状。 PTAM上的现有算法[KS08] [Dot09]具有较大的基础常数,这是因为所有这些算法均采用执行二进制算术的子区块。相对于现有算法,本文证明了在不使用任何子块执行二进制算术的情况下,在PTAM上自组装直线形状的可能性。我们引入了一种称为阶梯采样的技术,该技术可以以高概率(即,对于任何固定的alpha> 0的O(1-1 / nalpha),对于高固定比例的正方形,矩形和矩形)进行自组装,其中n是形状的尺寸需要自行组装。

著录项

  • 作者

    Kundeti, Vamsi Krishna.;

  • 作者单位

    University of Connecticut.;

  • 授予单位 University of Connecticut.;
  • 学科 Applied Mathematics.;Computer Science.;Biology Bioinformatics.
  • 学位 Ph.D.
  • 年度 2011
  • 页码 97 p.
  • 总页数 97
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号