
Finding a novel way for fast sequence alignment and exploiting information theory in bacterial genomes and complete phages.


Abstract

The invention of next generation sequencing (NGS) technology provides the capability to generate high-throughput, low-cost sequencing data, and scientists use it to address a diverse range of biological problems. Several data analysis algorithms have been developed in the last few years to best exploit NGS data. New tools and methods have also been implemented to better understand these data.

This dissertation presents several novel techniques involving NGS datasets. The first, qudaich, is a novel sequence aligner that can be used as a key part of NGS data analysis. Qudaich generates pairwise local alignments of a query dataset against a database. It can efficiently process large volumes of data and is well suited to next generation read datasets. The aligner handles both DNA and protein sequences and tries to generate the best possible alignment for each query sequence. Compared with other contemporary aligners, qudaich performs better in both execution time and accuracy.

Next, I show different ways to extract useful genomic information from NGS data, which in turn point to promising directions for solving existing biological problems such as prophage prediction. Prophages are viruses that have integrated into, and replicate as part of, the bacterial genome. These genetic elements can have a tremendous impact on their hosts. Most other phage-finding tools rely mainly on homology-based approaches for prophage prediction, which limits the de novo discovery of novel prophages. This dissertation presents a novel algorithm, PhiSpy, to predict prophages in bacterial genomes. PhiSpy combines similarity-based and composition-based strategies to identify prophages. It finds 94% of the known prophages in 50 complete bacterial genomes, with a 6% false negative rate and a 0.66% false positive rate. This led to the successful prediction of the largest set of prophages compared with other prophage-finding applications.

Finally, this dissertation demonstrates that information theory can be effectively applied to find informative sequences, to predict the lifestyle restrictions of an organism, and to analyze the deviation of the amino acid utilization profile in different metabolic processes in different organisms.

Together, these tools will enable the next generation of sequence analyses using next generation sequence data.
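The abstract does not spell out which information-theoretic measures the dissertation uses. As a minimal sketch of the general idea only, assuming Shannon entropy of a sequence's symbol composition and Kullback-Leibler divergence between two composition profiles, the kind of calculation involved might look like the following. The function names and the pseudocount smoothing are illustrative assumptions, not the dissertation's implementation.

```python
import math
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Relative frequency of each symbol (nucleotide or amino acid) in seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()}


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per symbol, of the composition of seq."""
    return -sum(p * math.log2(p) for p in composition(seq).values())


def kl_divergence(seq: str, reference: str, pseudocount: float = 1.0) -> float:
    """Kullback-Leibler divergence, in bits, of seq's composition from reference's.

    A pseudocount is added to every symbol of the combined alphabet so that
    symbols missing from one sequence do not produce a division by zero.
    (The smoothing scheme is an assumption made for this sketch.)
    """
    if not seq or not reference:
        raise ValueError("sequences must not be empty")
    alphabet = sorted(set(seq.upper()) | set(reference.upper()))

    def smoothed(s: str) -> dict[str, float]:
        counts = Counter(s.upper())
        total = len(s) + pseudocount * len(alphabet)
        return {a: (counts[a] + pseudocount) / total for a in alphabet}

    p, q = smoothed(seq), smoothed(reference)
    return sum(p[a] * math.log2(p[a] / q[a]) for a in alphabet)
```

For example, `shannon_entropy("ACGT")` gives 2 bits per symbol because the four nucleotides are equally frequent, while a homopolymer such as `"AAAA"` gives 0. `kl_divergence(profile, reference)` is 0 when the two compositions match and grows as the usage profile deviates from the reference.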

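Bibliographic record

  • Author: Akhter, Sajia
  • Author affiliation: The Claremont Graduate University
  • Degree-granting institution: The Claremont Graduate University
  • Subject: Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 159 p.
  • Total pages: 159
  • Original format: PDF
  • Language: English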
