Some statistical issues in genomics: Shotgun DNA sequence assembly and cDNA expression data.

Abstract

Many recent genome sequencing projects have used double-end clones. The first part of this thesis predicts genome coverage in such projects. The traditional Lander-Waterman formulas address the statistical properties only of assembly projects that use clones without "mate-pairs". There is therefore a need to extend the Lander-Waterman formulas to cover double-end genome sequencing.

We improve on previous results and calculate the average number and length of scaffolds, islands, gaps, etc. In addition, we estimate the distribution of the gap size between adjacent islands. Instead of assuming fixed-length clones and fixed-length ends, we allow general statistical distributions for these quantities.

Another topic in the first part is estimating gap sizes after assembly has been done. This is needed especially when no high-resolution map is available. We have developed several methods and compared them with current methods on real data. This work provides some guidance for estimating gap sizes after assembly.

In the second part, we predict the repeat structure of a genome from reads alone, without assembly. The repeats in the human genome cause every assembler to make mistakes. To give some clues before assembly, we estimate the repeat structure of a genome from the l-tuple information contained in the reads. Our results agree well with both simulations and experiments. In addition, the method provides a consensus estimate for some repeat families in the genome, which will help the assembly process. It can also be used to build a better repeat masker. Furthermore, it gives an estimate of the genome size.

The third part focuses on the statistical analysis of cDNA microarray data. It is important to determine statistically which genes are significantly up- or down-regulated. We have modified Kerr and Churchill's method to approach this problem. Our method enables scientists to be more confident about the conclusions drawn from the data. It has been incorporated into commercial software and patented.
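For orientation, the classical single-end Lander-Waterman quantities that the first part sets out to extend can be sketched as follows. This is only the textbook model with fixed-length reads and no mate-pairs, not the double-end extension developed in the thesis. The function name, parameter names, and example numbers are assumptions chosen for illustration.

```python
import math


def lander_waterman_stats(genome_len, n_reads, read_len, min_overlap=0):
    """Classical Lander-Waterman expectations (single-end, fixed-length reads).

    All names here are illustrative. min_overlap is the overlap length
    needed to detect that two reads belong to the same island.
    """
    c = n_reads * read_len / genome_len      # coverage (redundancy)
    sigma = 1.0 - min_overlap / read_len     # 1 - theta, with theta = T / L
    grow = math.exp(c * sigma)
    return {
        "coverage": c,
        # Expected number of islands: N * exp(-c * sigma)
        "expected_islands": n_reads / grow,
        # Expected number of reads per island: exp(c * sigma)
        "reads_per_island": grow,
        # Expected island length: L * ((exp(c*sigma) - 1) / c + (1 - sigma))
        "island_length": read_len * ((grow - 1.0) / c + (1.0 - sigma)),
        # Expected fraction of the genome covered by at least one read
        "fraction_covered": 1.0 - math.exp(-c),
    }


# Hypothetical project: 1 Mb genome, 10,000 reads of 500 bp, 5x coverage.
stats = lander_waterman_stats(1_000_000, 10_000, 500)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Under this model the number of islands falls off exponentially with coverage, which is why the gaps that remain at moderate coverage, and the mate-pair links that bridge them into scaffolds, are the quantities of interest in a double-end analysis.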

Bibliographic record

  • Author: Li, Xiaoman
  • Author affiliation: University of Southern California
  • Degree-granting institution: University of Southern California
  • Subject: Biology, Biostatistics; Biology, Genetics
  • Degree: Ph.D.
  • Year: 2002
  • Pages: 85 p.
  • Total pages: 85
  • Original format: PDF
  • Language of text: English
