Journal: IEEE Transactions on Information Theory

Information Theory of DNA Shotgun Sequencing



Abstract

DNA sequencing is the basic workhorse of modern-day biology and medicine. Shotgun sequencing is the dominant technique: many randomly located short fragments, called reads, are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. A basic question is: given a sequencing technology and the statistics of the DNA sequence, what is the minimum number of reads required for reliable reconstruction? This number provides a fundamental limit on the performance of any assembly algorithm. For a simple statistical model of the DNA sequence and the read process, we show that the answer admits a critical phenomenon in the asymptotic limit of long DNA sequences: if the read length is below a threshold, reconstruction is impossible no matter how many reads are observed, and if the read length is above the threshold, having enough reads to cover the DNA sequence is sufficient to reconstruct it. The threshold is computed in terms of the Rényi entropy rate of the DNA sequence. We also study the impact of noise in the read process on the performance.
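
The abstract does not state the threshold formula, so the following is only a minimal sketch under stated assumptions. It assumes the simplest model (bases drawn i.i.d. from a fixed distribution), takes the entropy rate to be the order-2 Rényi entropy H_2(p) = -ln(sum of p_i^2), and assumes the critical read length has the form 2 ln(G) / H_2(p) for a sequence of length G. The function names and the example distributions are hypothetical and chosen for illustration.

```python
import math


def renyi_entropy_order2(probs):
    """Order-2 Rényi entropy, in nats, of an i.i.d. base distribution."""
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -math.log(sum(p * p for p in probs))


def critical_read_length(genome_length, probs):
    """Assumed threshold form: L_crit = 2 * ln(G) / H_2(p).

    This formula is an assumption for the i.i.d. model; it is not
    quoted from the abstract.
    """
    h2 = renyi_entropy_order2(probs)
    if h2 <= 0.0:
        # A deterministic sequence carries no information to align reads.
        return math.inf
    return 2.0 * math.log(genome_length) / h2


if __name__ == "__main__":
    G = 4 ** 10                       # hypothetical sequence length
    uniform = [0.25, 0.25, 0.25, 0.25]
    skewed = [0.4, 0.3, 0.2, 0.1]     # hypothetical base frequencies
    print(critical_read_length(G, uniform))
    print(critical_read_length(G, skewed))
```

Under these assumptions, a less uniform base distribution has a lower H_2 and therefore a longer critical read length, which matches the intuition that more repetitive sequences need longer reads to be reconstructed.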
