Journal: BMC Medical Genomics

Konnector v2.0: pseudo-long reads from paired-end sequencing data

Abstract

Background: Reading the nucleotides from both ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment is longer than the combined read length, a gap of unsequenced nucleotides remains between the two reads of a pair. If the target in such an experiment is sequenced deeply enough to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool.

Results: Konnector uses a probabilistic, memory-efficient data structure called a Bloom filter to represent a k-mer spectrum, that is, all sequences of length k that occur in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups on this data structure to construct an implicit de Bruijn graph, which describes the (k-1) base pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences.

Conclusions: Here we report the performance of Konnector v2.0 on simulated and experimental datasets, and compare it against other tools with similar functionality. We note that, representing k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware.
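
The idea described in the Results can be illustrated with a minimal Python sketch. It is not Konnector's implementation: the class and function names (`BloomFilter`, `load_kmers`, `successors`, `bridge_gap`), the filter size, the hash scheme, and the plain breadth-first search are all assumptions chosen for readability, and reverse complements and sequencing errors are ignored. The sketch stores the k-mers of the reads in a Bloom filter, derives graph edges on the fly by querying the four possible one-base extensions of a k-mer, and searches for a path from the end of one flanking sequence to the start of the other.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Toy Bloom filter: a bit array probed by several hash functions."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def load_kmers(reads, k, bloom):
    """Insert every k-mer of every read into the Bloom filter."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def successors(kmer, bloom):
    """Implicit de Bruijn edges: k-mers overlapping `kmer` by k-1 bases."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bloom]


def bridge_gap(left, right, k, bloom, max_extension=1000):
    """Breadth-first search from the last k-mer of `left` to the first
    k-mer of `right`. Returns the connected sequence, or None."""
    start, goal = left[-k:], right[:k]
    queue = deque([(start, left)])
    visited = {start}
    while queue:
        kmer, seq = queue.popleft()
        if kmer == goal:
            return seq + right[k:]
        if len(seq) - len(left) >= max_extension:
            continue  # give up on paths that grow beyond the limit
        for nxt in successors(kmer, bloom):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, seq + nxt[-1]))
    return None


# Hypothetical example: overlapping reads drawn from one short fragment.
fragment = "ATGGCGTACCTTAGGCATCGAACTGTCA"
reads = [fragment[i:i + 10] for i in range(0, len(fragment) - 9, 3)]
bloom = BloomFilter()
load_kmers(reads, k=5, bloom=bloom)
pseudo_long_read = bridge_gap(fragment[:8], fragment[-8:], k=5, bloom=bloom)
```

Because the graph is never stored explicitly, memory use is determined by the size of the bit array rather than by the number of edges, which is the property the abstract relies on when it cites a small per-k-mer memory cost. The price is that Bloom filter false positives can introduce spurious edges, so a practical tool needs safeguards that this sketch omits.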
