
K-mer Counting for Genomic Big Data


Abstract

Counting the abundance of all k-mers (substrings of length k) in sequencing reads is an important step in many bioinformatics applications, including de novo assembly, error correction, and multiple sequence alignment. However, processing large genomic datasets (in the TB range) has become a bottleneck in these bioinformatics pipelines. At present, most k-mer counting tools run on a single node and cannot handle TB-scale data efficiently. In this paper, we propose a new, highly scalable distributed method for k-mer counting. We test our k-mer counting tool on the Mira supercomputer at Argonne National Laboratory. The experimental results show that it scales to 8192 cores with an efficiency of 43% when processing a 2 TB simulated genome dataset with 200 billion distinct k-mers (graph size), and that the whole-genome statistical analysis takes only 578 s.
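To make the task concrete, here is a minimal Python sketch of k-mer counting on a single node, followed by one common way to split the work so that each distinct k-mer has exactly one owner. It is an illustration only and not the method proposed in the paper: the function names `count_kmers`, `owner_of`, and `count_kmers_partitioned`, and the CRC32-based owner assignment, are assumptions made for this example. Canonical k-mers (treating a k-mer and its reverse complement as the same key) are left out for brevity.

```python
import zlib
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads (single-node baseline)."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 k-mers; shorter reads yield none.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def owner_of(kmer, num_parts):
    """Hypothetical owner assignment: a deterministic hash modulo the partition count."""
    return zlib.crc32(kmer.encode("ascii")) % num_parts


def count_kmers_partitioned(reads, k, num_parts):
    """Simulate a distributed count: each k-mer is tallied only by its owner partition."""
    parts = [Counter() for _ in range(num_parts)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            parts[owner_of(kmer, num_parts)][kmer] += 1
    return parts


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTA"]
    print(count_kmers(reads, 3))
    for rank, part in enumerate(count_kmers_partitioned(reads, 3, 4)):
        print(rank, dict(part))
```

Because every occurrence of a given k-mer is routed to the same partition, the per-partition tables are disjoint and their union equals the single-node result. In a real distributed setting the routing step would be a network exchange between processes, which this sketch does not model.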

Bibliographic record

  • Source
    Big data - BigData 2018 | 2018 | pp. 345-351 | 7 pages
  • Conference location: Seattle (US)
  • Author affiliations

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China, University of Science and Technology of China, Hefei 230041, China;

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;

    Argonne National Laboratory, Lemont, IL 60439, USA;

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;

    Shenzhen Children's Hospital, Shenzhen 518038, China;

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;

  • Conference organizer
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification
  • Keywords

    K-mer counting; Genome sequence analysis; Performance and scalability;

