Proceedings of the National Academy of Sciences of the United States of America

Reference-guided assembly of four diverse Arabidopsis thaliana genomes


Abstract

We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through.

Bibliographic details

  • Author affiliations

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany, Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, D-50829 Cologne, Germany;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany, Genomic and Epigenomic Variation in Disease Group, Genes and Disease Program, Center for Genomic Regulation (CRG) and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany;

    Center for Bioinformatics Tubingen, Eberhard Karls University, D-72076 Tubingen, Germany;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany;

    Center for Bioinformatics Tubingen, Eberhard Karls University, D-72076 Tubingen, Germany;

    Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tubingen, Germany;

  • Indexed in: Science Citation Index (SCI), USA; MEDLINE, USA; Chemical Abstracts (CA), USA
  • Original format: PDF
  • Language of text: English
