Calling known variants and identifying new variants while rapidly aligning sequence data

Abstract

Whole-genome sequencing studies can identify causative mutations for subsequent use in genomic evaluations. Speed and accuracy of sequence alignment can be improved by accounting for known variant locations during alignment instead of calling the variants after alignment as in previous programs. The new programs Findmap and Findvar were compared with alignment using Burrows–Wheeler alignment (BWA) or SNAP and variant identification using Genome Analysis ToolKit (GATK) or SAMtools. Findmap stores the reference map and any known variant locations while aligning reads and counting reference and alternate alleles for each DNA source. Findmap also outputs potential new single nucleotide variant, insertion, and deletion alleles. Findvar separates likely true variants from read errors and outputs genotype probabilities. Strategies were tested using cattle, human, and a completely random reference map and simulated or actual data. Most tests simulated 10 bulls, each with 10× simulated sequence reads containing 39 million variants from the 1000 Bull Genomes Project. With 10 processors, clock times for processing 100× data were 105 h for BWA, 25 h for GATK, and 11 h for SAMtools but only about 4 h for SNAP, 3 h for Findmap, and 1 h for Findvar. Alignment programs required about the same total memory; BWA used 46 GB (4.6 GB/processor), whereas >10 processors can share the same memory in SNAP and Findmap, which used 40 and 46 GB, respectively. Findmap correctly mapped 92.9% of reads (compared with 92.6% from SNAP and 90.5% from BWA) and had high accuracy of calling alleles for known variants. For new variants, Findvar found 99.8% of single nucleotide variants, 79% of insertions, and 67% of deletions; GATK found 99.4, 95, and 90%, respectively; and SAMtools found 99.8, 12, and 16%, respectively. False positives (as percentages of true variants) were 11% of single nucleotide variants, 0.4% of insertions, and 0.3% of deletions from Findvar; 12, 8.4, and 2.9%, respectively, from GATK; and 37, 1.3, and 0.4%, respectively, from SAMtools. Advantages of Findmap and Findvar are fast processing, precise alignment, more useful data summaries, more compact output, and fewer steps. Calling known variants during alignment allows more efficient and accurate sequence-based genotyping.
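
The abstract names two steps but does not give their algorithms. Findmap counts reference and alternate alleles at known variant locations while it aligns reads. Findvar turns that evidence into genotype probabilities. The Python below is only a minimal sketch of that general idea, not the authors' code. It assumes the reads are already placed at a start position without indels. The `KNOWN` table, `count_alleles`, and `genotype_probs` are hypothetical names. The genotype step substitutes a textbook binomial read-error model with an assumed 1% error rate and a 0.25/0.5/0.25 prior, which corresponds to allele frequency 0.5 under Hardy-Weinberg equilibrium. Findvar's actual model may differ.

```python
from collections import Counter
from math import comb

# Hypothetical known-variant table: 0-based position -> (reference base, alternate base).
KNOWN = {5: ("A", "G"), 12: ("C", "T")}


def count_alleles(reads, known):
    """Tally reference and alternate alleles at known variant sites.

    Assumes `reads` is a list of (start, sequence) pairs already aligned,
    gap-free, in reference coordinates. Bases matching neither allele are
    skipped here (treated as read errors).
    """
    counts = {pos: Counter() for pos in known}
    for start, seq in reads:
        for pos, (ref, alt) in known.items():
            offset = pos - start
            if 0 <= offset < len(seq):  # read covers this known site
                base = seq[offset]
                if base == ref:
                    counts[pos]["ref"] += 1
                elif base == alt:
                    counts[pos]["alt"] += 1
    return counts


def genotype_probs(n_ref, n_alt, err=0.01, prior=(0.25, 0.5, 0.25)):
    """Posterior probabilities of carrying 0, 1, or 2 alternate alleles.

    Assumed model: each read independently shows the alternate allele with
    probability err (homozygous reference), 0.5 (heterozygous), or
    1 - err (homozygous alternate).
    """
    n = n_ref + n_alt
    p_alt = (err, 0.5, 1.0 - err)
    likes = [pr * comb(n, n_alt) * p**n_alt * (1.0 - p) ** n_ref
             for pr, p in zip(prior, p_alt)]
    total = sum(likes)
    return [like / total for like in likes]


if __name__ == "__main__":
    # Three toy reads covering the two hypothetical known sites.
    reads = [(0, "AAAAAGAAAAAAC"), (4, "CAAAAAAAT"), (5, "GTTTTTTC")]
    counts = count_alleles(reads, KNOWN)
    for pos in sorted(counts):
        n_ref, n_alt = counts[pos]["ref"], counts[pos]["alt"]
        probs = genotype_probs(n_ref, n_alt)
        print(pos, n_ref, n_alt, [round(p, 3) for p in probs])
```

With these toy reads, position 5 gets 1 reference and 2 alternate reads, and position 12 gets 2 reference and 1 alternate read. Multiplying raw probabilities is adequate at low depth, such as the 10× per bull simulated in the study. At much higher depth, a real caller would more likely work with log-likelihoods to avoid floating-point underflow.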

Bibliographic details

  • Source
    Journal of Dairy Science | 2019, No. 4 | pp. 3216-3229 | 14 pages
  • Author affiliations

    USDA Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705-2350

    USDA Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705-2350

    University of Maryland School of Medicine, Baltimore, MD 21201

  • Indexed in: Science Citation Index (SCI); MEDLINE; Chemical Abstracts (CA)
  • Original format: PDF
  • Full-text language: English
  • Keywords

    sequence alignment; variant calling; indel;