Journal: Standards in Genomic Sciences

Large-scale contamination of microbial isolate genomes by Illumina PhiX control

Abstract

With the rapid growth and development of sequencing technologies, genomes have become the new go-to for exploring solutions to some of the world’s biggest challenges, such as searching for alternative energy sources and exploring genomic dark matter. However, progress in sequencing has been accompanied by its share of errors, which can occur during template or library preparation, sequencing, imaging, or data analysis. In this study we screened over 18,000 publicly available microbial isolate genome sequences in the Integrated Microbial Genomes database and identified more than 1000 genomes that are contaminated with PhiX, a control frequently used during Illumina sequencing runs. Approximately 10% of these genomes have been published in the literature, and 129 contaminated genomes were sequenced under the Human Microbiome Project. Raw sequence reads are prone to contamination from various sources, and contaminating reads are usually eliminated during downstream quality control steps. Detection of PhiX-contaminated genomes indicates a lapse in either the application or the effectiveness of proper quality control measures. The presence of PhiX contamination in several publicly available isolate genomes can result in additional errors when such data are used in comparative genomics analyses. Such contamination of public databases has far-reaching consequences in the form of erroneous data interpretation and analyses, and necessitates better measures to proofread raw sequences before releasing them to the broader scientific community.
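The abstract does not say how the screening was performed, so the following is only a minimal sketch of one way an assembled contig could be checked against a control sequence such as PhiX. It uses exact shared k-mers, which is an assumption made for illustration and not the authors' method; alignment against the PhiX reference with an established tool would be the more usual choice in practice. All function names, the k-mer size, and the `min_fraction` threshold are hypothetical.

```python
from typing import Dict, Set

# Complement table for unambiguous bases; other symbols (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_control_index(control_seq: str, k: int) -> Set[str]:
    """Index k-mers from both strands of the control sequence.

    Note: the sequence is treated as linear, so k-mers that span the
    origin of a circular genome are not indexed in this sketch.
    """
    control_seq = control_seq.upper()
    return kmers(control_seq, k) | kmers(reverse_complement(control_seq), k)


def control_fraction(contig: str, index: Set[str], k: int) -> float:
    """Fraction of k-mer positions in contig whose k-mer is in the index."""
    contig = contig.upper()
    total = len(contig) - k + 1
    if total <= 0:
        return 0.0
    hits = sum(1 for i in range(total) if contig[i:i + k] in index)
    return hits / total


def flag_contigs(
    contigs: Dict[str, str],
    control_seq: str,
    k: int = 21,
    min_fraction: float = 0.5,
) -> Dict[str, float]:
    """Return {contig_name: fraction} for contigs at or above min_fraction."""
    index = build_control_index(control_seq, k)
    flagged = {}
    for name, seq in contigs.items():
        fraction = control_fraction(seq, index, k)
        if fraction >= min_fraction:
            flagged[name] = fraction
    return flagged
```

In use, `control_seq` would hold the PhiX reference sequence and `contigs` the assembled contigs of one isolate genome, both loaded by whatever FASTA reader is already in the pipeline. Flagged contigs are candidates for manual review, not confirmed contamination, because exact k-mer matching misses divergent copies and can be sensitive to the chosen threshold.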
