Journal: BMC Bioinformatics

acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data


Abstract

Background: A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data.

Results: We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows.

Conclusions: Acdc can reliably detect contamination in single-cell genome data. In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
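
The reference-free step in the abstract starts from oligonucleotide signatures. As an illustration only, the following minimal sketch shows one common way such a signature can be computed: a normalized k-mer frequency vector per contig. It is not acdc's implementation. The function name `kmer_signature`, the choice of k = 4, and the normalization are assumptions, since the abstract does not give these details.

```python
from itertools import product


def kmer_signature(sequence, k=4):
    """Return a normalized k-mer frequency vector for a DNA sequence.

    Hypothetical helper for illustration; k=4 (tetranucleotides) is an
    assumed default, not a value taken from the abstract.
    """
    # Fixed, lexicographically ordered list of all 4**k possible k-mers.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    # Slide a window of length k over the sequence and count each word.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values())
    # Normalize so that contigs of different lengths are comparable.
    return [counts[w] / total if total else 0.0 for w in kmers]
```

Vectors of this kind, one per contig, are the sort of input that a dimensionality reduction and clustering stage could operate on. Contigs from different organisms tend to produce different signatures, which is what makes separation without a reference database plausible.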
