Conference: IEEE International Conference on Bioinformatics and Biomedicine

A crowdsourcing method for correcting sequencing errors for the third-generation sequencing data

Abstract

Third-generation sequencing data offer a major advantage in read length, which greatly benefits genomic analyses. However, their error models differ from those of second-generation data. Correcting sequencing errors is therefore recommended, as it can significantly reduce false positives in downstream analyses. Existing error-correction approaches often lose accuracy when the hybrid reads are diverse or when coverage varies. In this paper, we propose a novel method based on a crowdsourcing strategy, implemented as CLTC. CLTC is also a hybrid correction algorithm and consists of four steps. First, the second-generation reads are collected and mapped to the third-generation reads. Then, a base difficulty level is defined to describe the diversity at a base among the group of second-generation reads covering it. Next, a capability is evaluated for each second-generation read, taking into account the base difficulty levels across the read, the consistency among overlapping reads, and the mapping quality between the second- and third-generation reads; a heuristic algorithm is designed to calculate the capabilities. Finally, an expectation-maximization algorithm computes the corrected result for each base pair. We test CLTC on different datasets and compare it with existing approaches. The results demonstrate that CLTC achieves higher accuracy and runs faster than the existing approaches.
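
The abstract does not give CLTC's formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that short reads act as crowdsourcing "workers" voting on each base of a long read, that base difficulty can be approximated by normalized Shannon entropy, and that each read's capability is a single accuracy value estimated by a simplified one-coin expectation-maximization scheme. The names `base_difficulty` and `em_consensus`, the input layout, and the initial capability of 0.8 are all hypothetical; the consistency and mapping-quality terms mentioned in the abstract are omitted.

```python
import math
from collections import Counter, defaultdict

BASES = "ACGT"

def base_difficulty(column):
    """Assumed difficulty measure: normalized Shannon entropy of the bases
    that the short reads call at one position (0 = unanimous, 1 = uniform)."""
    counts = Counter(column)
    total = sum(counts.values())
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return entropy / math.log2(len(BASES))

def em_consensus(votes, n_iter=20):
    """votes maps position -> list of (read_id, called_base).
    Returns (consensus base per position, estimated capability per read)."""
    read_ids = {r for col in votes.values() for r, _ in col}
    capability = {r: 0.8 for r in read_ids}  # assumed starting value
    posterior = {}
    for _ in range(n_iter):
        # E-step: posterior over the true base at each position,
        # treating each read as correct with probability = its capability.
        for pos, col in votes.items():
            log_p = {b: 0.0 for b in BASES}
            for r, called in col:
                p = min(max(capability[r], 1e-6), 1 - 1e-6)  # avoid log(0)
                for b in BASES:
                    log_p[b] += math.log(p if called == b else (1 - p) / 3)
            top = max(log_p.values())
            weights = {b: math.exp(v - top) for b, v in log_p.items()}
            norm = sum(weights.values())
            posterior[pos] = {b: w / norm for b, w in weights.items()}
        # M-step: capability = expected fraction of a read's calls that are correct.
        hits, calls = defaultdict(float), defaultdict(int)
        for pos, col in votes.items():
            for r, called in col:
                hits[r] += posterior[pos][called]
                calls[r] += 1
        capability = {r: hits[r] / calls[r] for r in read_ids}
    consensus = {pos: max(BASES, key=lambda b: post[b])
                 for pos, post in posterior.items()}
    return consensus, capability

# Toy input: read r3 disagrees with r1 and r2 at positions 1 and 3.
votes = {
    0: [("r1", "A"), ("r2", "A"), ("r3", "A")],
    1: [("r1", "C"), ("r2", "C"), ("r3", "G")],
    2: [("r1", "T"), ("r2", "T"), ("r3", "T")],
    3: [("r1", "G"), ("r2", "G"), ("r3", "C")],
}
consensus, capability = em_consensus(votes)
corrected = "".join(consensus[pos] for pos in sorted(consensus))
```

On this toy input the read that disagrees with the majority ends up with a lower estimated capability, so its votes carry less weight in later iterations. A fuller version would presumably fold base difficulty, overlap consistency, and mapping quality into the capability estimate, as the abstract describes.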
