Conference: International Conference on Bioinformatics and Computational Biology

Repeat Complexity of Genomes As a Means to Predict The Performance of Short-read Aligners

Abstract

We investigated the extent to which the complexity of genomic sequences affects the performance of short-read aligners. We demonstrated that a proper measure of sequence complexity is essential for studying the relationship between alignment performance and the abundance of repeats in genomes. In particular, we demonstrated that popular measures of sequence complexity were not suitable, whereas the right measure of repeat complexity correlated strongly with the performance of many popular short-read aligners. Using genomic sequences from a diverse set of species, we observed that as repeat complexity increased, the performance of these aligners decreased proportionally. This strong negative correlation was observed in all three important aspects of alignment performance: (i) precision, (ii) accuracy, and (iii) chromosomal coverage by mapped reads. We took advantage of this strong correlation to construct linear regression models that could accurately predict alignment performance from repeat complexity without having to align millions of reads to genomes. This finding suggests a novel approach to selecting aligners for new genomes and has great potential for reducing experimental cost.
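
The regression step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the repeat-complexity measure and reports no numbers, so everything here is an assumption: the values are made-up placeholders rather than results from the paper, and `fit_line`, `pearson_r`, and `predict` are hypothetical helper names, not the authors' code. The sketch only shows the general shape of the idea, which is to fit a line to (repeat complexity, aligner performance) pairs from genomes that have already been aligned, then estimate performance on a new genome from its complexity alone.

```python
from math import sqrt
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def predict(slope: float, intercept: float, x: float) -> float:
    """Estimate performance for a genome with repeat complexity x."""
    return slope * x + intercept


# Placeholder values (NOT from the paper): one entry per already-aligned genome.
complexity = [0.1, 0.2, 0.3, 0.4, 0.5]       # assumed repeat-complexity scores
precision = [0.95, 0.90, 0.85, 0.80, 0.75]   # assumed precision of one aligner

slope, intercept = fit_line(complexity, precision)
r = pearson_r(complexity, precision)

# Estimate precision for a new genome without aligning any reads to it.
estimated = predict(slope, intercept, 0.6)
```

In practice one such model would be fitted per aligner and per performance metric, and comparing the estimates for a new genome would be one way to choose among aligners.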