Journal: Database

A novel multi-alignment pipeline for high-throughput sequencing data

Abstract

Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo
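
The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each read has already been aligned separately to every founder reference, that positions have been converted to a common coordinate system, and that alignments are compared by edit distance. The names `merge_alignments`, `founder_A` and `founder_B`, the input layout, and the rule of labeling ties as "ambiguous" are all hypothetical choices made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical alignment record: (position in common coordinates, edit distance).
Alignment = Tuple[int, int]


def merge_alignments(
    per_reference: Dict[str, Dict[str, Alignment]]
) -> Dict[str, Tuple[str, Alignment]]:
    """Merge per-founder alignments and annotate each read with an origin.

    per_reference maps a founder name to {read_id: alignment} for the reads
    that aligned to that founder's reference. Assumed rule (not taken from
    the paper): the alignment with the lowest edit distance wins, and a tie
    between founders is labeled "ambiguous".
    """
    # Collect every read that aligned to at least one reference.
    read_ids = set()
    for alignments in per_reference.values():
        read_ids.update(alignments)

    merged = {}
    for read_id in sorted(read_ids):
        # All founders whose reference this read aligned to.
        hits = [
            (founder, alignments[read_id])
            for founder, alignments in per_reference.items()
            if read_id in alignments
        ]
        best_distance = min(aln[1] for _, aln in hits)
        best = [(founder, aln) for founder, aln in hits if aln[1] == best_distance]
        origin = best[0][0] if len(best) == 1 else "ambiguous"
        # On a tie, keep the alignment from the first tied founder.
        merged[read_id] = (origin, best[0][1])
    return merged


# Toy input: r3 aligns only to founder_B, so a founder_A-only pipeline loses it.
per_reference = {
    "founder_A": {"r1": (100, 0), "r2": (250, 2)},
    "founder_B": {"r1": (100, 1), "r2": (250, 2), "r3": (400, 0)},
}
merged = merge_alignments(per_reference)
for read_id, (origin, alignment) in merged.items():
    print(read_id, origin, alignment)
```

In this toy input, read `r3` is kept only because a second reference was used, and read `r1` receives an origin label because it matches one founder better than the other. A real pipeline would operate on alignment files rather than in-memory dictionaries and would need a more careful policy for ties and for reads that map to different positions in different references.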
