Database: The Journal of Biological Databases and Curation

A novel multi-alignment pipeline for high-throughput sequencing data

Abstract

Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest that aligning to a single standard reference sequence, as is common practice, can introduce a bias that depends on the genetic distance between the target sequences and the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, several limitations remain unsolved, including reduced mapping ratios, shifts in read mappings and the question of which variants to include to remove the bias. To address these issues, we propose a novel, generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and aligns the reads to each one. By mapping reads to multiple reference sequences and merging the results afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information for assessing differential expression than simple allele queries at known variant positions. Using RNA-seq data from a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate its advantages: more aligned reads and a higher percentage of reads with assigned origins.

Database URL:
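The merging step the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Hit` and `MergedRead` records, the `merge_alignments` function, the use of mismatch count as the alignment score and the founder names are all hypothetical. Alignment positions are also assumed to have already been converted to one shared coordinate system, since the abstract notes that modified references can shift read mappings. For each read, the sketch keeps the best-scoring alignment(s) across the founder-specific references. It records that founder as the read's origin when exactly one founder wins, and "ambiguous" when several founders tie.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    """One alignment of a read against one founder-specific reference (hypothetical record)."""
    founder: str     # which founder reference produced this alignment
    position: int    # start position, ASSUMED already lifted to common coordinates
    mismatches: int  # ASSUMED score: mismatches to that reference, lower is better


@dataclass
class MergedRead:
    read_id: str
    position: int
    origin: str               # a single founder name, or "ambiguous" on a tie
    best_founders: List[str]  # every founder whose alignment achieved the best score


def merge_alignments(hits_by_read: Dict[str, List[Hit]]) -> List[MergedRead]:
    """Merge per-founder alignments: keep the best hit(s) per read and annotate its origin."""
    merged = []
    for read_id, hits in hits_by_read.items():
        if not hits:
            continue  # unmapped against every founder reference: nothing to rescue
        best = min(h.mismatches for h in hits)
        winners = [h for h in hits if h.mismatches == best]
        founders = sorted({h.founder for h in winners})
        # A unique best founder gives an assigned origin; a tie means the read
        # (for example) overlaps no variant that distinguishes the founders.
        origin = founders[0] if len(founders) == 1 else "ambiguous"
        position = min(h.position for h in winners)  # deterministic pick among ties
        merged.append(MergedRead(read_id, position, origin, founders))
    return merged


# Hypothetical example: three founder references, four reads.
hits = {
    "r1": [Hit("founder_A", 100, 0), Hit("founder_B", 100, 2)],
    "r2": [Hit("founder_A", 250, 1), Hit("founder_B", 250, 1)],
    "r3": [Hit("founder_C", 400, 0)],  # aligns to only one founder's reference
    "r4": [],                          # aligns to none
}
for m in merge_alignments(hits):
    print(m.read_id, m.origin, m.best_founders)
```

In this toy example, read `r3` aligns only to `founder_C`'s reference. It is kept rather than lost, which loosely mirrors the read-rescue effect the abstract reports. Read `r2` ties across founders and is labeled ambiguous instead of being forced onto one parent.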