Journal: Genomics

Identifying suitable tools for variant detection and differential gene expression using RNA-seq data

Abstract

Neurodegenerative diseases are the most prevalent brain disorders worldwide, and the affected population is growing rapidly. Recently, these diseases have been studied using RNA-sequencing (RNA-seq) data to reveal changes in gene/transcript expression, the effects of variants, and the pathways involved in disease mechanisms. However, such observations depend heavily on the aligners and tools used, and the performance of existing RNA-seq tools on the hg38 genome assembly has not yet been documented. In this study, we systematically analyzed various spliced aligners, transcript assembly tools, and variant calling tools on hippocampal brain tissue data, using both genome assemblies (hg19 and hg38). This helps identify the best combination of tools for hg38 annotation. To evaluate the variants identified by the various pipelines, we compared them with expression Quantitative Trait Loci (eQTL) and Genome-Wide Association Study (GWAS) data. In addition, the identified differentially expressed genes (DG) were compared with those reported in microarray studies. In our variant calling analysis, the combination of GATK (Genome Analysis Toolkit) and STAR (Spliced Transcripts Alignment to a Reference) yielded more GWAS/eQTL variants than SAMtools (Sequence Alignment/Map tools). We also identified more non-coding variants with hg38 than with hg19, owing to its improved annotation. Among the DG pipelines, Salmon-based transcriptome quantification on hg38 yielded more previously reported DG than the genome-based quantification methods. This study also revealed that more reads map to multiple genomic locations with hg38 than with hg19, and these spurious multi-mapped reads may affect gene quantification.
We therefore suggest developing efficient algorithms that can handle multi-mapped reads and improve the performance of quantification from genome alignments.
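The abstract's closing point concerns reads that map to more than one genomic location. The following is a minimal sketch, not the authors' method, of how the multi-mapped fraction of an alignment could be estimated from SAM text using the standard `NH:i` tag (number of reported alignments for the read), which STAR writes. The function name `multimapping_summary` is hypothetical. The sketch assumes single-end reads and counts only primary alignments so that each read is counted once. A real pipeline would typically read BAM files through a library such as pysam rather than parsing text lines.

```python
from collections import Counter


def multimapping_summary(sam_lines):
    """Classify mapped reads as unique (NH:i:1) or multi-mapped (NH:i:>1).

    Hypothetical helper; assumes single-end reads and an aligner that
    writes the NH tag. Returns (counts, fraction_multi_mapped).
    """
    counts = Counter()
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip empty lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800)
        # records so each read is counted once, via its primary alignment.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        nh = None
        for tag in fields[11:]:  # optional tags follow the 11 mandatory fields
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is None:
            counts["no_NH_tag"] += 1
        elif nh == 1:
            counts["unique"] += 1
        else:
            counts["multi"] += 1
    mapped = counts["unique"] + counts["multi"]
    fraction = counts["multi"] / mapped if mapped else 0.0
    return counts, fraction


# Toy alignment: r2 is reported at two loci (one primary, one secondary record).
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r2\t256\tchr5\t900\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
counts, fraction = multimapping_summary(toy_sam)
print(counts["unique"], counts["multi"], fraction)  # 1 1 0.5
```

Running the same summary on alignments of one sample against hg19 and against hg38 would give a simple, comparable measure of how much multi-mapping each assembly introduces before quantification.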