BMC Genomics

Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data

Abstract

Background

Whole Exome Sequencing (WES) is one of the most widely used and cost-effective next-generation sequencing technologies, allowing all nuclear exons to be sequenced. Off-target regions may also be captured when they share high sequence similarity with the baits. Bioinformatics tools have been optimized to retrieve large amounts of off-target mitochondrial DNA (mtDNA) from WES data by exploiting the non-specificity of probes that partially overlap Nuclear mitochondrial Sequences (NumtS). Given the large effort the scientific community is undertaking to reconstruct human population history using mtDNA as a marker, and the involvement of mtDNA in disease, the 1000 Genomes project is one of the widest resources from which to extract mtDNA sequences from WES data.

Results

A previously published pipeline for assembling mitochondrial genomes from off-target WES reads was further improved to detect insertions and deletions (indels) and heteroplasmy, and applied to a dataset of 1242 samples from the 1000 Genomes project. It yielded a nearly complete mitochondrial genome for 943 samples (76% of the analyzed exomes). The robustness of our computational strategy was highlighted by NumtS filtering, which reduced the number of reads recognized as mitochondrial compared with the original annotation produced by the Consortium.

An accurate survey was carried out on the 1242 individuals, mapping 215 indels, mostly heteroplasmic, and 3407 single-base variants. Mismatches were homogeneously distributed along the whole mitochondrial genome, whereas indels were less frequent within protein-coding regions, where frameshift mutations may be deleterious. Most of the indels and mismatches found had not previously been annotated in mitochondrial databases, since conventional sequencing methods were limited to detecting homoplasmy or quasi-homoplasmy. Intriguingly, after filtering out non-haplogroup-defining variants, we detected a widespread population occurrence of rare events predicted to be damaging.
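The Results above rest on two ideas: discarding reads that may come from NumtS, and telling heteroplasmic sites (a mixture of alleles) apart from homoplasmic ones. As an illustration only, the sketch below shows one simple way to express both: keep reads that align strictly better to the mitochondrial reference than to the nuclear genome, then classify each position by the fraction of reads carrying the most frequent non-reference base. The `Read` record, the scoring rule and the three thresholds are hypothetical choices for this example, not the parameters of the published pipeline, and indels are not handled.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
MIN_DEPTH = 10            # positions covered by fewer reads are not called
HOMOPLASMY_CUTOFF = 0.9   # alternative-allele fraction at or above this: homoplasmic
MIN_HETEROPLASMY = 0.1    # fractions below this are treated as sequencing noise


@dataclass
class Read:
    """Hypothetical per-read summary: best alignment scores on the two references."""
    name: str
    mt_score: int       # best score against the mitochondrial reference
    nuclear_score: int  # best score against the nuclear genome (which contains NumtS)


def filter_numts(reads):
    """Keep only reads that align strictly better to mtDNA than to the nuclear genome."""
    return [r for r in reads if r.mt_score > r.nuclear_score]


def classify_site(ref_base, bases):
    """Classify one mtDNA position from the bases observed in the reads covering it.

    Returns (label, alt_base, alt_fraction); single-base substitutions only.
    """
    depth = len(bases)
    if depth < MIN_DEPTH:
        return "low_coverage", None, None
    counts = Counter(bases)
    # Most frequent non-reference base, or (None, 0) if every read matches the reference.
    alt_base, alt_count = max(
        ((b, c) for b, c in counts.items() if b != ref_base),
        key=lambda item: item[1],
        default=(None, 0),
    )
    fraction = alt_count / depth
    if fraction >= HOMOPLASMY_CUTOFF:
        return "homoplasmic", alt_base, fraction
    if fraction >= MIN_HETEROPLASMY:
        return "heteroplasmic", alt_base, fraction
    return "reference", None, fraction
```

For example, a position with reference base A covered by 70 A and 30 G reads yields `("heteroplasmic", "G", 0.3)`, whereas 95 A and 5 G falls below the assumed 10% floor and is reported as reference. Real callers also weigh base and mapping quality and strand balance, which this sketch ignores.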
Finally, samples were stratified into blood-derived and lymphoblastoid-derived groups to detect possibly different trends of mutability in the two datasets; this analysis did not reveal significant discordances.

Conclusions

To the best of our knowledge, this is likely the most extensive population-scale mitochondrial genotyping study in humans, enriched with the estimation of heteroplasmies.