Journal: Molecular Ecology

Assembly free comparative genomics of short-read sequence data discovers the needles in the haystack


Abstract

Most comparative genomic analyses of short-read sequence (SRS) data rely upon the prior assembly of a reference sequence. Here, we present an assembly-free analysis of SRS data that discovers sequence variants among focal genomes by tabulating the presence and frequency of ‘complex’ fragments in the data. Using data from nine tree species, we compare genomic diversity at levels ranging from populations to families. As a control, we simulated SRS data for three known plant genomes. The results provide insight into the quality and distributional bias of the sequencing reaction. Three main types of informative complexmers were identified, each possessing unique statistical properties. Type I complexmers are unique to a genome but suffer from a high false positive rate, being highly dependent on read coverage and distribution. Type II complexmers are shared between two genomes and can highlight potential copy-number differences. Type III complexmers are exclusive to a subset of genomes and can be useful for associating genetic differences with phenotypic or geographic variation. At the population level in an endangered timber species, numerous markers were identified that could potentially determine the geographic origin of individuals and thereby support the regulation of international trade. We observed that the genomic data for the four fig species were more divergent than those for the stone oak species, possibly due to their complex pollination syndrome and high rates of gene flow. Our approach greatly enhances the application of SRS technology to the study of non-model organisms and directly identifies the most informative genetic elements for more detailed study and assembly.
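The tabulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: complexmers are treated as fixed-length substrings of reads, the complexity filter `is_complex` (at least three distinct bases, no ambiguous characters) is hypothetical, and the three types are approximated by counting how many samples contain a fragment (exactly one, exactly two, or more than two but not all). Reverse complements, sequencing errors, and coverage thresholds are ignored.

```python
from collections import Counter


def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def is_complex(fragment, min_distinct=3):
    """Hypothetical filter: only A/C/G/T, and at least min_distinct different bases."""
    bases = set(fragment)
    return bases <= set("ACGT") and len(bases) >= min_distinct


def tabulate(reads, k):
    """Count the complex fragments found across all reads of one sample."""
    return Counter(f for read in reads for f in kmers(read, k) if is_complex(f))


def classify(tables):
    """Group fragments by the number of samples that contain them.

    tables maps a sample name to its Counter of fragments. The mapping of
    sample counts to Types I-III is an assumption, not the paper's definition.
    """
    n_samples = len(tables)
    result = {"I": {}, "II": {}, "III": {}}
    all_fragments = set().union(*tables.values())
    for frag in all_fragments:
        counts = {name: t[frag] for name, t in tables.items() if frag in t}
        if len(counts) == 1:
            result["I"][frag] = counts      # found in a single sample
        elif len(counts) == 2:
            result["II"][frag] = counts     # shared by two samples; compare frequencies
        elif len(counts) < n_samples:
            result["III"][frag] = counts    # found in a larger subset, but not in all
    return result


# Toy usage with made-up reads; real inputs would be parsed from FASTQ files.
samples = {
    "A": ["ACGTAC"],
    "B": ["ACGTTT"],
    "C": ["CGTACG"],
    "D": ["GTACAA"],
}
tables = {name: tabulate(reads, k=4) for name, reads in samples.items()}
result = classify(tables)
```

The per-sample counts kept for each shared fragment are what would allow a frequency comparison of the kind the abstract associates with copy-number differences. Because a fragment seen in only one sample may simply reflect uneven coverage elsewhere, the single-sample class in this sketch inherits the false-positive caveat the abstract raises for Type I.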
