BMC Genomics

Unsupervised genome-wide recognition of local relationship patterns


Abstract

Background: Phenomena such as incomplete lineage sorting, horizontal gene transfer, gene duplication and subsequent sub- and neo-functionalisation can produce distinct local phylogenetic relationships that are discordant with the species phylogeny. To assess the possible biological roles of these subdivisions, they must first be identified and characterised, preferably on a large scale and in an automated fashion.

Results: We developed Saguaro, a combination of a Hidden Markov Model (HMM) and a Self Organising Map (SOM), to characterise local phylogenetic relationships among aligned sequences using cacti, which are matrices of pairwise distance measures. The HMM determines the genomic boundaries from the aligned sequences. The SOM hypothesises new cacti in an unsupervised, iterative fashion, based on the regions that were modelled least well by the existing cacti. After testing the software on simulated data, we demonstrate the utility of Saguaro on two different data sets: (i) 181 Dengue virus strains and (ii) 5 primate genomes. In the first data set, Saguaro identifies regions under lineage-specific constraint. In the second, it identifies genomic segments that we attribute to incomplete lineage sorting. Intriguingly, for the primate data Saguaro also classified an additional ~3% of the genome as most incompatible with the expected species phylogeny. A substantial fraction of these regions was found to overlap genes associated with both the innate and adaptive immune systems.

Conclusions: Saguaro detects distinct cacti describing local phylogenetic relationships without requiring any a priori hypotheses. We have successfully demonstrated Saguaro’s utility with two contrasting data sets. The first contains many members with short sequences (Dengue viral strains: n = 181, genome size = 10,700 nt). The second contains few members but complex genomes (related primate species: n = 5, genome size = 3 Gb). This suggests that the software is applicable to a wide variety of experimental populations. Saguaro is written in C++, runs on the Linux operating system, and can be downloaded from http://saguarogw.sourceforge.net/.
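The Results paragraph describes Saguaro's loop only at a high level. Each alignment region is scored against a set of cacti (pairwise distance matrices), an HMM places the boundaries, and new cacti are proposed from the regions that fit worst. The Python sketch below illustrates that idea on a toy four-taxon alignment. It is not Saguaro's implementation, and several pieces are assumptions made purely for illustration:

- the p-distance measure;
- the fixed 10-column windows;
- the squared-difference fit score;
- the constant switch penalty, standing in for HMM transition probabilities;
- the hypothetical `propose_cactus` helper, which replaces the SOM by simply copying the worst-fitting window's matrix.

All function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned columns (gaps '-' skipped) at which a and b differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def window_matrix(seqs, start, end):
    """Symmetric pairwise p-distance matrix for alignment columns [start, end)."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = p_distance(seqs[i][start:end], seqs[j][start:end])
    return m


def normalize(m):
    """Scale so the largest entry is 1, keeping the pattern but not the overall rate."""
    top = max(max(row) for row in m)
    return [[v / top for v in row] for row in m] if top > 0 else m


def mismatch(m, cactus):
    """Sum of squared differences between a normalized window matrix and cactus."""
    a, b = normalize(m), normalize(cactus)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def segment(window_mats, cacti, switch_cost=1.0):
    """Viterbi-style decoding: one cactus per window, with a penalty per switch."""
    k = len(cacti)
    cost = [mismatch(window_mats[0], c) for c in cacti]
    back = []
    for m in window_mats[1:]:
        new_cost, ptr = [], []
        for s in range(k):
            # Cheapest predecessor state, charging switch_cost when it differs from s.
            step = [cost[r] + (0.0 if r == s else switch_cost) for r in range(k)]
            best = min(range(k), key=step.__getitem__)
            ptr.append(best)
            new_cost.append(step[best] + mismatch(m, cacti[s]))
        back.append(ptr)
        cost = new_cost
    # Trace back from the cheapest final state.
    state = min(range(k), key=cost.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def propose_cactus(window_mats, cacti, path):
    """Crude stand-in for the SOM step: copy the matrix of the worst-fitting window."""
    worst = max(range(len(path)), key=lambda w: mismatch(window_mats[w], cacti[path[w]]))
    return window_mats[worst]


# Toy alignment of taxa A-D: columns 0-19 group (A,B)|(C,D), columns 20-39 group (A,C)|(B,D).
seqs = ["A" * 20 + "G" * 20,
        "A" * 20 + "T" * 20,
        "C" * 20 + "G" * 20,
        "C" * 20 + "T" * 20]
windows = [window_matrix(seqs, s, s + 10) for s in range(0, 40, 10)]

cacti = [windows[0]]                 # start from a single relationship pattern
path = segment(windows, cacti)       # every window forced onto cactus 0
cacti.append(propose_cactus(windows, cacti, path))
path = segment(windows, cacti)       # re-segment with the newly proposed cactus
print(path)  # [0, 0, 1, 1]
```

Normalizing each matrix before scoring makes the fit compare relationship patterns rather than absolute divergence. In this sketch, raising `switch_cost` would favour longer segments, loosely analogous to lowering the HMM's transition probabilities.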