BMC Genomics

Robustness of birth-death and gain models for inferring evolutionary events


Abstract

Background: Phylogenetic birth-death models are opening a new window on the processes of genome evolution in studies of the evolution of gene and protein families, protein-protein interaction networks, microRNAs, and copy number variation. Given a species tree and a set of genomic characters in present-day species, the birth-death approach estimates the most likely rates required to explain the observed data and returns the expected ancestral character states and the history of character state changes. Achieving a balance between model complexity and generalizability is a fundamental challenge in the application of birth-death models. While more parameters promise greater accuracy and more biologically realistic models, increasing model complexity can lead to overfitting and a heavy computational cost.

Results: Here we present a systematic, empirical investigation of these tradeoffs, using protein domain families in six metazoan genomes as a case study. We compared models of increasing complexity, implemented in the Count program, with respect to model fit, robustness, and stability. In addition, we used a bootstrapping procedure to assess estimator variability. The results show that the most complex model, which allows for both branch-specific and family-specific rate variation, achieves the best fit, without overfitting. Variance remains low with increasing complexity, except for family-specific loss rates. This variance is reduced when the number of discrete rate categories is increased. Model choice is of greatest concern when different models lead to fundamentally different outcomes. To investigate the extent to which model choice influences biological interpretation, ancestral states and expected events were inferred under each model. Disturbingly, the different models not only resulted in quantitatively different histories, but predicted qualitatively different patterns of domain family turnover and genome expansion and reduction.

Conclusions: The work presented here evaluates model choice for genomic birth-death models in a systematic way and presents the first use of bootstrapping to assess estimator variance in birth-death models. We find that a model incorporating both lineage and family rate variation yields more accurate estimators without sacrificing generality. Our results indicate that model choice can lead to fundamentally different evolutionary conclusions, emphasizing the importance of more biologically realistic and complex models.
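
The abstract does not spell out the bootstrapping procedure, so the following is only a minimal sketch of the general idea: resample families with replacement and watch how much an estimate moves between replicates. It assumes each family is a tuple of copy numbers across six genomes. The names `bootstrap_se` and `mean_copies_per_genome` and the toy `profiles` data are hypothetical. The estimator is a deliberately simple stand-in for the rate estimates a program such as Count would produce, and it does not use Count's API.

```python
import random
import statistics


def bootstrap_se(profiles, estimator, n_replicates=200, seed=0):
    """Nonparametric bootstrap over families (hypothetical helper).

    Resamples the family profiles with replacement, re-applies the
    estimator to each replicate, and returns the mean of the replicate
    estimates together with their standard deviation (the bootstrap
    standard error).
    """
    rng = random.Random(seed)
    n = len(profiles)
    estimates = []
    for _ in range(n_replicates):
        # Draw n families with replacement from the original data set.
        resample = [profiles[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.mean(estimates), statistics.stdev(estimates)


def mean_copies_per_genome(profiles):
    """Toy estimator: average copy number per family per genome.

    A stand-in for a fitted gain or loss rate; a real analysis would
    re-fit the birth-death model on every replicate instead.
    """
    total = sum(sum(p) for p in profiles)
    return total / (len(profiles) * len(profiles[0]))


# Made-up copy-number profiles: one tuple per family, one entry per genome.
profiles = [
    (1, 1, 2, 0, 1, 1),
    (0, 0, 1, 1, 0, 3),
    (2, 2, 2, 1, 2, 2),
    (1, 0, 0, 0, 1, 0),
    (4, 3, 5, 2, 4, 6),
]

mean_est, std_err = bootstrap_se(profiles, mean_copies_per_genome)
print(f"bootstrap mean = {mean_est:.3f}, standard error = {std_err:.3f}")
```

A large standard error relative to the estimate would flag a parameter as poorly determined by the data, which is the kind of instability the abstract reports for family-specific loss rates.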