Journal: Molecular Biology and Evolution

GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss


Abstract

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
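The accuracy comparison in the abstract is stated in terms of relative Robinson-Foulds (RF) distance. As a minimal sketch of that metric, and not the evaluation code used by the authors, the Python below computes an unrooted relative RF distance between two trees over the same leaf set. It rests on several assumptions: trees are written as nested tuples of leaf names rather than Newick strings, the distance is normalized by the total number of non-trivial splits in both trees, and the names `leaves`, `splits`, and `relative_rf` are hypothetical helpers introduced only for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumption: a tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial bipartitions (splits) induced by internal branches.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so the result does not depend on where the tree is rooted.
    """
    all_leaves = leaves(tree)
    reference = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def relative_rf(tree_a: Tree, tree_b: Tree) -> float:
    """RF distance divided by the total number of non-trivial splits."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same leaf set")
    splits_a, splits_b = splits(tree_a), splits(tree_b)
    total = len(splits_a) + len(splits_b)
    return len(splits_a ^ splits_b) / total if total else 0.0


# Hypothetical five-leaf example: the two trees disagree on one of two splits.
true_tree = ((("a", "b"), "c"), ("d", "e"))
inferred_tree = ((("a", "c"), "b"), ("d", "e"))
print(relative_rf(true_tree, inferred_tree))  # 0.5
```

A value of 0 means the two trees induce identical splits and a value of 1 means they share none. For two fully resolved trees with n leaves, this normalization reduces to dividing by 2(n - 3).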
