Journal: Proceedings of the National Academy of Sciences of the United States of America

Incorporating gene-specific variation when inferring and evaluating optimal evolutionary tree topologies from multilocus sequence data


Abstract

Because of the increase of genomic data, multiple genes are often available for the inference of phylogenetic relationships. The simple approach for combining multiple genes from the same taxon is to concatenate the sequences and then ignore the fact that different positions in the concatenated sequence came from different genes. Here, we discuss two criteria for inferring the optimal tree topology from data sets with multiple genes. These criteria are designed for multigene data sets where gene-specific evolutionary features are too important to ignore. One criterion is conventional and is obtained by taking the sum of log-likelihoods over all genes. The other criterion is obtained by dividing the log-likelihood for a gene by its sequence length and then taking the arithmetic mean over genes of these ratios. A similar strategy could be adopted with parsimony scores. The optimal tree is then declared to be the one for which the sum or the arithmetic mean is maximized. These criteria are justified within a two-stage hierarchical framework. The first level of the hierarchy represents gene-specific evolutionary features, and the second represents site-specific features for given genes. For testing significance of the optimal topology, we suggest a two-stage bootstrap procedure that involves resampling genes and then resampling alignment columns within resampled genes. An advantage of this procedure over concatenation is that it can effectively account for gene-specific evolutionary features. We discuss the applicability of the two-stage bootstrap idea to the Kishino-Hasegawa test and the Shimodaira-Hasegawa test.
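The two criteria and the two-stage resampling described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that per-site log-likelihoods have already been computed for every candidate tree and every gene, stored in a hypothetical dictionary `site_lnl[tree][gene]`. It also reuses those fixed per-site values in each replicate (the RELL-style shortcut) instead of re-estimating parameters on every resampled alignment, which a full bootstrap would do. All function and variable names are invented for illustration.

```python
import random

def sum_criterion(genes):
    """Sum of log-likelihoods over all genes.
    genes: list of per-site log-likelihood lists, one list per gene."""
    return sum(sum(sites) for sites in genes)

def mean_criterion(genes):
    """Arithmetic mean over genes of (gene log-likelihood / gene length)."""
    return sum(sum(sites) / len(sites) for sites in genes) / len(genes)

def two_stage_indices(gene_lengths, rng):
    """Stage 1: resample genes with replacement.
    Stage 2: resample alignment columns within each resampled gene."""
    n_genes = len(gene_lengths)
    picks = []
    for _ in range(n_genes):
        g = rng.randrange(n_genes)
        n_sites = gene_lengths[g]
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        picks.append((g, cols))
    return picks

def bootstrap_support(site_lnl, criterion, n_reps=1000, seed=0):
    """Fraction of replicates in which each tree scores best.
    site_lnl: dict mapping tree name -> list of per-site lnL lists (assumed input).
    Uses fixed per-site log-likelihoods (a RELL-style shortcut, an assumption)."""
    rng = random.Random(seed)
    trees = list(site_lnl)
    gene_lengths = [len(sites) for sites in site_lnl[trees[0]]]
    wins = {t: 0 for t in trees}
    for _ in range(n_reps):
        # The same resampled genes and columns are applied to every tree.
        picks = two_stage_indices(gene_lengths, rng)
        scores = {
            t: criterion([[site_lnl[t][g][c] for c in cols] for g, cols in picks])
            for t in trees
        }
        wins[max(scores, key=scores.get)] += 1
    return {t: wins[t] / n_reps for t in trees}
```

Passing `sum_criterion` or `mean_criterion` as the `criterion` argument switches between the two criteria. The two can rank trees differently when gene lengths vary widely, because the sum weights long genes more heavily while the mean of per-site ratios weights every gene equally.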
