Journal: BMC Bioinformatics

Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods


Abstract

Background: A Bayesian approach based on a Dirichlet process (DP) prior is useful for inferring genetic population structures because it can infer the number of populations and the assignment of individuals simultaneously. However, the properties of the DP prior method are not well understood, and therefore, the use of this method is relatively uncommon. We characterized the DP prior method to increase its practical use.

Results: First, we evaluated the usefulness of the sequentially-allocated merge-split (SAMS) sampler, which is a technique for improving the mixing of Markov chain Monte Carlo algorithms. Although this sampler has been implemented in a preceding program, HWLER, its effectiveness has not been investigated. We showed that this sampler was effective for population structure analysis. Implementation of this sampler was useful with regard to the accuracy of inference and computational time. Second, we examined the effect of a hyperparameter for the prior distribution of allele frequencies and showed that the specification of this parameter was important and could be resolved by considering the parameter as a variable. Third, we compared the DP prior method with other Bayesian clustering methods and showed that the DP prior method was suitable for data sets with unbalanced sample sizes among populations. In contrast, although current popular algorithms for population structure analysis, such as those implemented in STRUCTURE, were suitable for data sets with uniform sample sizes, inferences with these algorithms for unbalanced sample sizes tended to be less accurate than those with the DP prior method.

Conclusions: The clustering method based on the DP prior was found to be useful because it can infer the number of populations and simultaneously assign individuals into populations, and it is suitable for data sets with unbalanced sample sizes among populations. Here we presented a novel program, DPART, that implements the SAMS sampler and can consider the hyperparameter for the prior distribution of allele frequencies to be a variable.
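
To illustrate why a DP prior lets the number of populations be inferred rather than fixed in advance, the sketch below draws partitions from the Chinese restaurant process, the distribution over partitions that a DP prior induces. It is a minimal illustration only and is not taken from DPART or HWLER: the function name `crp_partition`, the concentration parameter `alpha`, and the sample size of 100 are assumptions made for this example, and no genotype likelihood or SAMS move is included.

```python
import random

def crp_partition(n_individuals, alpha, rng):
    """Draw one partition of individuals from a Chinese restaurant process.

    Hypothetical helper for illustration; alpha is the DP concentration
    parameter and must be positive.
    """
    cluster_sizes = []   # cluster_sizes[k] = individuals currently in cluster k
    assignments = []     # assignments[i] = cluster label of individual i
    for _ in range(n_individuals):
        # An existing cluster is chosen with weight equal to its size;
        # a new cluster is opened with weight alpha.
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)
        else:
            cluster_sizes[k] += 1
        assignments.append(k)
    return assignments, cluster_sizes

if __name__ == "__main__":
    rng = random.Random(0)
    for alpha in (0.1, 1.0, 10.0):
        _, sizes = crp_partition(100, alpha, rng)
        # The number of clusters is itself random, and sizes are typically uneven.
        print(alpha, len(sizes), sorted(sizes, reverse=True))
```

Under this prior, larger values of `alpha` tend to produce more clusters, and the "rich get richer" weighting tends to produce clusters of unequal size. A full clustering method would combine this prior with a likelihood for the genotype data and an MCMC sampler over partitions.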
