Clustering multivariate data using factor analytic Bayesian mixtures with an unknown number of components

Abstract

Recent work on overfitting Bayesian mixtures of distributions offers a powerful framework for clustering multivariate data using a latent Gaussian model that resembles the factor analysis model. The flexibility of overfitting mixture models yields a simple and efficient way to estimate the unknown number of clusters and the model parameters by Markov chain Monte Carlo sampling. The present study extends this approach by considering a set of eight parameterizations, giving rise to parsimonious representations of the covariance matrix per cluster. A Gibbs sampler combined with a prior parallel tempering scheme is implemented to approximately sample from the posterior distribution of the overfitting mixture. The parameterization and number of factors are selected according to the Bayesian information criterion. Identifiability issues related to label switching are dealt with by post-processing the simulated output with the Equivalence Classes Representatives algorithm. The contributed method and software are demonstrated and compared to similar models estimated using the expectation-maximization algorithm on simulated and real datasets. The software is available online at .
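The selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common "smaller is better" form of the Bayesian information criterion, BIC = -2 log L + d log n, which the abstract does not spell out. The function names, candidate labels, log-likelihoods and parameter counts below are hypothetical placeholders, not values or identifiers from the paper or its software.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'smaller is better' convention: -2 log L + d log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_model(candidates, n_obs):
    """Score every candidate and return the key with the lowest BIC.

    candidates maps (parameterization label, number of factors) to a
    (maximized log-likelihood, number of free parameters) pair.
    """
    scores = {key: bic(ll, d, n_obs) for key, (ll, d) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits: labels "A", "B", "C" stand in for covariance
# parameterizations; the numbers are made up purely for illustration.
candidates = {
    ("A", 2): (-1510.0, 95),
    ("B", 2): (-1530.0, 70),
    ("C", 1): (-1600.0, 40),
}

best, scores = select_model(candidates, n_obs=500)
for key, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(key, round(value, 2))
print("selected:", best)
```

In this toy comparison the most heavily parameterized candidate has the highest log-likelihood but is not selected, because the d log n penalty outweighs its gain in fit. This is the trade-off that makes parsimonious covariance parameterizations attractive.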
