Journal: Communications in Statistics

Model-based clustering of Gaussian copulas for mixed data

Abstract

Clustering mixed data is important yet challenging because few conventional distributions exist for such data. In this article, we propose a mixture model of Gaussian copulas for clustering mixed data. Copulas, and Gaussian copulas in particular, are powerful tools for modeling the distribution of multivariate variables. The model clusters data sets with continuous, integer, and ordinal variables (all of which have a cumulative distribution function) and, like the Gaussian mixture, accounts for the dependencies within each component. Each component of the Gaussian copula mixture has a correlation coefficient for each pair of variables, and its univariate margins follow standard distributions (Gaussian, Poisson, or ordered multinomial) depending on the nature of the variable (continuous, integer, or ordinal). As a useful by-product, the model generalizes many well-known approaches and provides visualization tools based on its parameters. Bayesian inference is performed with a Metropolis-within-Gibbs sampler. Numerical experiments on simulated and real data illustrate the benefits of the proposed model: a flexible and meaningful parameterization combined with visualization features.
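To make the generative structure described above concrete, here is a minimal sketch (standard library only) of sampling from a two-component mixture of Gaussian copulas with one continuous (Gaussian), one integer (Poisson) and one ordinal (ordered multinomial) margin. Within a component, a latent vector z is drawn from a centred Gaussian with a correlation matrix, each coordinate is mapped to a uniform via the standard normal CDF, and the uniform is pushed through the inverse CDF of that variable's margin. All function names and parameter values (`sample_mixture`, `components`, the correlations, rates and level probabilities) are hypothetical illustrations, not the authors' code, and the Metropolis-within-Gibbs inference step is not shown.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cholesky(a):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def poisson_quantile(u, lam):
    """Smallest integer x with Poisson(lam) CDF F(x) >= u."""
    u = min(u, 1.0 - 1e-12)  # avoid chasing a CDF of exactly 1.0
    x, p = 0, math.exp(-lam)
    cdf = p
    while cdf < u and p > 0.0:
        x += 1
        p *= lam / x
        cdf += p
    return x


def ordinal_quantile(u, probs):
    """Smallest level k (1-based) whose cumulative probability is >= u."""
    u = min(u, 1.0 - 1e-12)
    cdf = 0.0
    for k, p in enumerate(probs, start=1):
        cdf += p
        if cdf >= u:
            return k
    return len(probs)


def sample_mixture(n, weights, components, seed=0):
    """Draw n mixed-type rows (continuous, count, ordinal) and their labels."""
    rng = random.Random(seed)
    chols = [cholesky(c["corr"]) for c in components]
    data, labels = [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        comp, L = components[k], chols[k]
        # Latent Gaussian vector with unit variances and the component's correlations.
        eps = [rng.gauss(0.0, 1.0) for _ in range(3)]
        z = [sum(L[i][j] * eps[j] for j in range(i + 1)) for i in range(3)]
        # Gaussian margin: F^{-1}(Phi(z)) reduces to mu + sigma * z.
        x_cont = comp["mu"] + comp["sigma"] * z[0]
        x_count = poisson_quantile(norm_cdf(z[1]), comp["lam"])
        x_ord = ordinal_quantile(norm_cdf(z[2]), comp["probs"])
        data.append((x_cont, x_count, x_ord))
        labels.append(k)
    return data, labels


# Hypothetical parameters for two components.
components = [
    {"corr": [[1.0, 0.6, 0.3], [0.6, 1.0, 0.2], [0.3, 0.2, 1.0]],
     "mu": 0.0, "sigma": 1.0, "lam": 2.0, "probs": [0.6, 0.3, 0.1]},
    {"corr": [[1.0, -0.5, 0.0], [-0.5, 1.0, 0.4], [0.0, 0.4, 1.0]],
     "mu": 4.0, "sigma": 0.5, "lam": 8.0, "probs": [0.1, 0.3, 0.6]},
]
```

For example, `sample_mixture(500, [0.5, 0.5], components, seed=1)` returns 500 rows of the form `(float, int, int)` together with their component labels. The sketch routes every margin through the same latent Gaussian, which is what lets a single correlation matrix per component describe dependence between variables of different types.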