International Conference on Pattern Recognition Applications and Methods

Criteria for Mixture-Model Clustering with Side-Information

Abstract

Estimating mixture models is a well-known approach to cluster analysis, and several criteria have been proposed for selecting the number of clusters. In this paper, we consider mixture models with side-information, which imposes the constraint that some data points, forming a group, originate from the same source. The usual criteria are then no longer suitable. An EM (Expectation-Maximization) algorithm was previously developed to jointly determine the model parameters and the data labelling for a given number of clusters. In this work we adapt three usual criteria, the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the entropy criterion (NEC), so that they take the side-information into account. One simulated problem and two real data sets are used to show the relevance of the modified criteria and to compare them. The efficiency of both the EM algorithm and the criteria, in selecting the right number of clusters while obtaining a good clustering, depends on the amount of side-information. Side-information is mainly useful when the clusters overlap, and the best criterion is the modified BIC.
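The abstract does not give the modified criteria, so the following is only a minimal sketch of the general idea, not the paper's definitions. It assumes a one-dimensional Gaussian mixture and treats each group as sharing a single latent component, so a group contributes one mixture term to the likelihood. The function names (grouped_loglik, n_free_params, aic, bic) are hypothetical, and the sample size passed to bic (number of points or number of groups) is left as a parameter because the abstract does not say which one the modified BIC uses.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def grouped_loglik(groups, weights, means, variances):
    """Log-likelihood when all points of a group share one component.

    Assumption (not taken from the paper): each group g contributes
    log sum_k w_k * prod_{x in g} N(x | mean_k, var_k).
    """
    total = 0.0
    for group in groups:
        per_component = [
            math.log(w) + sum(gaussian_logpdf(x, m, v) for x in group)
            for w, m, v in zip(weights, means, variances)
        ]
        total += logsumexp(per_component)
    return total


def n_free_params(k):
    """Free parameters of a 1-D mixture: k means, k variances, k - 1 weights."""
    return 3 * k - 1


def aic(loglik, k):
    """Standard AIC form, -2 log L + 2 * (number of free parameters)."""
    return -2.0 * loglik + 2.0 * n_free_params(k)


def bic(loglik, k, n_obs):
    """Standard BIC form; the choice of n_obs is an assumption left to the caller."""
    return -2.0 * loglik + n_free_params(k) * math.log(n_obs)


if __name__ == "__main__":
    # Toy data: three groups of points known to share a source within each group.
    groups = [[0.0, 0.1], [5.0], [4.9, 5.2, 5.1]]
    n_points = sum(len(g) for g in groups)
    ll = grouped_loglik(groups, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
    print("AIC:", aic(ll, 2))
    print("BIC:", bic(ll, 2, n_points))
```

With this sign convention, lower AIC or BIC values indicate a preferred model, so the number of clusters would be chosen by fitting each candidate value of k and keeping the one with the smallest criterion. When every group holds a single point, grouped_loglik reduces to the ordinary mixture log-likelihood.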