Source: NIH literature (U.S. National Institutes of Health)

A Bayesian Alternative to Mutual Information for the Hierarchical Clustering of Dependent Random Variables

Abstract

The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: the measure must be corrected for the dimensionality of the variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives to mutual information (in exact and asymptotic forms) on simulated and real data. The simulations gave an encouraging first result: hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, the Bayesian approaches yielded results similar to those of mutual information clustering techniques, with the added advantage of automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, the Bayesian approach identified clusters consistent with the established outcome of standard procedures. In this application, normalized mutual information behaved atypically: it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
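The asymptotic relation stated in the abstract can be illustrated in code. For jointly Gaussian groups of variables X (dimension d_X) and Y (dimension d_Y) observed N times, the plug-in mutual information is Î = ½ log(det Σ̂_X · det Σ̂_Y / det Σ̂_XY), and a BIC-style approximation of the log Bayes factor for merging X and Y is N·Î − ½·d_X·d_Y·log N, where d_X·d_Y counts the cross-covariance entries that the dependent model adds over the independent (block-diagonal) one. The Python sketch below is a minimal illustration under these assumptions only: it uses the asymptotic form, not the exact log Bayes factor with a prior covariance matrix described in the paper, and it assumes the automated stopping rule is "stop when no candidate merge has a positive log Bayes factor." The function names (gaussian_mi, bic_log_bayes_factor, agglomerate) and the simulated data are hypothetical.

```python
import numpy as np


def gaussian_mi(data, idx_a, idx_b):
    """Plug-in mutual information (nats) between two groups of columns,
    assuming the columns are jointly Gaussian."""
    cols = list(idx_a) + list(idx_b)
    # Maximum-likelihood (1/N) covariance of the joint group.
    s = np.cov(data[:, cols], rowvar=False, bias=True)
    k = len(idx_a)
    _, logdet_a = np.linalg.slogdet(s[:k, :k])
    _, logdet_b = np.linalg.slogdet(s[k:, k:])
    _, logdet_ab = np.linalg.slogdet(s)
    # I = 0.5 * log(det S_a * det S_b / det S_ab)
    return 0.5 * (logdet_a + logdet_b - logdet_ab)


def bic_log_bayes_factor(data, idx_a, idx_b):
    """Asymptotic (BIC-style) log Bayes factor, 'dependent' vs 'independent'.

    The maximized log-likelihood ratio equals N * MI; the penalty counts the
    d_a * d_b cross-covariance entries added by the dependent model."""
    n = data.shape[0]
    extra_params = len(idx_a) * len(idx_b)
    return n * gaussian_mi(data, idx_a, idx_b) - 0.5 * extra_params * np.log(n)


def agglomerate(data):
    """Greedy AHC over the columns of `data` (shape N x p).

    At each step, merge the pair of clusters with the largest approximate
    log Bayes factor; stop when no pair has a positive one (assumed rule)."""
    clusters = [[j] for j in range(data.shape[1])]
    while len(clusters) > 1:
        best_score, best_pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = bic_log_bayes_factor(data, clusters[i], clusters[j])
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_pair is None:  # every candidate merge has log BF <= 0
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical demo: two latent signals, each observed through two noisy columns.
rng = np.random.default_rng(0)
n = 500
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)


def noisy(z):
    return z + 0.3 * rng.standard_normal(n)


X = np.column_stack([noisy(z1), noisy(z1), noisy(z2), noisy(z2)])
print(agglomerate(X))
```

With two latent signals each copied into two noisy columns, the procedure should group columns {0, 1} and {2, 3} and then stop: merging the two independent groups incurs a penalty of ½·4·log 500 ≈ 12.4 nats, which the near-zero sample mutual information between them is very unlikely to offset.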
