Mutual Information-Based Clustering: Hard or Soft?

Abstract

We investigate mutual information as a cost function for clustering and identify the cases in which hard, i.e., deterministic, clusters are optimal. Using convexity properties of mutual information, we show that certain formulations of the information bottleneck problem are solved by hard clusters. Similarly, hard clusters are optimal for the information-theoretic co-clustering problem, which deals with the simultaneous clustering of two dependent data sets. Hard clusters are not optimal in general for clustering a single data set based on pairwise (dis-)similarities. We point out interesting and practically relevant special cases of this so-called pairwise clustering problem for which we can either prove, or have evidence, that hard clusters are optimal. Our results thus show that one can relax the otherwise combinatorial hard clustering problem to a real-valued optimization problem with the same global optimum.
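The convexity argument can be illustrated with a minimal sketch. It assumes one particular formulation that the abstract does not spell out: for a fixed joint distribution p(x, y) and a fixed number of clusters, the cluster variable T is obtained from X through a soft assignment p(t|x), and the objective is I(T;Y). Because p(t, y) is linear in p(t|x) and the Kullback-Leibler divergence is jointly convex, I(T;Y) is convex in p(t|x), so a mixture of two hard assignments can never beat the better of the two. The toy distribution `p_xy` and the assignments `hard_a` and `hard_b` are hypothetical and chosen only for illustration.

```python
import math


def mutual_information(joint):
    """Return I(A;B) in bits for a joint distribution given as a list of rows."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi


def cluster_joint(p_xy, assign):
    """Return p(t, y) = sum_x p(t|x) p(x, y), where assign[x][t] = p(t|x)."""
    n_x, n_y, n_t = len(p_xy), len(p_xy[0]), len(assign[0])
    return [[sum(assign[x][t] * p_xy[x][y] for x in range(n_x))
             for y in range(n_y)]
            for t in range(n_t)]


# Hypothetical toy joint distribution p(x, y): 4 data points, 2 feature values.
p_xy = [[0.20, 0.05],
        [0.15, 0.10],
        [0.05, 0.20],
        [0.10, 0.15]]

# Two hard (deterministic) assignments of the 4 points to 2 clusters.
hard_a = [[1, 0], [1, 0], [0, 1], [0, 1]]
hard_b = [[1, 0], [0, 1], [1, 0], [0, 1]]

# A soft assignment: the equal-weight mixture of the two hard assignments.
soft = [[0.5 * a + 0.5 * b for a, b in zip(ra, rb)]
        for ra, rb in zip(hard_a, hard_b)]

i_a = mutual_information(cluster_joint(p_xy, hard_a))
i_b = mutual_information(cluster_joint(p_xy, hard_b))
i_soft = mutual_information(cluster_joint(p_xy, soft))

# Convexity: the mixture is bounded by the average of the endpoints.
print(i_soft <= 0.5 * i_a + 0.5 * i_b + 1e-12)
```

Every soft assignment is a convex combination of hard ones, so under this assumed formulation the maximum of I(T;Y) is attained at a hard assignment. The sketch checks a single mixture and is not a proof.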