Cluster Stability for Finite Samples

Abstract

Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability falls short as a theoretical and practical tool. The discrepancy between this conclusion and the practical success of stability has remained an open question, which we attempt to address. Our theoretical premise is that stability, as used by cluster validation algorithms, resembles in certain respects measures of generalization in a model-selection framework. In such frameworks, the chosen model governs the convergence rate of generalization bounds. Arguing that these rates matter more than the sample size, we predict that stability-based cluster validation algorithms should not degrade as the sample size grows, despite asymptotic universal stability. This prediction is supported by a theoretical analysis and by empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples.
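The abstract does not spell out a specific validation algorithm. As a rough illustration of what "stability as a cluster validation criterion" usually means, here is a minimal sketch of a generic subsampling-based instability score. It is not the authors' method. Assumptions: k-means (Lloyd's algorithm with k-means++ seeding) as the base clusterer, 2-D points with at least k distinct values, 80% subsamples, and the Rand index over all point pairs as the agreement measure. All function names (`kmeans`, `pair_agreement`, `instability`) are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, rng, restarts=5, iters=50):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia restart."""
    best, best_inertia = None, float("inf")
    for _ in range(restarts):
        # k-means++ seeding: each new center is drawn with probability
        # proportional to its squared distance from the nearest chosen center.
        centroids = [rng.choice(points)]
        while len(centroids) < k:
            weights = [min(sq_dist(p, c) for c in centroids) for p in points]
            centroids.append(rng.choices(points, weights=weights)[0])
        for _ in range(iters):
            # Assignment step, then update step (an empty group keeps its old center).
            groups = [[] for _ in range(k)]
            for p in points:
                groups[nearest(p, centroids)].append(p)
            new = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
                   for c, g in zip(centroids, groups)]
            if new == centroids:
                break
            centroids = new
        inertia = sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)
        if inertia < best_inertia:
            best, best_inertia = centroids, inertia
    return best


def pair_agreement(a, b):
    """Rand index: fraction of point pairs on which two labelings agree
    (same cluster vs. different cluster); invariant to relabeling."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def instability(points, k, n_pairs=10, frac=0.8, seed=0):
    """Average disagreement between clusterings fitted on independent subsamples."""
    rng = random.Random(seed)
    m = int(frac * len(points))
    scores = []
    for _ in range(n_pairs):
        c1 = kmeans(rng.sample(points, m), k, rng)
        c2 = kmeans(rng.sample(points, m), k, rng)
        # Extend both solutions to the full sample and compare the partitions.
        l1 = [nearest(p, c1) for p in points]
        l2 = [nearest(p, c2) for p in points]
        scores.append(1.0 - pair_agreement(l1, l2))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: three well-separated Gaussian blobs (illustrative only).
    data_rng = random.Random(1)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
    data = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(30)]
    for k in range(2, 6):
        print(k, round(instability(data, k), 3))
```

Such a score is typically used for model selection by preferring values of k with low instability. To probe the abstract's finite-sample claim, one could rerun the scan with larger samples and check whether the preferred k changes. This sketch makes no claim about what that experiment would show.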
