Cluster Stability for Finite Samples

Abstract

Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples.
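The abstract refers to stability "as used by cluster validation algorithms" without spelling out a procedure. Below is a minimal sketch of one common, generic form of such a criterion. It assumes k-means as the clustering algorithm and uses the fraction of point pairs on which two clusterings disagree as the distance between them. It is not the specific procedure or bounds analyzed by Shamir and Tishby. The function names (`kmeans`, `pair_disagreement`, `instability`) and the parameters `n_trials`, `frac` and `seed` are hypothetical illustrative choices. Because the score is computed on finite subsamples, it is the kind of quantity whose large-sample behavior the abstract discusses.

```python
import random
from itertools import combinations

# Hypothetical helpers: a generic stability sketch, not the paper's algorithm.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_init=3, iters=100):
    """Lloyd's k-means with k-means++ seeding; keeps the restart with lowest SSE.

    Returns one integer label in range(k) per input point.
    """
    best_labels, best_sse = None, float("inf")
    for _ in range(n_init):
        # k-means++ seeding: each new center is drawn with probability
        # proportional to its squared distance from the nearest chosen center.
        centers = [rng.choice(points)]
        while len(centers) < k:
            weights = [min(sq_dist(p, c) for c in centers) for p in points]
            if sum(weights) == 0:  # every point coincides with a chosen center
                centers.append(rng.choice(points))
            else:
                centers.append(rng.choices(points, weights=weights)[0])
        for _ in range(iters):
            # Assignment step: label each point with its nearest center.
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            # Update step: move each center to the mean of its members.
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
                else:
                    new_centers.append(centers[j])  # leave an empty cluster's center in place
            if new_centers == centers:
                break
            centers = new_centers
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def pair_disagreement(labels_a, labels_b):
    """Fraction of point pairs that one clustering puts together and the other splits.

    Invariant to how cluster ids are numbered, so no label matching is needed.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 0.0
    bad = sum((labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j]) for i, j in pairs)
    return bad / len(pairs)


def instability(data, k, n_trials=10, frac=0.8, seed=0):
    """Average pair disagreement between k-means runs on two random subsamples.

    Lower means more stable. Only points drawn into both subsamples are compared.
    """
    rng = random.Random(seed)
    n = len(data)
    m = int(frac * n)
    scores = []
    for _ in range(n_trials):
        idx_a = rng.sample(range(n), m)
        idx_b = rng.sample(range(n), m)
        lab_a = dict(zip(idx_a, kmeans([data[i] for i in idx_a], k, rng)))
        lab_b = dict(zip(idx_b, kmeans([data[i] for i in idx_b], k, rng)))
        shared = sorted(set(idx_a) & set(idx_b))
        scores.append(pair_disagreement([lab_a[i] for i in shared],
                                        [lab_b[i] for i in shared]))
    return sum(scores) / n_trials
```

On data made of three tight, well-separated groups, `instability(data, 3)` should come out at or near 0, while `instability(data, 2)` is typically positive because different subsamples disagree about which two groups to merge. A stability-based validation procedure of this generic kind would then prefer the k with the lowest score; the pair-counting distance is used here only because it avoids matching cluster labels between runs.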

Bibliographic Record

Source: Annual Conference on Neural Information Processing Systems | 2007 | pp. 89-96 | 8 pages
Authors: Ohad Shamir; Naftali Tishby
Original format: PDF
Chinese Library Classification: Information processing
Date added: 2022-08-26 14:56:36

Similar Literature

1. Bayesian Predictive Inference for the Mean and Variance of a Finite Population Proportion: Two Stage Cluster Sampling with Non-Sampled Cluster Sizes Unknown [J]. Michael J. Racz, J. Sedransk. Journal of the Indian Society of Agricultural Statistics. 2014, No. 3

2. Chul Ahn, Moonseong Heo and Song Zhang. Sample Size Calculations for Clustered and Longitudinal Outcomes in Clinical Research. Boca Raton, CRC Press [J]. Joel Michalek. Biometrics: Journal of the Biometric Society: An International Society Devoted to the Mathematical and Statistical Aspects of Biology. 2018, No. 1

3. Estimating a Finite Population Mean under Random Non-Response in Two Stage Cluster Sampling with Replacement [J]. Nelson Kiprono Bii, Christopher Ouma Onyango, John Odhiambo. Open Journal of Statistics. 2017, No. 5

4. Cluster Stability for Finite Samples [C]. Annual Conference on Neural Information Processing Systems. 2007

5. Stratified Inverse Cluster Sampling with Updating Process for Samples from a Rare Population [D]. Kim, Sewon. 2020

6. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap [O]. Hanzhi Zhou, Michael R. Elliott, Trivellore E. Raghunathan

7. Optimal cluster selection probabilities to estimate the finite population distribution function under PPS cluster sampling [O]. Mayor Gallego, José Antonio. 2002
