Subspace clustering based on fuzzy models and mean shifts.

Abstract

Cluster analysis partitions a set of objects into groups, or clusters, so that objects in the same cluster are very similar and objects in different clusters are quite distinct. It has found applications in many areas, such as text mining, pattern recognition, gene expression analysis, customer segmentation, and image processing. However, cluster analysis is a complex task that faces many challenges, such as the curse of dimensionality and an unknown number of clusters.

This dissertation introduces a few novel approaches to overcome some limitations of existing algorithms for clustering high-dimensional data sets. It makes four specific contributions:

(a) The FSC algorithm. The fuzzy subspace clustering (FSC) algorithm is a novel method for clustering high-dimensional data sets. In this algorithm, we fuzzify dimensions rather than class memberships.

(b) Convergence of the FSC algorithm. The convergence of the FSC algorithm is established via Zangwill's convergence theorem: the iteration sequence produced by FSC either terminates at a point in the solution set S or has a subsequence that converges to a point in S.

(c) The MSSC algorithm. While FSC is designed primarily to deal with the curse of dimensionality, the MSSC (Mean Shift for Subspace Clustering) algorithm addresses the problem of determining the number of clusters. MSSC uses the idea behind FSC to recover subspace clusters and, at the same time, tries to find the correct number of subspace clusters.

(d) Bifurcations of the MSSC algorithm. The MSSC algorithm involves a parameter β. As β → 0, MSSC produces a single cluster containing all the data points, while as β → ∞ it produces k distinct clusters, where k is the number of initial centers. In other words, the single cluster splits into smaller clusters as β increases. The critical value of β at which the first phase transition occurs is approximated.

The performance of the algorithms is demonstrated through extensive experimental evaluations on a variety of synthetic data sets.
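The abstract describes FSC only as fuzzifying dimensions instead of class memberships. The following is a minimal sketch under an assumed formulation, not the dissertation's exact algorithm. Each point belongs to exactly one cluster. Each cluster j carries a weight vector w_j over the dimensions that sums to 1. The procedure alternates three steps: hard assignment under a weighted distance, mean updates of the centers, and a closed-form weight update that minimizes Σ_h w_jh^α D_jh, where D_jh is the within-cluster dispersion in dimension h plus a small ε. The exponent α > 1, the ε offset, the farthest-first seeding, and the names `fsc_sketch`, `weighted_dist`, and `farthest_first` are all illustrative assumptions.

```python
def sq_dist(x, z):
    """Plain squared Euclidean distance, used only for seeding."""
    return sum((xh - zh) ** 2 for xh, zh in zip(x, z))


def weighted_dist(x, z, w, alpha):
    """Squared distance in which dimension h counts with the fuzzy weight w[h] ** alpha."""
    return sum(wh ** alpha * (xh - zh) ** 2 for xh, zh, wh in zip(x, z, w))


def farthest_first(data, k):
    """Deterministic seeding (an assumption): start at data[0], then add the farthest point."""
    centers = [list(data[0])]
    while len(centers) < k:
        far = max(data, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    return centers


def fsc_sketch(data, k, alpha=2.1, eps=1e-6, max_iter=100):
    """Hard memberships, fuzzy dimension weights; returns (labels, centers, weights)."""
    n, d = len(data), len(data[0])
    centers = farthest_first(data, k)
    weights = [[1.0 / d] * d for _ in range(k)]  # every dimension starts equally important
    labels = [-1] * n
    for _ in range(max_iter):
        # Step 1: assign each point to the center with the smallest weighted distance.
        new_labels = [
            min(range(k), key=lambda j: weighted_dist(x, centers[j], weights[j], alpha))
            for x in data
        ]
        if new_labels == labels:
            break  # memberships are stable, so centers and weights no longer change
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if not members:
                continue  # keep the previous center and weights of an empty cluster
            # Step 2: the center is the mean of the cluster's members.
            centers[j] = [sum(x[h] for x in members) / len(members) for h in range(d)]
            # Step 3: per-dimension dispersion, offset by eps so no ratio divides by zero.
            disp = [sum((x[h] - centers[j][h]) ** 2 for x in members) + eps for h in range(d)]
            # Closed-form minimizer of sum_h w[h]**alpha * disp[h] subject to sum_h w[h] == 1.
            weights[j] = [
                1.0 / sum((disp[h] / disp[m]) ** (1.0 / (alpha - 1.0)) for m in range(d))
                for h in range(d)
            ]
    return labels, centers, weights


# Two clusters living in different subspaces:
# A is tight in dimensions 0 and 1, B is tight in dimensions 1 and 2.
A = [[0.1 * i, -0.1 * i, 2.0 * i] for i in range(-2, 3)]
B = [[20 + 2.0 * i, 20 + 0.1 * i, 0.1 * i] for i in range(-2, 3)]
labels, centers, weights = fsc_sketch(A + B, k=2)
print(labels)
print([[round(w, 3) for w in row] for row in weights])
```

Each weight row is the exact minimizer of a convex problem, so a dimension with a large within-cluster spread receives a small weight. In this toy run, each cluster's weight on its spread-out dimension should drop to roughly 0.002, which is how the cluster's subspace shows up in w_j.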
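Contribution (b) relies on Zangwill's global convergence theorem. For reference, its standard textbook form is given below. The map A, the descent function Z, and the compact set K are generic notation, not taken from the dissertation; the abstract does not say how these objects are instantiated for FSC.

```latex
\textbf{Theorem (Zangwill).} Let $A$ be a point-to-set map on a space $X$, let $S \subset X$
be a solution set, and let $\{x_t\}$ be generated by $x_{t+1} \in A(x_t)$. Suppose
\begin{enumerate}
  \item all points $x_t$ lie in a compact set $K \subset X$;
  \item there is a continuous function $Z$ on $X$ such that
        $Z(y) < Z(x)$ for all $y \in A(x)$ whenever $x \notin S$, and
        $Z(y) \le Z(x)$ for all $y \in A(x)$ whenever $x \in S$;
  \item $A$ is closed at every point outside $S$.
\end{enumerate}
Then either the algorithm stops at a point of $S$, or the limit of every convergent
subsequence of $\{x_t\}$ lies in $S$.
```

Because K is compact, an infinite sequence {x_t} always has at least one convergent subsequence. This is what yields the dichotomy stated in (b): termination in S, or a subsequence converging to a point of S.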
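The abstract does not give MSSC's update equations. The sketch below illustrates how a single parameter β can move a mean-shift-style procedure from one cluster to k clusters, under stated assumptions. The membership of point x in center j is assumed to be proportional to exp(−β d(x, z_j)), a deterministic-annealing-style rule, and each center moves to the membership-weighted mean of the data. Centers that end up within a tolerance of each other are counted as one cluster. Plain squared Euclidean distance is used; an FSC-style variant would weight each coordinate term. The functions `soft_center_update` and `count_distinct` and the tolerances are hypothetical.

```python
import math


def sq_dist(x, z):
    """Squared Euclidean distance (an FSC-style variant would weight each term by w[h] ** alpha)."""
    return sum((xh - zh) ** 2 for xh, zh in zip(x, z))


def soft_center_update(data, centers, beta, max_iter=1000, tol=1e-12):
    """Iterate beta-controlled soft center updates until no center moves by more than sqrt(tol).

    Assumed membership: u_j(x) = exp(-beta * d(x, z_j)) / sum_m exp(-beta * d(x, z_m)).
    """
    centers = [list(c) for c in centers]
    dim = len(centers[0])
    for _ in range(max_iter):
        num = [[0.0] * dim for _ in centers]
        den = [0.0] * len(centers)
        for x in data:
            energies = [beta * sq_dist(x, c) for c in centers]
            lowest = min(energies)  # shift by the minimum so exp() stays in [0, 1]
            e = [math.exp(lowest - v) for v in energies]
            total = sum(e)
            for j, ej in enumerate(e):
                u = ej / total
                den[j] += u
                for h in range(dim):
                    num[j][h] += u * x[h]
        # Each center moves to the membership-weighted mean (unchanged if it owns no mass).
        new_centers = [
            [num[j][h] / den[j] for h in range(dim)] if den[j] > 0 else centers[j]
            for j in range(len(centers))
        ]
        shift = max(sq_dist(a, b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def count_distinct(centers, merge_tol=1e-3):
    """Count clusters, treating centers closer than merge_tol to a kept one as the same cluster."""
    reps = []
    for c in centers:
        if all(math.sqrt(sq_dist(c, r)) > merge_tol for r in reps):
            reps.append(c)
    return len(reps)


A = [[0.0, -1.0], [0.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]
B = [[x + 10.0, y] for x, y in A]
data = A + B
init = [data[0], data[5]]  # k = 2 initial centers taken from the data
for beta in (0.0, 0.001, 10.0):
    print(beta, count_distinct(soft_center_update(data, init, beta)))
```

With these two well-separated blobs, β = 0 makes every membership equal, so both centers collapse onto the data mean (one cluster). At β = 10 the memberships are effectively hard, and each center settles on the mean of its own blob (k = 2 clusters), which mirrors the two limits described in (d).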
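For comparison with the critical value in (d), consider deterministic annealing with squared Euclidean distance and memberships proportional to exp(−β d). In that setting, the first split of the single cluster is known to occur at β_c = 1/(2 λ_max), where λ_max is the largest eigenvalue of the data covariance (Rose and co-authors). The sketch below computes that quantity. It can optionally rescale dimension h by a fixed factor, for example w_h^α, to mimic fixed fuzzy dimension weights. Whether the dissertation's approximation takes this form is an assumption: the abstract only states that the critical value is approximated. The helper names and the `dim_scale` parameter are hypothetical.

```python
import math


def covariance(data, dim_scale=None):
    """Population covariance, after multiplying squared differences in dim h by dim_scale[h]."""
    n, d = len(data), len(data[0])
    s = [math.sqrt(v) for v in dim_scale] if dim_scale else [1.0] * d
    pts = [[s[h] * x[h] for h in range(d)] for x in data]
    mean = [sum(p[h] for p in pts) / n for h in range(d)]
    return [
        [sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in pts) / n for b in range(d)]
        for a in range(d)
    ]


def largest_eigenvalue(mat, iters=500):
    """Power iteration for a symmetric PSD matrix (assumes the all-ones start vector
    is not orthogonal to the top eigenvector)."""
    d = len(mat)
    v = [1.0] * d
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # zero matrix: all points coincide
        v = [x / norm for x in w]
        lam = norm  # ||M v|| for unit v converges to the top eigenvalue
    return lam


def critical_beta(data, dim_scale=None):
    """beta_c = 1 / (2 * lambda_max); infinite if the data have no spread at all."""
    lam = largest_eigenvalue(covariance(data, dim_scale))
    return math.inf if lam == 0.0 else 1.0 / (2.0 * lam)


A = [[0.0, -1.0], [0.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]
B = [[x + 10.0, y] for x, y in A]
print(critical_beta(A + B))
print(critical_beta(A + B, dim_scale=[0.5, 1.0]))
```

For the two-blob example, the covariance is diagonal, with variance 25.4 along the axis separating the blobs, so β_c = 1/50.8 ≈ 0.0197. Halving the weight of that axis halves λ_max and doubles β_c. Intuitively, down-weighting a dimension delays the split along it.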

Bibliographic details

  • Author: Gan, Guojun
  • Author affiliation: York University (Canada)
  • Degree-granting institution: York University (Canada)
  • Subject: Mathematics
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 215
  • Original format: PDF
  • Language: English
