International Symposium on Knowledge and Systems Sciences (KSS2004), 10-12 November 2004, Ishikawa, Japan

A Hybrid Genetic K-means Algorithm for Clustering High Dimensional Data

Abstract

In this paper we propose a hybrid genetic K-means algorithm that combines the K-means algorithm and the genetic algorithm to cluster high-dimensional data. The genetic algorithm is an effective global optimization technique that has been shown to find optimal or near-optimal solutions. Because high-dimensional data have many local minima, the clustering result depends crucially on how quickly the clustering algorithm converges within a finite number of iterations. The K-means algorithm is a commonly used distance-based clustering algorithm that is good at local search. Hybridizing the two yields a global stochastic search guided by a heuristic local search, which improves the search capability of the genetic algorithm. The traditional crossover and mutation operators of the genetic algorithm generate invalid offspring during reproduction, which slows the convergence of the genetic clustering algorithm. To circumvent this problem, we propose novel crossover and mutation operators. The crossover operator rearranges the cluster centers of paired chromosomes according to the minimum distance between their cluster centers, and the mutation operator reassigns the cluster centers within a chromosome. Comparative experiments on several publicly available data sets demonstrate the effectiveness of the proposed algorithm.
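The abstract names the operators but does not define them precisely, so what follows is only a minimal sketch of one plausible reading, not the authors' implementation. Every detail here is an assumption: a chromosome is a list of k cluster centers; fitness is the within-cluster sum of squared errors (SSE); each generation applies one K-means (Lloyd) step to every chromosome as the local search; crossover first aligns the second parent's centers to the first parent's by greedy minimum distance and then swaps whole centers position by position; mutation reassigns a center, with small probability, to a randomly chosen data point; selection is truncation with elitism. The names `hybrid_gka`, `align`, `crossover`, `mutate`, `kmeans_step`, and `sse` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(data, centers):
    """Fitness (assumed): within-cluster sum of squared errors, lower is better."""
    return sum(min(sq_dist(x, c) for c in centers) for x in data)


def kmeans_step(data, centers):
    """One Lloyd iteration: assign each point to its nearest center, then re-average."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x in data:
        j = min(range(k), key=lambda i: sq_dist(x, centers[i]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    # An empty cluster keeps its previous center.
    return [tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
            for j in range(k)]


def align(a, b):
    """Hypothetical rearrangement: reorder b so b[i] is the unused center nearest a[i]."""
    remaining = list(b)
    aligned = []
    for c in a:
        nearest = min(remaining, key=lambda r: sq_dist(c, r))
        remaining.remove(nearest)
        aligned.append(nearest)
    return aligned


def crossover(a, b, rng):
    """Align b to a, then swap each aligned pair of centers with probability 0.5."""
    b = align(a, b)
    child1, child2 = [], []
    for ca, cb in zip(a, b):
        if rng.random() < 0.5:
            ca, cb = cb, ca
        child1.append(ca)
        child2.append(cb)
    return child1, child2


def mutate(chrom, data, rng, rate=0.1):
    """Hypothetical mutation: with probability `rate`, move a center onto a random data point."""
    return [rng.choice(data) if rng.random() < rate else c for c in chrom]


def hybrid_gka(data, k, pop_size=10, generations=20, seed=0):
    """Sketch of a hybrid genetic K-means loop (parameters are illustrative)."""
    rng = random.Random(seed)
    pop = [rng.sample(data, k) for _ in range(pop_size)]  # centers start on data points
    for _ in range(generations):
        pop = [kmeans_step(data, c) for c in pop]  # local search on every chromosome
        pop.sort(key=lambda c: sse(data, c))
        parents = pop[: pop_size // 2]  # truncation selection (assumption)
        children = [pop[0]]  # elitism: the best chromosome survives unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            for child in crossover(p1, p2, rng):
                children.append(mutate(child, data, rng))
        pop = children[:pop_size]
    return min(pop, key=lambda c: sse(data, c))
```

On two well-separated point clouds, `hybrid_gka(data, k=2)` should return one center per cloud. The alignment step is what keeps a swapped gene meaningful: without it, position i in one parent may describe a different cluster than position i in the other, and a child can inherit two centers for one cluster and none for another, which is one concrete form of the "invalid offspring" the abstract refers to.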