Conference: International Conference on Computer and Network Technology

An Improved Clustering Algorithm Based On K-Means Algorithm



Abstract

This paper proposes a new document clustering algorithm that improves on the existing K-means and Neural Gas algorithms. Unlike K-means, in which each point belongs to exactly one cluster and affects only that cluster's centroid, in the new algorithm each point affects the values of multiple cluster centroids, as in Neural Gas. Unlike Neural Gas, the degree to which a point affects a cluster centroid depends on the distances between that point and the other, nearer cluster centroids. Experiments on five metrics (entropy, purity, F1, Rand index, and normalized mutual information) show that the new algorithm produces better clustering results than the other clustering algorithms compared across a number of different text data sets; that on a single text data set, WAP, under many different initial conditions, it is more stable and better than the other algorithms; and that on data sets of different sizes it is faster than the other algorithms and scales linearly.
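The abstract does not give the update formula, so the following is only a minimal sketch of the three weighting ideas it contrasts, not the authors' method. `kmeans_weights` (hard assignment) and `neural_gas_weights` (rank-based decay, used here in a batch-style update) follow the standard definitions. `distance_based_weights`, the decay parameter `lam`, and the choice of an exponential over the summed distances to nearer centroids are assumptions made purely for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_weights(point, centroids):
    """K-means: the point affects only its nearest centroid."""
    dists = [sq_dist(point, c) for c in centroids]
    nearest = dists.index(min(dists))
    return [1.0 if j == nearest else 0.0 for j in range(len(centroids))]


def neural_gas_weights(point, centroids, lam=1.0):
    """Neural Gas: weight decays with the rank of each centroid,
    i.e. with the number of centroids that are nearer to the point."""
    dists = [sq_dist(point, c) for c in centroids]
    order = sorted(range(len(centroids)), key=lambda j: dists[j])
    weights = [0.0] * len(centroids)
    for rank, j in enumerate(order):
        weights[j] = math.exp(-rank / lam)
    return weights


def distance_based_weights(point, centroids, lam=1.0):
    """HYPOTHETICAL variant (assumption, not from the paper): weight decays
    with the summed distance from the point to all nearer centroids."""
    dists = [math.sqrt(sq_dist(point, c)) for c in centroids]
    order = sorted(range(len(centroids)), key=lambda j: dists[j])
    weights = [0.0] * len(centroids)
    nearer_total = 0.0
    for j in order:
        weights[j] = math.exp(-nearer_total / lam)
        nearer_total += dists[j]
    return weights


def update_centroids(points, centroids, weight_fn):
    """One batch step: each centroid becomes the weighted mean of all points.
    A centroid that receives zero total weight is left unchanged."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    totals = [0.0] * k
    for p in points:
        for j, w in enumerate(weight_fn(p, centroids)):
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
    return [
        [s / totals[j] for s in sums[j]] if totals[j] > 0 else list(centroids[j])
        for j in range(k)
    ]
```

Passing `kmeans_weights` to `update_centroids` reproduces an ordinary K-means step, while the other two weight functions let every point pull on every centroid to a different degree. Iterating `update_centroids` until the centroids stop moving would give a complete, if naive, clustering loop.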
