Source: The Kasetsart Journal

Comparison of Clustering Techniques for Cluster Analysis

Abstract

Cluster analysis is important for analyzing the number of clusters in natural data across several domains. Various clustering methods have been proposed, but it is very difficult to choose the method best suited to a given type of data. The objective of this research was therefore to compare the effectiveness of five clustering techniques on multivariate data: the hierarchical clustering method; the K-means clustering algorithm; Kohonen's Self-Organizing Maps (SOM) method; the K-medoids method; and the K-medoids method integrated with the Dynamic Time Warping (DTW) distance measure. The five techniques were evaluated using the root mean square standard deviation (RMSSTD) and r² (RS). For RMSSTD, a lower value indicates a better technique; for RS, a higher value indicates a better technique. The techniques were evaluated using both real and simulated data that were multivariate normally distributed. Each simulated dataset was generated by a Monte Carlo technique with a sample size of 100, repeated 1,000 times, for 3, 5, and 7 variables. Solutions with 2, 3, 4, 5, 6, 7, and 8 clusters were studied. The real and simulated datasets gave the same result: the K-means clustering method had RMSSTD and RS values closest to those of the SOM method, and these two methods yielded the lowest RMSSTD and highest RS in all simulations. Hence, both K-means and SOM were considered the most suitable techniques for cluster analysis.
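
The abstract does not spell out how the two indices are computed. The sketch below uses definitions that are common in the cluster-validity literature and are assumed here, not taken from the paper: RMSSTD is the square root of the pooled within-cluster sum of squares divided by its degrees of freedom (number of variables times cluster size minus one, summed over clusters), and RS is the share of the total sum of squares explained by the partition, (SS_total - SS_within) / SS_total. The helper names (`simulate`, `kmeans`, `rmsstd`, `r_squared`), the independent unit-variance variables, and the two cluster centres are all illustrative assumptions; the study's own simulation design and K-means implementation may differ.

```python
import math
import random


def column_means(rows):
    """Mean of each variable (column) over a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sum_of_squares(rows):
    """Squared deviations from the column means, summed over all variables."""
    means = column_means(rows)
    return sum((x - m) ** 2 for row in rows for x, m in zip(row, means))


def group_by_label(data, labels):
    """Split the observations into one list of rows per cluster label."""
    clusters = {}
    for row, label in zip(data, labels):
        clusters.setdefault(label, []).append(row)
    return list(clusters.values())


def rmsstd(data, labels):
    """Pooled within-cluster standard deviation (lower is better).

    Assumes at least one cluster has more than one member.
    """
    clusters = group_by_label(data, labels)
    n_vars = len(data[0])
    ss_within = sum(sum_of_squares(c) for c in clusters)
    dof = sum(n_vars * (len(c) - 1) for c in clusters)
    return math.sqrt(ss_within / dof)


def r_squared(data, labels):
    """Share of the total sum of squares explained by the clusters (higher is better)."""
    ss_total = sum_of_squares(data)
    ss_within = sum(sum_of_squares(c) for c in group_by_label(data, labels))
    return (ss_total - ss_within) / ss_total


def kmeans(data, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm with random initial centres (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assign each observation to its nearest centre (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(row, centers[j]))
                  for row in data]
        # Move each centre to the mean of its members; keep it if it has none.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = column_means(members)
    return labels


def simulate(n_per_cluster=50, n_vars=3, centers=(0.0, 5.0), seed=1):
    """Hypothetical data: independent unit-variance normals around each centre."""
    rng = random.Random(seed)
    return [[rng.gauss(c, 1.0) for _ in range(n_vars)]
            for c in centers for _ in range(n_per_cluster)]


if __name__ == "__main__":
    data = simulate()            # 100 observations, 3 variables
    labels = kmeans(data, k=2)
    print("RMSSTD:", rmsstd(data, labels))
    print("RS:", r_squared(data, labels))
```

A comparison in the spirit of the abstract would repeat this for each technique, number of variables, and number of clusters over many simulated datasets, then compare the averaged RMSSTD and RS values.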
