首页> 外文会议>2012 Fifth International Symposium on Parallel Architectures, Algorithms and Programming. >Parallel Clustering Based on Partitions of Local Minimal-Spanning-Trees
【24h】

Parallel Clustering Based on Partitions of Local Minimal-Spanning-Trees

机译:基于局部最小生成树划分的并行聚类

获取原文
获取原文并翻译 | 示例

摘要

Many traditional clustering algorithms have the scalability problem while dealing with large data sets. One common strategy to handle the problem is to parallelize the algorithms and execute them along with the input data on high-performance computers. However, many graph-based clustering algorithms are hard to be parallelized since they need to calculate the similarity of all-pairs of all data nodes. In this paper, we propose a new parallel clustering algorithm, called the Para-CPLM (Parallel Clustering based on Partitions of Local Minimal-spanning-trees), which is based on three strategies - graph-based clustering, granular computing, and partition-and-merge. The Para-CPLM partitions the data domain into several regions for parallel execution, and then establishes a local minimal spanning tree in each region. After being established, the Para-CPLM combines those local minimal spanning trees and applies a method, namely the GBC method, to determine the best number of clusters. After the first phase of clustering, it repeatedly finds better pairs (edges) of the inter-clusters to reform the merged tree structure, such that the tree becomes closer to a global minimal spanning tree. Consequently, it uses the GBC method again to find the best number of clusters. From our experimental results, the Para-CPLM achieves significantly shorter execution time and better scalability while compared with the sequential GBC method. In addition, the clustering results are almost identical to those produced by the sequential GBC method.
机译:许多传统的聚类算法在处理大型数据集时都存在可伸缩性问题。解决该问题的一种常见策略是并行化算法,并在高性能计算机上将其与输入数据一起执行。但是,许多基于图的聚类算法很难并行化,因为它们需要计算所有数据节点的所有对的相似度。在本文中,我们提出了一种新的并行聚类算法,称为Para-CPLM(基于局部最小生成树的分区的并行聚类),它基于三种策略-基于图的聚类,粒度计算和分区-并合并。 Para-CPLM将数据域划分为几个区域以供并行执行,然后在每个区域中建立本地最小生成树。建立后,Para-CPLM会结合这些局部最小生成树,并应用一种方法(即GBC方法)来确定最佳群集数。在聚类的第一阶段之后,它会反复查找集群之间的更好对(边缘)以改革合并的树结构,从而使树变得更接近全局最小生成树。因此,它再次使用GBC方法来找到最佳数目的群集。根据我们的实验结果,与顺序GBC方法相比,Para-CPLM的执行时间大大缩短,可伸缩性更好。此外,聚类结果几乎与顺序GBC方法产生的结果相同。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号