Source: Journal of Supercomputing

AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities

Abstract

Clustering is a standard data mining technique that partitions a dataset into subsets of similar objects according to a similarity metric. Density-based algorithms in particular can find clusters of different shapes and sizes while remaining robust to noise objects. DBSCAN, a representative density-based algorithm, defines its density criterion with two global parameters, the epsilon-distance and MinPts. However, most density-based algorithms, including DBSCAN, fail to find clusters correctly when densities vary, because the density criterion is fixed by the global parameters and is applied uniformly to clusters of different densities. Studies have sought to determine optimal parameters or to improve clustering performance with additional parameters and computations, but these approaches significantly increase running time, particularly when the dataset is large. In this study, we focus on minimizing the additional computation required to determine the parameters by using an approximate adaptive epsilon-distance for each density, while still finding the clusters with varying densities that DBSCAN cannot find. Specifically, we propose a new tree structure based on a quadtree to define a density layer of the dataset. In addition, we propose approximate adaptive DBSCAN (AA-DBSCAN) and kAA-DBSCAN, whose clustering performance is similar to that of existing algorithms for finding clusters with varying densities while requiring significantly less running time. We evaluate AA-DBSCAN and kAA-DBSCAN through extensive experiments comparing them with state-of-the-art algorithms. The experimental results demonstrate improved clustering performance and reduced running time for the proposed algorithms.
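
The limitation of a single global (epsilon-distance, MinPts) pair can be illustrated with a minimal sketch. The code below is a plain textbook-style DBSCAN, not AA-DBSCAN or kAA-DBSCAN, and it does not use the quadtree-based structure proposed in the paper. The toy dataset, the function names (`region_query`, `dbscan`, `grid`), and the parameter values are all assumptions chosen for illustration.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN with one global (eps, min_pts) pair; one label per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def grid(ox, oy, step):
    """Hypothetical helper: a 3x3 grid of points starting at (ox, oy)."""
    return [(ox + i * step, oy + j * step) for i in range(3) for j in range(3)]


# Toy data (assumed): two dense grids close together and one sparse grid far away.
points = grid(0.0, 0.0, 0.1) + grid(1.0, 0.0, 0.1) + grid(10.0, 10.0, 1.0)

small = dbscan(points, eps=0.15, min_pts=3)  # eps tuned to the dense grids
large = dbscan(points, eps=1.5, min_pts=3)   # eps tuned to the sparse grid

print("eps=0.15:", small)
print("eps=1.5: ", large)
```

On this toy data, the small epsilon separates the two dense grids but marks every point of the sparse grid as noise, while the large epsilon recovers the sparse grid but merges the two dense grids into one cluster. No single global value yields all three groups, which is the situation an adaptive, per-density epsilon-distance is meant to address.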