Journal: International Journal of Innovative Computing Information and Control

MODIFIED SINGLE PASS CLUSTERING WITH VARIABLE THRESHOLD APPROACH

Abstract

Data mining is the process of extracting hidden, interesting, non-trivial, potentially useful, and previously unknown information from large databases. Clustering is a data mining technique that groups similar objects in the database and separates dissimilar ones. Many clustering methods are available in the literature. This paper focuses on partitioning-based methods. The most popular partitioning-based algorithms, k-means and k-medoid, require the number of clusters to generate as an input parameter. Another partitioning-based algorithm, Single Pass Clustering (SPC), requires a threshold similarity value as an input parameter. This paper proposes a modified SPC algorithm that also uses a threshold similarity value; however, the threshold is not an input parameter but a function of the data objects to be clustered. To assess the performance of the proposed approach, several clustering validity measures are applied to the k-means, SPC, and modified SPC algorithms. The simulated experiments described in this paper confirm the good performance of the modified SPC. It is also observed that the modified SPC generates the actual number of clusters when applied to real datasets.
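To make the role of the threshold concrete, the following is a minimal Python sketch of classic single pass clustering, not the authors' implementation. The abstract does not state the similarity measure or the function that derives the threshold from the data, so cosine similarity, running-mean centroids, and the helper `mean_pairwise_similarity_threshold` are all assumptions chosen purely for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_clustering(points, threshold):
    """Assign each point, in one pass, to the most similar existing cluster
    if the similarity reaches `threshold`; otherwise start a new cluster."""
    centroids = []  # running mean of each cluster
    sizes = []      # number of points in each cluster
    labels = []     # cluster index assigned to each point
    for p in points:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centroids):
            sim = cosine_similarity(p, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            # Join the best cluster and update its centroid incrementally.
            sizes[best] += 1
            n = sizes[best]
            centroids[best] = [
                (c * (n - 1) + x) / n for c, x in zip(centroids[best], p)
            ]
            labels.append(best)
        else:
            # No cluster is similar enough: open a new one.
            centroids.append(list(p))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


def mean_pairwise_similarity_threshold(points):
    """HYPOTHETICAL data-derived threshold: the mean pairwise similarity.
    This is a placeholder, not the function proposed in the paper."""
    sims = [cosine_similarity(a, b) for a, b in combinations(points, 2)]
    return sum(sims) / len(sims) if sims else 0.0


if __name__ == "__main__":
    data = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
    # Classic SPC: the threshold is supplied by the user.
    print(single_pass_clustering(data, threshold=0.8)[0])
    # Variable-threshold variant: the threshold is computed from the data.
    t = mean_pairwise_similarity_threshold(data)
    print(single_pass_clustering(data, threshold=t)[0])
```

In this sketch, the only difference between the two calls is where the threshold comes from. The result of a single pass also depends on the order in which the points are presented, which is a known property of this family of algorithms.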
