Conference: International Conference on Machine Learning, Big Data, Cloud and Parallel Computing

Data Summarization Using Clustering and Classification: Spectral Clustering Combined with k-Means Using NFPH

Abstract

Clustering is a valuable tool in knowledge discovery. Data miners focus on creating high-quality clusters with reduced time complexity in order to extract the most significant information. This paper analyses existing clustering techniques used in data mining and looks for ways to maximize clustering accuracy, with the goal of improving an existing clustering algorithm. It introduces a novel algorithm that combines spectral clustering with k-means using NFPH. The proposed system replaces the centroid initialization method of the classical k-means algorithm, which should address some of the limitations of k-means: rather than selecting the first centroid at random, we aim to select the most appropriate one. Publicly available research data sets from the medical domain are used to train the model, and the open-source data mining application WEKA is used for testing. In tests carried out on 10 different UCI data sets, the proposed solution reduced the clustering error of the spectral clustering algorithm by up to 2 percent, while the processing time increased (reported as 4~5 seconds). The increase in processing time is caused by the replacement of the k-means initialization method. Overall, the system improved accuracy, but the processing time increased to 4 seconds.
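
The abstract does not define NFPH or give the selection rule for the first centroid, so the following is only a minimal sketch of the general idea of swapping the random initialization of k-means for a deterministic one. The rule used here (first centroid is the point closest to the overall mean, the remaining centroids are chosen by farthest-point selection) is an assumed stand-in and not the paper's method, and all function names are hypothetical. The spectral embedding step is omitted; the points passed in could be raw features or rows of a spectral embedding.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def random_init(points, k, rng):
    """Classical initialization: k distinct points drawn at random."""
    return rng.sample(points, k)


def deterministic_init(points, k, rng=None):
    """Assumed stand-in for a non-random initialization (NOT NFPH).

    First centroid: the data point closest to the overall mean.
    Remaining centroids: repeatedly take the point farthest from
    the centroids chosen so far.
    """
    center = mean(points)
    centroids = [min(points, key=lambda p: sq_dist(p, center))]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, init, rng=None, max_iter=100):
    """Lloyd's k-means with a pluggable initialization function."""
    centroids = init(points, k, rng)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points]
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, 2, deterministic_init)[1])
    print(kmeans(data, 2, random_init, random.Random(0))[1])
```

Passing the initializer as a function keeps the rest of k-means unchanged, which mirrors the abstract's description of replacing only the initialization step. A deterministic initializer does extra distance computations before the first iteration, which is in line with the abstract's remark that the new initialization accounts for the added processing time.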