Data Summarization Using Clustering and Classification: Spectral Clustering Combined with k-Means Using NFPH

Abstract

Clustering has been very helpful in knowledge discovery. Data miners focus on creating high-quality clusters with reduced time complexity in order to extract the most significant information. This paper analyses existing clustering techniques used in data mining and looks for ways to maximize clustering accuracy, with the goal of improving an existing clustering algorithm. It introduces a novel algorithm that combines spectral clustering with k-means using NFPH. The proposed system replaces the initialization method for cluster centroids in the classical k-means algorithm, which should address some of the limitations of k-means: the most appropriate first centroid is selected rather than a random one. Test data sets from the medical domain that are available for research purposes are used to train the model, and the open source data mining application WEKA is used for testing. In tests carried out on 10 different UCI data sets, the proposed solution reduced the clustering error by up to 2 percent, while the processing time increased to roughly 4 to 5 seconds. The increase in processing time is caused by the replacement of the k-means initialization method. Overall, the proposed system reduces the clustering error of the spectral clustering algorithm and improves accuracy at the cost of longer processing time.
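The abstract's central idea, replacing random centroid initialization in k-means with a deterministic choice, can be illustrated with a minimal sketch. The abstract does not define NFPH, so the code below substitutes a generic farthest-point seeding rule and should not be read as the authors' method. The function names, the choice of the point nearest the data mean as the first centroid, and the toy data are all assumptions made for illustration. In a spectral clustering pipeline the input points would be rows of the spectral embedding; that step is omitted here.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not the paper's NFPH)."""
    n, dim = len(points), len(points[0])
    mean = tuple(sum(p[d] for p in points) / n for d in range(dim))
    # First centroid: the data point closest to the overall mean.
    centroids = [min(points, key=lambda p: math.dist(p, mean))]
    # Each further centroid: the point farthest from its nearest chosen centroid.
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Classical Lloyd iterations, started from the deterministic seeds."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(members[0]))
                )
    return labels, centroids

# Toy data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
```

Because the seeds are computed from the data rather than drawn at random, repeated runs give the same clustering. The seeding pass costs extra distance computations, which is consistent with the abstract's remark that the new initialization increases processing time.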

Bibliographic details

Source
International Conference on Machine Learning, Big Data, Cloud and Parallel Computing | 2019 | 587p | 6 pages
Authors
Niroj Sapkota; Abeer Alsadoon; P.W.C. Prasad; Amr Elchouemi; Ashutosh Kumar Singh
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords
Clustering algorithms; Data mining; Classification algorithms; Feature extraction; Knowledge discovery; Decision trees; Mathematical model

Similar literature

1. Use of Spectral Clustering Combined with Normalized Cuts (N-Cuts) in an Iterative k-Means Clustering Framework (NKSC) for Superpixel Segmentation with Contour Adherence [J]. Partha Ghosh, Kalyani Mali, Sitansu K. Das. Pattern recognition and image analysis: advances in mathematical theory and applications in the USSR. 2018, No. 3
2. Clustering and Classification of Cotton Lint Using Principle Component Analysis, Agglomerative Hierarchical Clustering, and K-Means Clustering [J]. Kamalha Edwin, Kiberu Jovan, Nibikora Ildephonse. Journal of natural fibers. 2018, No. 3-4
3. A combination of k-means clustering and entropy filtering for band selection and classification in hyperspectral images [J]. Santos A. C. S., Pedrini H. International journal of remote sensing. 2016, No. 13-14
4. Data Summarization Using Clustering and Classification: Spectral Clustering Combined with k-Means Using NFPH [C]. Niroj Sapkota, Abeer Alsadoon, P.W.C. Prasad. 2019
5. High-Dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products [D]. Zhou, Dunke. 2012
6. Using graph-based consensus clustering for combining K-means clustering of heterogeneous chemical structures [O]. Faisal Saeed, Naomie Salim, Ammar Abdo. 2013
7. Combining the Self-Organizing Map and K-Means Clustering for On-line Classification of Sensor Data [O]. Kristof Van Laerhoven. 2001
8. Semi-Supervised Data Summarization: Using Spectral Libraries to Improve Hyperspectral Clustering [R]. Wagstaff, K. L., Shu, H. P., Mazzoni, D. 2005
