A Genetic Algorithm Approach for Clustering Large Data Sets

Abstract

In this paper we present a sampling approach for running the k-means algorithm on large data sets. We propose a genetic algorithm that guides the sampling: the fitness of each individual in the population is evaluated by running the k-means clustering algorithm. Although the goal is a partition with the lowest sum of squared errors (SSE), the algorithm searches for the sample with the highest SSE. Once a good sample is found, the remaining points of the data set are assigned to the nearest centroid, and the SSE of the final solution is then calculated. We apply the proposal to a set of public-domain data sets and compare the results against two other methods: k-means run on a uniform random sample of the data set, and k-means run on the complete data set. The results show that our algorithm offers a good trade-off between quality and computational cost, especially for large data sets and higher numbers of clusters.
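The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the genetic operators, so the encoding (an individual is a set of row indices), truncation selection, union-based crossover, single-index mutation, and all names and default parameters (`ga_sample_kmeans`, `sample_size`, `pop_size`, `generations`) are assumptions made for illustration. Only the overall flow follows the abstract: fitness is the SSE of k-means on the sample (higher is treated as better), and the final SSE is computed after assigning every point to its nearest centroid.

```python
import numpy as np

def sq_dists(X, centroids):
    # Squared Euclidean distance from every point to every centroid.
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

def nearest(X, centroids):
    return sq_dists(X, centroids).argmin(axis=1)

def sse(X, centroids):
    # Sum of squared errors under nearest-centroid assignment.
    return sq_dists(X, centroids).min(axis=1).sum()

def kmeans(X, k, rng, n_iter=20):
    # Plain Lloyd's k-means with random initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, sse(X, centroids)

def ga_sample_kmeans(X, k, sample_size=50, pop_size=10, generations=5, seed=0):
    # Assumes len(X) > sample_size and pop_size >= 4.
    rng = np.random.default_rng(seed)
    n = len(X)
    # Each individual is a sample: an array of distinct row indices into X.
    pop = [rng.choice(n, size=sample_size, replace=False) for _ in range(pop_size)]

    def evaluate(ind):
        # Fitness is the SSE of k-means run on the sample only.
        centroids, sample_sse = kmeans(X[ind], k, rng)
        return sample_sse, centroids, ind

    for _ in range(generations):
        scored = sorted((evaluate(ind) for ind in pop),
                        key=lambda t: t[0], reverse=True)
        # Assumed operator: truncation selection keeps the highest-SSE half.
        parents = [t[2] for t in scored[: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            # Assumed operator: crossover resamples from the parents' union.
            pool = np.union1d(parents[a], parents[b])
            child = rng.choice(pool, size=sample_size, replace=False)
            # Assumed operator: mutation swaps one index for an unused one.
            outside = np.setdiff1d(np.arange(n), child)
            child[rng.integers(sample_size)] = rng.choice(outside)
            children.append(child)
        pop = parents + children

    _, centroids, _ = max((evaluate(ind) for ind in pop), key=lambda t: t[0])
    # Assign every point of the full data set to its nearest centroid,
    # then report the SSE of that final solution.
    labels = nearest(X, centroids)
    return centroids, labels, sse(X, centroids)

# Usage on synthetic data: three Gaussian blobs in two dimensions.
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([demo_rng.normal(c, 0.3, size=(200, 2)) for c in (0.0, 5.0, 10.0)])
centroids_demo, labels_demo, final_sse = ga_sample_kmeans(X_demo, k=3)
print(centroids_demo.shape, labels_demo.shape, final_sse)
```

In this sketch the expensive k-means step only ever runs on `sample_size` points; the full data set is touched once, in the final nearest-centroid assignment. That is the source of the cost saving the abstract refers to.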

Bibliographic record

Source
IEEE International Conference on Tools with Artificial Intelligence, 2016, pp. 570-576 (7 pages)
Authors
Diego Luchi; Alexandre Rodrigues; Flávio Miguel Varejão; Willian Santos
Original format
PDF
Keywords
Clustering algorithms; Sociology; Statistics; Genetic algorithms; Partitioning algorithms; Genetics; Biological cells
