Research on K-medoids clustering algorithm based on data density and its parallel processing based on MapReduce

Aiguo Liu; Shuli Zou; Taorong Qiu; Xiaoming Bai




Abstract

The k-medoids algorithm selects its k initial cluster centers at random, so its clustering results vary from run to run. To address this, we combine k-medoids with a density-based clustering algorithm: the improved k-medoids algorithm uses the density-based algorithm to automatically generate k suitable cluster centers, which then serve as the initial seeds for k-medoids. In addition, because k-medoids does not scale well to large data sets, we design a parallel version of the improved algorithm based on the MapReduce computing model and implement it on the Hadoop platform. We test the parallel algorithm on several data sets. The experimental results show that the improved k-medoids algorithm clusters more effectively and that the parallel implementation scales well to large data sets.
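The abstract does not say which density-based method produces the seeds or which k-medoids variant is used, so the following is only a minimal sketch under stated assumptions. It assumes density is the number of points within a radius `eps`, that seeds are the densest points lying more than `eps` apart, and that k-medoids runs as a simple alternating assign/update loop. The names `density_seeds`, `k_medoids`, and `eps` are hypothetical and do not come from the paper.

```python
import math

def density_seeds(points, k, eps):
    """Pick up to k seed indices, densest first (assumed seeding rule)."""
    # Assumed density measure: number of points within radius eps.
    density = [sum(1 for q in points if math.dist(p, q) <= eps) for p in points]
    order = sorted(range(len(points)), key=lambda i: (-density[i], i))
    seeds = []
    for i in order:
        # Skip candidates that lie within eps of an already chosen seed.
        if all(math.dist(points[i], points[s]) > eps for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return seeds

def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [min(range(len(medoids)),
                key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, seeds, max_iter=100):
    """Alternating k-medoids started from the given seed indices."""
    medoids = list(seeds)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, label in enumerate(labels) if label == c]
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
seeds = density_seeds(points, k=2, eps=1.0)
medoids, labels = k_medoids(points, seeds)
```

Because the seeds are chosen deterministically from the data, repeated runs on the same input start from the same centers, which is the run-to-run variation the abstract sets out to remove.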

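The abstract gives no details of the MapReduce job design, so the following is a hedged, in-process simulation of how one k-medoids iteration might be split into map and reduce steps. It does not use the Hadoop API. The function names and the choice to key map outputs by nearest-medoid index are assumptions for illustration only.

```python
import math
from collections import defaultdict

def map_assign(point, medoids):
    """Map step: emit (index of nearest medoid, point)."""
    nearest = min(range(len(medoids)),
                  key=lambda c: math.dist(point, medoids[c]))
    return nearest, point

def reduce_update(members):
    """Reduce step: pick the member with the smallest total distance to the rest."""
    return min(members, key=lambda m: sum(math.dist(m, o) for o in members))

def mapreduce_iteration(splits, medoids):
    """Run one simulated map/shuffle/reduce round and return the new medoids."""
    groups = defaultdict(list)
    for split in splits:          # each split stands in for one mapper's input
        for point in split:
            key, value = map_assign(point, medoids)
            groups[key].append(value)   # shuffle: group values by key
    # One reduce call per cluster; a cluster with no members keeps its medoid.
    return [reduce_update(groups[c]) if groups[c] else medoids[c]
            for c in range(len(medoids))]

splits = [[(0, 0), (0, 2), (10, 10)],
          [(0, 1), (10, 12), (10, 11)]]
new_medoids = mapreduce_iteration(splits, medoids=[(0, 0), (10, 10)])
```

In this sketch the driver would repeat `mapreduce_iteration` until the medoids stop changing, with the initial medoids supplied by the density-based seeding step.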

Bibliographic details

Source: Journal of Residuals Science & Technology, 2016, Issue 7
Authors: Aiguo Liu; Shuli Zou; Taorong Qiu; Xiaoming Bai
Original format: PDF
CLC classification: Waste treatment and comprehensive utilization

