MAPREDUCE-BASED DISTRIBUTED CLUSTER PROCESSING METHOD FOR LARGE-SCALE DATA



Abstract

The present invention provides a MapReduce-based distributed cluster processing method for large-scale data, comprising: sampling the large-scale data according to an equal-scale, non-repetition principle; inputting the sampled data into a MapReduce distributed parallel framework and calculating the local density of each sampled point and the average density of the sample; taking all sampled points whose local density is greater than the average density as a candidate point set of initial cluster center points, and feeding the candidate point set back to a master node, where adjacent candidate points separated by a distance greater than twice a set range are selected as the initial cluster center points; using the MapReduce distributed parallel framework to perform a parallel clustering task, in which an average value of the distance between the data is calculated for each cluster in order to update the cluster center points; applying, at the child nodes, an error sum of squares criterion function to determine whether to continue iterating; and clustering the large-scale data at the child nodes according to the cluster center points. The invention thereby implements parallel clustering, reducing the number of clustering iterations while increasing clustering accuracy and the efficiency of parallel clustering.
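The steps in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the patented implementation: it only simulates the map and reduce phases with plain functions. All function names, the `radius` parameter (standing in for the "set range"), the neighbor-count definition of local density, the densest-first greedy selection of centers, and the use of the per-cluster mean as the center update are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def sample_without_replacement(data, fraction, seed=0):
    """Draw a fixed fraction of the records, never picking one twice."""
    k = max(1, int(len(data) * fraction))
    return random.Random(seed).sample(data, k)


def local_densities(points, radius):
    """Assumed density: number of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def pick_initial_centers(points, radius):
    """Keep above-average-density points that lie more than 2 * radius apart."""
    dens = local_densities(points, radius)
    avg = sum(dens) / len(dens)
    ranked = sorted(zip(dens, points), key=lambda t: -t[0])
    candidates = [p for d, p in ranked if d > avg]
    centers = []
    for p in candidates:  # densest first; this greedy order is an assumption
        if all(math.dist(p, c) > 2 * radius for c in centers):
            centers.append(p)
    return centers


def map_assign(point, centers):
    """Map step: emit (index of nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: new center = coordinate-wise mean of the cluster's points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def sse(data, centers):
    """Error sum of squares: squared distance of each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in data)


def cluster(data, centers, tol=1e-9, max_iter=100):
    """Repeat map/reduce rounds until the SSE changes by less than `tol`."""
    previous = math.inf
    for _ in range(max_iter):
        groups = defaultdict(list)
        for key, point in (map_assign(p, centers) for p in data):
            groups[key].append(point)
        # An empty cluster keeps its old center instead of dividing by zero.
        centers = [reduce_mean(groups[i]) if groups[i] else centers[i]
                   for i in range(len(centers))]
        error = sse(data, centers)
        if abs(previous - error) < tol:
            break
        previous = error
    return centers
```

A typical call sequence under these assumptions would be `sample = sample_without_replacement(data, 0.1)`, then `init = pick_initial_centers(sample, radius)`, then `cluster(data, init)`. In a real deployment the map and reduce functions would run on worker nodes, with the master node collecting candidates and broadcasting the updated centers each round.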


Bibliographic data

Publication number: WO2018219163A1

Patent type:
Publication date: 2018-12-06

Original format: PDF
Applicant/assignee: NORTHEASTERN UNIVERSITY

Application number: WO2018CN87567
Inventors: GAO TIANHAN; KONG XUE

Filing date: 2018-05-18
Classification: G06F17/30
Country: WO
Date added to database: 2022-08-21 11:57:53
