Journal: Fuzzy Sets and Systems

Speeding up the large-scale consensus fuzzy clustering for handling Big Data

Abstract

Massive data can give companies a real competitive advantage: it helps them respond better to customers, track consumer behavior, anticipate trends, and so on. However, it has its own drawbacks. The sheer volume of data not only requires large storage space but also makes analysis, processing and retrieval operations very difficult and hugely time-consuming. One way to overcome these problems is to cluster the data into a compact form that remains an informative summary of the entire data set. Many clustering algorithms have been proposed, but they scale poorly in computation time as the data size grows. In this paper, we make full use of consensus clustering to handle Big Data clustering. We combine sampling with a split-and-merge strategy to fragment the data into small subsets; basic partitions are then generated locally from these subsets using RHadoop's parallel-processing MapReduce model, and a consensus is then sought to obtain the final result. A scalability analysis demonstrates the performance of the proposed clustering models as both the number of computing nodes and the sample size increase, while satisfying the volume and velocity dimensions.
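The pipeline described in the abstract (sample, split, cluster each subset locally, then merge) can be sketched roughly as follows. This is only an illustrative sketch and not the paper's implementation: the paper runs the local step with RHadoop's MapReduce model, whereas this example runs it sequentially in plain Python. The function names `fuzzy_c_means` and `consensus_centers`, their parameters, and the choice to merge by re-clustering the pooled local centers are all assumptions made for illustration.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means on a list of equal-length tuples; returns c centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)  # initial centers drawn from the data
    dim = len(points[0])
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        weights = []
        for p in points:
            # Squared distances, clamped so a point sitting on a center is safe.
            d2 = [max(sum((a - b) ** 2 for a, b in zip(p, ctr)), 1e-12)
                  for ctr in centers]
            inv = [v ** (-1.0 / (m - 1)) for v in d2]
            total = sum(inv)
            weights.append([(v / total) ** m for v in inv])  # stores u_ik^m
        # Center update: mean of the points weighted by u_ik^m.
        centers = [
            tuple(
                sum(w[k] * p[j] for w, p in zip(weights, points))
                / sum(w[k] for w in weights)
                for j in range(dim)
            )
            for k in range(c)
        ]
    return centers

def consensus_centers(data, c, sample_size, n_chunks, seed=0):
    """Sample, split, cluster each chunk, then merge the local centers.

    Hypothetical stand-in for the split-and-merge strategy: the merge rule
    (re-clustering the pooled centers) is an assumption, not the paper's.
    """
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)                    # sampling step
    chunks = [sample[i::n_chunks] for i in range(n_chunks)]   # split step
    # "Map" stand-in: one basic partition per chunk, computed sequentially.
    local = [fuzzy_c_means(chunk, c, seed=seed + i)
             for i, chunk in enumerate(chunks)]
    # "Merge" stand-in: cluster the pooled local centers into c final centers.
    pooled = [ctr for centers in local for ctr in centers]
    return fuzzy_c_means(pooled, c, seed=seed)

if __name__ == "__main__":
    rng = random.Random(42)
    blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    blob_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(300)]
    print(consensus_centers(blob_a + blob_b, c=2, sample_size=400, n_chunks=4))
```

Because each chunk is clustered independently, the list comprehension that builds `local` is the part that a MapReduce job or a process pool could distribute across computing nodes. Only the small set of pooled centers has to be gathered for the final merge.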