Speeding up the large-scale consensus fuzzy clustering for handling Big Data

Minyar Sassi Hidri; Mohamed Ali Zoghlami; Rahma Ben Ayed

Abstract

Massive data can give companies a real competitive advantage: it is used to respond better to customers, to follow consumer behavior, to anticipate developments, and so on. However, it has its own drawbacks. Such data volumes not only require large storage space but also make analysis, processing and retrieval operations very difficult and hugely time-consuming. One way to overcome these problems is to cluster the data into a compact format that remains an informative version of the entire data set. Many clustering algorithms have been proposed, but their computation time scales poorly as the data size grows. In this paper, we make full use of consensus clustering to handle Big Data clustering. We use sampling combined with a split-and-merge strategy to fragment the data into small subsets; basic partitions are then generated locally from these subsets using RHadoop's parallel-processing MapReduce model, and a consensus tendency is then followed to obtain the final result. A scalability analysis demonstrates the performance of the proposed clustering models as both the number of computing nodes and the sample size increase, while satisfying the volume and velocity dimensions.
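
The pipeline outlined in the abstract (sample, split into subsets, build basic partitions locally, then reach a consensus) can be illustrated with a minimal single-machine sketch in Python with NumPy. This is not the authors' RHadoop implementation: a plain loop stands in for the MapReduce map phase, and the consensus step shown here (re-clustering the pooled local centers with fuzzy c-means) is an assumed simplification, since the abstract does not specify the consensus function. All function and parameter names are hypothetical.

```python
import numpy as np


def memberships(X, centers, m=2.0):
    """Fuzzy membership of each row of X to each center (rows sum to 1)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # floor avoids division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means; returns (centers, membership matrix)."""
    rng = np.random.default_rng(seed)
    # Start from c distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=c, replace=False)]
    U = memberships(X, centers, m)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def consensus_fuzzy_clustering(X, c, n_subsets=4, sample_frac=0.5, seed=0):
    """Sample, split, cluster locally, then merge (assumed consensus step)."""
    rng = np.random.default_rng(seed)
    # 1. Sampling: keep a random fraction of the rows.
    n_sample = int(sample_frac * len(X))
    sample = X[rng.choice(len(X), size=n_sample, replace=False)]
    # 2. Split: fragment the sample into small subsets.
    subsets = np.array_split(sample, n_subsets)
    # 3. "Map": one basic partition per subset (a loop stands in for MapReduce).
    local_centers = [fuzzy_c_means(S, c, seed=seed + i)[0]
                     for i, S in enumerate(subsets)]
    # 4. "Reduce": pool the local centers and cluster them (assumed consensus).
    final_centers, _ = fuzzy_c_means(np.vstack(local_centers), c, seed=seed)
    # 5. Membership of every original point to the consensus centers.
    return final_centers, memberships(X, final_centers)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: three Gaussian blobs in 2-D.
    X = np.vstack([rng.normal(loc, 0.3, size=(200, 2)) for loc in (0.0, 5.0, 10.0)])
    centers, U = consensus_fuzzy_clustering(X, c=3)
    print(centers)
```

Each local run only sees a fraction of the sample, which is what makes the map phase cheap to parallelize; the merge step then works on a handful of centers rather than on the raw data.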

Bibliographic details

Source
Fuzzy Sets and Systems | 2018, Issue 1 | pp. 50-74 | 25 pages
Authors
Minyar Sassi Hidri; Mohamed Ali Zoghlami; Rahma Ben Ayed
Affiliations

University of Tunis El Manar, National Engineering School of Tunis;

Imam Abdulrahman Bin Faisal University;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Big Data analytics; Consensus tendency; Fuzzy clustering; Partial data clustering; Sampling; MapReduce; RHadoop;


Similar literature

1. Speeding up the Consensus Clustering methodology for microarray data analysis [J]. Raffaele Giancarlo, Filippo Utro. Algorithms for Molecular Biology. 2011, Issue 1
2. A consensus model for large-scale group decision making with hesitant fuzzy information and changeable clusters [J]. Zhibin Wu, Jiuping Xu. Information Fusion. 2018
3. Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data [J]. Yu Zhiwen, Chen Hantao, You Jane. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015, Issue 4
4. Consensus Clustering for Cancer Gene Expression Data: Large-Scale Analysis using Evidence Accumulation Approach [C]. Isidora Sasic, Sanja Brdar, Tatjana Loncar-Turukalo. International Conference on Bioinformatics Models, Methods and Algorithms. 2017
5. A new hierarchical clustering model for speeding up the reconciliation of XML-based, semistructured data in mediation systems [D]. Pluempitiwiriyawej, Charnyote. 2001
6. Speeding up the Consensus Clustering methodology for microarray data analysis [O]. Raffaele Giancarlo, Filippo Utro. 2011
7. Speeding up the Consensus Clustering methodology for microarray data analysis [O]. Giancarlo, R, Utro, F. 2011
