Adaptive Scaling of Cluster Boundaries for Large-Scale Social Media Data Clustering

Lei Meng; Ah-Hwee Tan; Donald C. Wunsch


Abstract

The large-scale and complex nature of social media data calls for clustering techniques that scale to big data and can identify data clusters automatically with few empirical settings. In this paper, we present our investigation and three algorithms based on fuzzy adaptive resonance theory (Fuzzy ART) that have linear computational complexity, use a single parameter, i.e., the vigilance parameter, to identify data clusters, and are robust to modest parameter settings. The contribution of this paper is twofold. First, we theoretically demonstrate how complement coding, commonly known as a normalization method, changes the clustering mechanism of Fuzzy ART, and we identify the vigilance region (VR), which essentially determines how a cluster in the Fuzzy ART system recognizes similar patterns in the feature space. The VR gives an intrinsic interpretation of the clustering mechanism and limitations of Fuzzy ART. Second, we introduce the idea of allowing different clusters in the Fuzzy ART system to have different vigilance levels in order to accommodate the diverse pattern distributions of social media data. To this end, we propose three vigilance adaptation methods, namely, the activation maximization (AM) rule, the confliction minimization (CM) rule, and the hybrid integration (HI) rule. Starting from an initial vigilance value, the resulting clustering algorithms, namely, AM-ART, CM-ART, and HI-ART, can automatically adapt the vigilance values of all clusters during the learning epochs to produce better cluster boundaries. Experiments on four social media data sets show that AM-ART, CM-ART, and HI-ART are more robust than Fuzzy ART to the initial vigilance value, and that they usually achieve better or comparable performance, at much faster speed, than state-of-the-art clustering algorithms that also do not require a predefined number of clusters.
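The abstract names the ingredients of Fuzzy ART without giving formulas. As a minimal sketch, assuming the textbook Fuzzy ART equations (choice value T_j = |I ∧ w_j| / (alpha + |w_j|), match test |I ∧ w_j| / |I| >= rho_j, learning w_j ← beta (I ∧ w_j) + (1 - beta) w_j) with complement coding, the code below keeps one vigilance value per cluster, which is the quantity the AM, CM, and HI rules adapt. It does not reproduce those rules, since their update formulas are not stated here. The class, helper names, and parameters `rho`, `alpha`, `beta` are illustrative assumptions, not the authors' code. Each input is compared against the current clusters once, so one pass costs time linear in the number of inputs for a bounded number of clusters. The hypothetical `in_vigilance_region` helper shows one way complement coding turns the match test into a geometric region: for a weight w = (a, 1 - b), the test passes exactly when the L1 size of the smallest box covering [a, b] and x is at most d(1 - rho).

```python
import numpy as np


def complement_code(x):
    """Complement coding: map x in [0, 1]^d to I = (x, 1 - x) in [0, 1]^(2d)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])


class FuzzyART:
    """Minimal single-pass Fuzzy ART with complement coding (illustrative sketch).

    Each cluster keeps its own vigilance value (initialised to rho), which is
    where a vigilance adaptation rule would act; no such rule is implemented.
    """

    def __init__(self, rho=0.7, alpha=0.01, beta=1.0):
        self.rho0 = rho        # initial vigilance for every new cluster
        self.alpha = alpha     # choice parameter
        self.beta = beta       # learning rate (1.0 = fast learning)
        self.weights = []      # one weight vector w_j per cluster
        self.vigilance = []    # one vigilance value rho_j per cluster

    def learn(self, x):
        """Present one input, update the network, and return its cluster index."""
        I = complement_code(x)
        norm_I = I.sum()  # |I| = d under complement coding
        if self.weights:
            W = np.array(self.weights)
            overlap = np.minimum(I, W).sum(axis=1)             # |I ^ w_j|
            choice = overlap / (self.alpha + W.sum(axis=1))    # T_j
            # Visit clusters from highest to lowest choice value.
            for j in np.argsort(-choice):
                if overlap[j] / norm_I >= self.vigilance[j]:   # match test
                    w = self.weights[j]
                    self.weights[j] = self.beta * np.minimum(I, w) + (1 - self.beta) * w
                    return int(j)
        # No existing cluster passes the match test: create a new one at I.
        self.weights.append(I.copy())
        self.vigilance.append(self.rho0)
        return len(self.weights) - 1


def in_vigilance_region(x, w, rho):
    """Geometric form of the match test for a weight w = (a, 1 - b).

    With complement coding, |I ^ w| = d - sum_i (max(x_i, b_i) - min(x_i, a_i)),
    so |I ^ w| / d >= rho exactly when the L1 size of the smallest box
    covering [a, b] and x is at most d * (1 - rho).
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    a, b = w[:d], 1.0 - w[d:]
    box_size = (np.maximum(x, b) - np.minimum(x, a)).sum()
    return bool(box_size <= d * (1.0 - rho))


art = FuzzyART(rho=0.8)
points = [[0.1, 0.1], [0.12, 0.1], [0.9, 0.9], [0.88, 0.92]]
labels = [art.learn(p) for p in points]
print(labels)
```

Running it prints `[0, 0, 1, 1]`: the two points near the origin share one cluster and the two near (0.9, 0.9) form a second. A vigilance adaptation rule in the spirit of AM, CM, or HI would modify entries of `art.vigilance` during learning, enlarging or shrinking the corresponding regions; its exact form should be taken from the paper.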

Bibliographic Information

Source: IEEE Transactions on Neural Networks and Learning Systems, 2016, no. 12, pp. 2656-2669 (14 pages)
Authors: Lei Meng; Ah-Hwee Tan; Donald C. Wunsch
Affiliations:

Lei Meng: School of Computer Engineering, Nanyang Technological University, Singapore

Ah-Hwee Tan: School of Computer Engineering, Nanyang Technological University, Singapore

Donald C. Wunsch: Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, USA

Original format: PDF
Language: English
Keywords: Subspace constraints; Clustering algorithms; Media; Robustness; Genetics; Encoding; Pattern recognition
