Journal: IEEE Transactions on Neural Networks and Learning Systems

Adaptive Scaling of Cluster Boundaries for Large-Scale Social Media Data Clustering

Abstract

The large scale and complex nature of social media data raise the need to scale clustering techniques to big data and to make them capable of automatically identifying data clusters with few empirical settings. In this paper, we present our investigation and three algorithms based on fuzzy adaptive resonance theory (Fuzzy ART) that have linear computational complexity, use a single parameter (the vigilance parameter) to identify data clusters, and are robust to modest parameter settings. The contribution of this paper is twofold. First, we theoretically demonstrate how complement coding, commonly known as a normalization method, changes the clustering mechanism of Fuzzy ART, and we discover the vigilance region (VR), which essentially determines how a cluster in the Fuzzy ART system recognizes similar patterns in the feature space. The VR gives an intrinsic interpretation of the clustering mechanism and the limitations of Fuzzy ART. Second, we introduce the idea of allowing different clusters in the Fuzzy ART system to have different vigilance levels, in order to match the diverse pattern distributions found in social media data. To this end, we propose three vigilance adaptation methods: the activation maximization (AM) rule, the confliction minimization (CM) rule, and the hybrid integration (HI) rule. Starting from an initial vigilance value, the resulting clustering algorithms, AM-ART, CM-ART, and HI-ART, automatically adapt the vigilance values of all clusters during the learning epochs in order to produce better cluster boundaries. Experiments on four social media data sets show that AM-ART, CM-ART, and HI-ART are more robust than Fuzzy ART to the initial vigilance value, and that they usually achieve better or comparable performance, at much higher speed, than state-of-the-art clustering algorithms that likewise do not require a predefined number of clusters.
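For readers unfamiliar with the baseline, the following is a minimal sketch of standard Fuzzy ART with complement coding, written from the commonly published formulation rather than from this paper's code. The class name `FuzzyART`, the parameter defaults, and the toy data are illustrative assumptions. Each cluster stores its own vigilance value in `rhos` to show where per-cluster vigilance would plug in, but the AM, CM, and HI adaptation rules are not specified in the abstract and are deliberately not implemented here; every cluster simply keeps the initial value.

```python
from typing import List


def complement_code(x: List[float]) -> List[float]:
    """Map x in [0, 1]^d to [x, 1 - x]; the L1 norm of the result is always d."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a: List[float], b: List[float]) -> List[float]:
    """Element-wise minimum (the fuzzy AND operator)."""
    return [min(p, q) for p, q in zip(a, b)]


class FuzzyART:
    """Baseline Fuzzy ART (illustrative sketch, not the paper's implementation)."""

    def __init__(self, rho: float = 0.75, alpha: float = 0.01, beta: float = 1.0):
        self.rho0 = rho      # initial vigilance given to every new cluster
        self.alpha = alpha   # choice parameter
        self.beta = beta     # learning rate (1.0 = fast learning)
        self.weights: List[List[float]] = []
        self.rhos: List[float] = []  # per-cluster vigilance; never adapted in this sketch

    def _choice(self, i: List[float], j: int) -> float:
        w = self.weights[j]
        return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

    def learn(self, x: List[float]) -> int:
        """Present one pattern, update the winning cluster, return its index."""
        i = complement_code(x)
        # Visit clusters from highest to lowest choice value.
        order = sorted(range(len(self.weights)),
                       key=lambda j: self._choice(i, j), reverse=True)
        for j in order:
            w = self.weights[j]
            overlap = fuzzy_and(i, w)
            match = sum(overlap) / sum(i)
            if match >= self.rhos[j]:  # resonance: vigilance test passed
                self.weights[j] = [self.beta * o + (1.0 - self.beta) * v
                                   for o, v in zip(overlap, w)]
                return j
        # No cluster passed the vigilance test: create a new one.
        self.weights.append(i)
        self.rhos.append(self.rho0)
        return len(self.weights) - 1


# Toy usage: two well-separated groups in the unit square.
points = [[0.10, 0.10], [0.12, 0.11], [0.90, 0.90], [0.88, 0.92]]
model = FuzzyART(rho=0.8)
labels = [model.learn(p) for p in points]
print(labels)  # [0, 0, 1, 1]
```

Each pattern is compared against the existing clusters only, which is the source of the linear cost per pass that the abstract refers to. Lowering `rho` makes each cluster accept patterns farther from its weight vector, so fewer clusters form.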
