Overlapping correlation clustering

Francesco Bonchi; Aristides Gionis; Antti Ukkonen

Abstract

We introduce a new approach for finding overlapping clusters given pairwise similarities of objects. In particular, we relax the problem of correlation clustering by allowing an object to be assigned to more than one cluster. At the core of our approach is an optimization problem in which each data point is mapped to a small set of labels, representing membership in different clusters. The objective is to find a mapping so that the given similarities between objects agree as much as possible with similarities taken over their label sets. The number of labels can vary across objects. To define a similarity between label sets, we consider two measures: (i) a 0-1 function indicating whether the two label sets have nonzero intersection and (ii) the Jaccard coefficient between the two label sets. The algorithm we propose is an iterative local-search method. The definitions of label set similarity give rise to two non-trivial optimization problems, which, for the measures of set-intersection and Jaccard, we solve using a greedy strategy and non-negative least squares, respectively. We also develop a distributed version of our algorithm based on the BSP model and implement it using a Pregel framework. Our algorithm uses as input pairwise similarities of objects and can thus be applied when clustering structured objects for which feature vectors are not available. As a proof of concept, we apply our algorithms on three different and complex application domains: trajectories, amino-acid sequences, and textual documents.
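The two label-set similarity measures named in the abstract, and the idea of scoring a labeling by how well it agrees with the given similarities, can be illustrated with a minimal Python sketch. The function names, the absolute-difference loss, and the convention that two empty label sets have Jaccard similarity 0 are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from itertools import combinations


def intersection_similarity(a, b):
    """0-1 measure: 1.0 if the two label sets share at least one label."""
    return 1.0 if a & b else 0.0


def jaccard_similarity(a, b):
    """Jaccard coefficient |a & b| / |a | b|.

    Returning 0.0 for two empty sets is an assumption of this sketch.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def labeling_cost(similarities, labels, set_similarity):
    """Total disagreement between given and label-induced similarities.

    The absolute-difference loss is an assumed choice; `similarities`
    maps each pair (u, v) with u < v to a value in [0, 1].
    """
    return sum(
        abs(set_similarity(labels[u], labels[v]) - similarities[(u, v)])
        for u, v in combinations(sorted(labels), 2)
    )


# Toy input: "b" is similar to both "a" and "c", but "a" and "c" are dissimilar.
similarities = {("a", "b"): 1.0, ("a", "c"): 0.0, ("b", "c"): 1.0}

# Overlapping labeling: "b" belongs to both clusters 1 and 2.
labels = {"a": {1}, "b": {1, 2}, "c": {2}}

cost_intersection = labeling_cost(similarities, labels, intersection_similarity)
cost_jaccard = labeling_cost(similarities, labels, jaccard_similarity)
print(cost_intersection, cost_jaccard)
```

On this toy input the overlapping labeling reproduces the given similarities exactly under the set-intersection measure, whereas no assignment of a single label per object can do so, since "b" would have to share a cluster with both "a" and "c" while "a" and "c" stay apart. Under the Jaccard measure the same labeling still incurs some cost, because sharing one of two labels yields a similarity of 0.5 rather than 1.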

Bibliographic details

Source: Knowledge and Information Systems, 2013, Issue 1, 32 pages
Authors: Francesco Bonchi; Aristides Gionis; Antti Ukkonen
Original format: PDF
Language: English
Chinese Library Classification: Automation systems theory
Keywords: Algorithms; Clustering; Overlapping clustering; Correlation clustering; Pregel
