Scalable Distributed Subgraph Enumeration

机译：可伸缩的分布式子图枚举

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

Subgraph enumeration aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph. As the subgraph isomorphism operation is computationally intensive, researchers have recently focused on solving this problem in distributed environments, such as MapReduce and Pregel. Among them, the state-of-the-art algorithm, TwinTwigJoin, is proven to be instance optimal based on a left-deep join framework. However, it is still not scalable to large graphs because of the constraints in the left-deep join framework and that each decomposed component (join unit) must be a star. In this paper, we propose SEED - a scalable subgraph enumeration approach in the distributed environment. Compared to TwinTwigJoin, SEED returns optimal solution in a generalized join framework without the constraints in TwinTwigJoin. We use both star and clique as the join units, and design an effective distributed graph storage mechanism to support such an extension. We develop a comprehensive cost model, that estimates the number of matches of any given pattern graph by considering power-law degree distribution in the data graph. We then generalize the left-deep join framework and develop a dynamic-programming algorithm to compute an optimal bushy join plan. We also consider overlaps among the join units. Finally, we propose clique compression to further improve the algorithm by reducing the number of the intermediate results. Extensive performance studies are conducted on several real graphs, one containing billions of edges. The results demonstrate that our algorithm outperforms all other state-of-the-art algorithms by more than one order of magnitude.

机译：子图枚举旨在查找与给定模式图同构的大型数据图的所有子图。由于子图同构运算的计算量很大，因此研究人员最近将重点放在解决分布式环境（例如MapReduce和Pregel）中的此问题上。其中，最先进的算法TwinTwigJoin被证明是基于左深连接框架的实例最佳实例。但是，由于左深连接框架中的约束，并且每个分解的组件（连接单元）必须是星形，因此它仍然无法缩放到大图。在本文中，我们提出了SEED-分布式环境中的可伸缩子图枚举方法。与TwinTwigJoin相比，SEED在不受TwinTwigJoin约束的通用联接框架中返回最佳解决方案。我们使用star和clique作为连接单元，并设计一种有效的分布式图形存储机制来支持这种扩展。我们开发了一个综合成本模型，该模型通过考虑数据图中的幂律度分布来估计任何给定模式图的匹配数。然后，我们对左深连接框架进行泛化，并开发一种动态编程算法来计算最佳的丛集连接计划。我们还考虑了联接单元之间的重叠。最后，我们提出集团压缩以通过减少中间结果的数量来进一步改进算法。在几张真实的图上进行了广泛的性能研究，其中一张包含数十亿条边。结果表明，我们的算法比所有其他现有技术算法高出一个数量级。

著录项

来源
《International conference on very large data bases》|2017年|217-228|共12页
会议地点
作者
Longbin Lai; Lu Qin; Xuemin Lin; Ying Zhang; Lijun Chang; Shiyu Yang;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Enumerating Trillion Subgraphs On Distributed Systems [J] . Park Ha-Myung, Silvestri Francesco, Pagh Rasmus, ACM transactions on knowledge discovery from data . 2018,第6期

机译：枚举分布式系统中的万亿个子图
2. Scalable subgraph enumeration in MapReduce: a cost-oriented approach [J] . Lai Longbin, Qin Lu, Lin Xuemin, The VLDB journal . 2017,第3期

机译：MapReduce中的可伸缩子图枚举：一种面向成本的方法
3. Scalable enumeration approach for maximizing hosting capacity of distributed generation [J] . Takenobu Yuji, Yasuda Norihito, Minato Shin-ichi, International Journal of Electrical Power & Energy Systems . 2019,第FEBa期

机译：可扩展的枚举方法，可最大化分布式发电的托管容量
4. Scalable Distributed Subgraph Enumeration [C] . Longbin Lai, Lu Qin, Xuemin Lin, International conference on very large data bases . 2017

机译：可扩展的分布式子图枚举
5. A Linear Delay Algorithm for Enumerating All Connected Induced Subgraphs [D] . Alokshiya, Mohammed 2018

机译：枚举所有连通诱导子图的线性延迟算法
6. The Index-Based Subgraph Matching Algorithm with General Symmetries (ISMAGS): Exploiting Symmetry for Faster Subgraph Enumeration [O] . Maarten Houbraken, Sofie Demeyer, Tom Michoel, -1

机译：具有通用对称性的基于索引的子图匹配算法（ISMAGS）：利用对称性实现更快的子图枚举
7. Distributed Subgraph Enumeration [O] . Lai Longbin Computer Science Engineering Faculty of Engineering UNSW 2017

机译：分布式子图枚举
8. GREWA Scalable Frequent Subgraph Discovery Algorithm. [R] . Kuramochi, M., Karypis, G. 2004

机译：GREWa可扩展频繁子图发现算法。

Scalable Distributed Subgraph Enumeration

摘要

著录项

相似文献

相关主题

期刊订阅