Scalable Distributed Subgraph Enumeration

Abstract

Subgraph enumeration aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph. Because subgraph isomorphism is computationally intensive, researchers have recently focused on solving this problem in distributed environments such as MapReduce and Pregel. Among these approaches, the state-of-the-art algorithm, TwinTwigJoin, is proven to be instance optimal within a left-deep join framework. However, it still does not scale to large graphs, both because of the constraints of the left-deep join framework and because each decomposed component (join unit) must be a star. In this paper, we propose SEED, a scalable subgraph enumeration approach for the distributed environment. Compared to TwinTwigJoin, SEED returns an optimal solution in a generalized join framework without the constraints of TwinTwigJoin. We use both stars and cliques as the join units, and design an effective distributed graph storage mechanism to support this extension. We develop a comprehensive cost model that estimates the number of matches of any given pattern graph by taking the power-law degree distribution of the data graph into account. We then generalize the left-deep join framework and develop a dynamic-programming algorithm to compute an optimal bushy join plan. We also consider overlaps among the join units. Finally, we propose clique compression, which further improves the algorithm by reducing the number of intermediate results. Extensive performance studies are conducted on several real graphs, one of which contains billions of edges. The results demonstrate that our algorithm outperforms all other state-of-the-art algorithms by more than one order of magnitude.
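To make the join-based formulation in the abstract concrete, the following single-machine Python sketch enumerates matches of a triangle pattern in two ways: directly by backtracking, and by joining the matches of a star join unit with the matches of a single-edge unit. It is an illustration under stated assumptions, not the SEED or TwinTwigJoin implementation: the function names (`enumerate_matches`, `join`, `distinct_subgraphs`), the choice of join units, and the non-induced matching semantics are all assumptions made for this example, and no distribution, cost model, or plan optimization is included.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def enumerate_matches(pattern_edges, data_edges):
    """Return every injective mapping pattern-vertex -> data-vertex that
    sends each pattern edge to a data edge (non-induced matching, assumed)."""
    p_adj = build_adjacency(pattern_edges)
    d_adj = build_adjacency(data_edges)
    order = sorted(p_adj)  # fixed matching order over pattern vertices
    matches = []

    def extend(mapping):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        p = order[len(mapping)]
        for d in d_adj:
            if d in mapping.values():
                continue  # keep the mapping injective
            # every already-mapped pattern neighbor must be a data neighbor
            if all(mapping[q] in d_adj[d] for q in p_adj[p] if q in mapping):
                mapping[p] = d
                extend(mapping)
                del mapping[p]

    extend({})
    return matches


def join(left, right):
    """Nested-loop join of two match sets on their shared pattern vertices."""
    joined = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                merged = {**a, **b}
                # discard results that map two pattern vertices to one data vertex
                if len(set(merged.values())) == len(merged):
                    joined.append(merged)
    return joined


def distinct_subgraphs(pattern_edges, matches):
    """Collapse automorphic matches into distinct data-graph edge sets."""
    return {
        frozenset(frozenset((m[u], m[v])) for u, v in pattern_edges)
        for m in matches
    }


# Hypothetical toy input: a triangle {1, 2, 3} with a pendant vertex 4.
data = [(1, 2), (2, 3), (1, 3), (3, 4)]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
star_unit = [("a", "b"), ("a", "c")]  # star centered at "a"
edge_unit = [("b", "c")]              # remaining edge of the triangle

direct = enumerate_matches(triangle, data)
via_join = join(enumerate_matches(star_unit, data),
                enumerate_matches(edge_unit, data))
print(len(direct), len(via_join), len(distinct_subgraphs(triangle, direct)))
```

On this toy graph both routes produce the same set of mappings. The star unit alone has more matches than the final triangle, which is the kind of intermediate-result blow-up that the choice of join units and join plan is meant to control.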

Bibliographic record

Source
International conference on very large data bases | 2017 | 696 p. | 12 pages in total
Authors
Longbin Lai; Lu Qin; Xuemin Lin; Ying Zhang; Lijun Chang; Shiyu Yang
Original format: PDF
Chinese Library Classification: TP311.13
