Distributed Subgraph Enumeration

机译：分布式子图枚举

页面导航

摘要
著录项
相似文献
相关主题

摘要

Subgraph enumeration is a fundamental graph problem with many applications. However, existing algorithms for subgraph enumeration fall short in handling large graphs due to the computational hardness. In this work, we propose a general approach that solves subgraph enumeration in the distributed contexts, including MapReduce and Spark. The approach features a decomposition-and-join manner, in which the pattern graph is decomposed into a set of structures, called join unit. We introduce the distributed graph storage mechanism to determine what structure can be the join unit. Consequently, we obtain the results by joining the matches of all join units following a specific join structure. Based on the general approach, we first propose a star-based join framework. In the framework, we adopt a basic graph storage mechanism that only supports a star (a tree with depth 2) as the join unit, and we apply the left-deep join structure to process the join. We then show that a special star called TwinTwig - an edge or two incident edges of a node - is enough to guarantee instance optimality in the star-based join framework under reasonable assumptions, which inspires the TwinTwigJoin algorithm. We devise an A*-based algorithm to compute the optimal join plan for TwinTwigJoin. TwinTwigJoin is still not scalable to large graph because of the constraints in the left-deep join structure and that each join unit must be a star. We then explore the graph-based join framework that allows us to use more than just star as the join unit. In addition, we use the bushy join structure rather than left-deep join to guarantee the optimality of the join plan. Aware that it is storage-inefficient to use any structure as the join unit, we develop the SEED algorithm that implements an effective distributed graph storage mechanism to use star and clique (complete graph) as the join units. We then devise a dynamic-programming algorithm to compute an optimal bushy join plan. SEED frees us from the constraints in TwinTwigJoin, and greatly improves the performance of subgraph enumeration. Ultimately, we develop two data-compression techniques, namely compressed graph and clique compression, to further reduce the enormous cost while transferring and maintaining the (intermediate) results.

机译：子图枚举是许多应用程序中的基本图问题。但是，由于计算难度大，现有的子图枚举算法在处理大型图时不够。在这项工作中，我们提出了一种通用方法，可以解决包括MapReduce和Spark在内的分布式上下文中的子图枚举。该方法的特点是分解和联接方式，其中模式图被分解为一组结构，称为联接单元。我们介绍了分布式图存储机制，以确定可以作为联接单元的结构。因此，我们通过遵循特定联接结构将所有联接单元的匹配联接来获得结果。基于一般方法，我们首先提出一个基于星号的联接框架。在该框架中，我们采用了一种基本的图形存储机制，该机制仅支持星形（深度为2的树）作为连接单元，并应用左深连接结构来处理连接。然后，我们展示了一个称为TwinTwig的特殊星形-一个节点的一条边或两个入射边-足以在合理的假设下保证基于星形的连接框架中实例的最优性，这激发了TwinTwigJoin算法。我们设计了一种基于A *的算法来计算TwinTwigJoin的最佳联接计划。 TwinTwigJoin仍然无法缩放到大图，这是因为左深连接结构存在约束，并且每个连接单元必须是星形。然后，我们探索基于图的联接框架，该框架使我们不仅可以将star用作联接单元。另外，我们使用浓密的联接结构而不是左深联接，以确保联接计划的最优性。意识到使用任何结构作为连接单元都存储效率低下，我们开发了SEED算法，该算法实现了一种有效的分布式图形存储机制，以使用星形和小集团（完整图）作为连接单元。然后，我们设计一种动态编程算法来计算最佳的丛生联接计划。 SEED使我们摆脱了TwinTwigJoin中的约束，并大大提高了子图枚举的性能。最终，我们开发了两种数据压缩技术，即压缩图形和集团压缩，以进一步降低传输和维护（中间）结果的巨大成本。

著录项

作者
Lai Longbin Computer Science Engineering Faculty of Engineering UNSW;
展开▼
作者单位

展开▼
年度 2017
总页数
原文格式 PDF
正文语种 en
中图分类

相似文献

外文文献
中文文献
专利

1. Enumerating Trillion Subgraphs On Distributed Systems [J] . Park Ha-Myung, Silvestri Francesco, Pagh Rasmus, ACM transactions on knowledge discovery from data . 2018,第6期

机译：枚举分布式系统中的万亿个子图
2. An Optimal Algorithm for Enumerating Connected Convex Subgraphs in Acyclic Digraphs [J] . Xiao Chenglong, Wang Shanshan, Liu Wanjun, Circuits and Systems II: Express Briefs, IEEE Transactions on . 2021,第1期

机译：一种最佳算法，用于枚举无循环数字中的连接凸子图
3. A Note on Enumeration of 3-Edge-Connected Spanning Subgraphs in Plane Graphs [J] . Yasuko MATSUI, Kenta OZEKI IEICE transactions on information and systems . 2021,第3期

机译：关于平面图中3边连接跨越子图的枚举的说明
4. BENU: Distributed Subgraph Enumeration with Backtracking-Based Framework [C] . Zhaokang Wang, Rong Gu, Weiwei Hu, IEEE International Conference on Data Engineering . 2019

机译：BENU：基于回溯的框架的分布式子图枚举
5. A Linear Delay Algorithm for Enumerating All Connected Induced Subgraphs [D] . Alokshiya, Mohammed 2018

机译：枚举所有连通诱导子图的线性延迟算法
6. The Index-Based Subgraph Matching Algorithm with General Symmetries (ISMAGS): Exploiting Symmetry for Faster Subgraph Enumeration [O] . Maarten Houbraken, Sofie Demeyer, Tom Michoel, -1

机译：具有通用对称性的基于索引的子图匹配算法（ISMAGS）：利用对称性实现更快的子图枚举
7. The Index-based Subgraph Matching Algorithm with General Symmetries (ISMAGS): exploiting symmetry for faster subgraph enumeration. [O] . Maarten Houbraken, Sofie Demeyer, Tom Michoel, 2014

机译：基于索引的通用对称子图匹配算法（IsmaGs）：利用对称性来实现更快的子图枚举。

Distributed Subgraph Enumeration

摘要

著录项

相似文献

相关主题

期刊订阅