首页> 外文OA文献 >Distributed Subgraph Enumeration
【2h】

Distributed Subgraph Enumeration

机译:分布式子图枚举

摘要

Subgraph enumeration is a fundamental graph problem with many applications. However, existing algorithms for subgraph enumeration fall short in handling large graphs due to the computational hardness. In this work, we propose a general approach that solves subgraph enumeration in the distributed contexts, including MapReduce and Spark. The approach features a decomposition-and-join manner, in which the pattern graph is decomposed into a set of structures, called join unit. We introduce the distributed graph storage mechanism to determine what structure can be the join unit. Consequently, we obtain the results by joining the matches of all join units following a specific join structure. Based on the general approach, we first propose a star-based join framework. In the framework, we adopt a basic graph storage mechanism that only supports a star (a tree with depth 2) as the join unit, and we apply the left-deep join structure to process the join. We then show that a special star called TwinTwig - an edge or two incident edges of a node - is enough to guarantee instance optimality in the star-based join framework under reasonable assumptions, which inspires the TwinTwigJoin algorithm. We devise an A*-based algorithm to compute the optimal join plan for TwinTwigJoin. TwinTwigJoin is still not scalable to large graph because of the constraints in the left-deep join structure and that each join unit must be a star. We then explore the graph-based join framework that allows us to use more than just star as the join unit. In addition, we use the bushy join structure rather than left-deep join to guarantee the optimality of the join plan. Aware that it is storage-inefficient to use any structure as the join unit, we develop the SEED algorithm that implements an effective distributed graph storage mechanism to use star and clique (complete graph) as the join units. We then devise a dynamic-programming algorithm to compute an optimal bushy join plan. SEED frees us from the constraints in TwinTwigJoin, and greatly improves the performance of subgraph enumeration. Ultimately, we develop two data-compression techniques, namely compressed graph and clique compression, to further reduce the enormous cost while transferring and maintaining the (intermediate) results.
机译:子图枚举是许多应用程序中的基本图问题。但是,由于计算难度大,现有的子图枚举算法在处理大型图时不够。在这项工作中,我们提出了一种通用方法,可以解决包括MapReduce和Spark在内的分布式上下文中的子图枚举。该方法的特点是分解和联接方式,其中模式图被分解为一组结构,称为联接单元。我们介绍了分布式图存储机制,以确定可以作为联接单元的结构。因此,我们通过遵循特定联接结构将所有联接单元的匹配联接来获得结果。基于一般方法,我们首先提出一个基于星号的联接框架。在该框架中,我们采用了一种基本的图形存储机制,该机制仅支持星形(深度为2的树)作为连接单元,并应用左深连接结构来处理连接。然后,我们展示了一个称为TwinTwig的特殊星形-一个节点的一条边或两个入射边-足以在合理的假设下保证基于星形的连接框架中实例的最优性,这激发了TwinTwigJoin算法。我们设计了一种基于A *的算法来计算TwinTwigJoin的最佳联接计划。 TwinTwigJoin仍然无法缩放到大图,这是因为左深连接结构存在约束,并且每个连接单元必须是星形。然后,我们探索基于图的联接框架,该框架使我们不仅可以将star用作联接单元。另外,我们使用浓密的联接结构而不是左深联接,以确保联接计划的最优性。意识到使用任何结构作为连接单元都存储效率低下,我们开发了SEED算法,该算法实现了一种有效的分布式图形存储机制,以使用星形和小集团(完整图)作为连接单元。然后,我们设计一种动态编程算法来计算最佳的丛生联接计划。 SEED使我们摆脱了TwinTwigJoin中的约束,并大大提高了子图枚举的性能。最终,我们开发了两种数据压缩技术,即压缩图形和集团压缩,以进一步降低传输和维护(中间)结果的巨大成本。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号