International Conference on Very Large Data Bases
Scalable Distributed Subgraph Enumeration

Abstract

Subgraph enumeration aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph. Because subgraph isomorphism is computationally intensive, researchers have recently focused on solving this problem in distributed environments such as MapReduce and Pregel. Among these approaches, the state-of-the-art algorithm, TwinTwigJoin, is proven to be instance optimal within a left-deep join framework. However, it still does not scale to large graphs, both because of the constraints of the left-deep join framework and because each decomposed component (join unit) must be a star. In this paper, we propose SEED, a scalable subgraph enumeration approach for the distributed environment. Compared to TwinTwigJoin, SEED returns an optimal solution in a generalized join framework without the constraints of TwinTwigJoin. We use both stars and cliques as join units, and design an effective distributed graph storage mechanism to support this extension. We develop a comprehensive cost model that estimates the number of matches of any given pattern graph by considering the power-law degree distribution of the data graph. We then generalize the left-deep join framework and develop a dynamic-programming algorithm to compute an optimal bushy join plan. We also consider overlaps among the join units. Finally, we propose clique compression to further improve the algorithm by reducing the number of intermediate results. Extensive performance studies are conducted on several real graphs, one containing billions of edges. The results demonstrate that our algorithm outperforms all other state-of-the-art algorithms by more than one order of magnitude.
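To make the problem statement concrete, the following is a minimal single-machine sketch of subgraph enumeration by brute-force backtracking. It is not SEED or TwinTwigJoin: it uses no join units, cost model, or distribution, and the function name `enumerate_subgraphs` and the edge-list input format are assumptions chosen for illustration. It only shows what "all subgraphs of the data graph isomorphic to the pattern graph" means on a small undirected example.

```python
def enumerate_subgraphs(data_edges, pattern_edges):
    """Return every subgraph of the data graph isomorphic to the pattern.

    Illustrative brute-force sketch (hypothetical helper, not the paper's
    algorithm). Graphs are undirected edge lists without self-loops. Each
    result is a frozenset of edges; each edge is a frozenset of two vertices.
    """
    # Adjacency sets for the undirected data graph.
    adj = {}
    for u, v in data_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    pattern_vertices = sorted({x for edge in pattern_edges for x in edge})
    found = set()

    def extend(mapping):
        # mapping: pattern vertex -> data vertex, injective by construction.
        if len(mapping) == len(pattern_vertices):
            # Store the matched edge set so that automorphic matches of the
            # same subgraph collapse into a single result.
            found.add(frozenset(
                frozenset((mapping[a], mapping[b])) for a, b in pattern_edges
            ))
            return
        p = pattern_vertices[len(mapping)]
        for candidate in adj:
            if candidate in mapping.values():
                continue
            # Every pattern edge from p to an already-mapped pattern vertex
            # must be present between the corresponding data vertices.
            consistent = all(
                mapping[q] in adj[candidate]
                for a, b in pattern_edges
                for q in ((b,) if a == p else (a,) if b == p else ())
                if q in mapping
            )
            if consistent:
                mapping[p] = candidate
                extend(mapping)
                del mapping[p]

    extend({})
    return found


# Usage: count triangles in a 4-vertex complete graph.
k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(len(enumerate_subgraphs(k4, triangle)))  # 4
```

The search tries every data vertex for every pattern vertex, so its cost grows exponentially with the pattern size. That cost is the motivation the abstract gives for join-based, distributed approaches.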