首页> 外文会议>IEEE International Congress on Big Data >A fast parallel algorithm for counting triangles in graphs using dynamic load balancing
【24h】

A fast parallel algorithm for counting triangles in graphs using dynamic load balancing

机译:使用动态负载平衡的图形中三角形计数的快速并行算法

获取原文

摘要

Finding the number of triangles in a graph (network) is an important problem in graph analysis. The number of triangles also has important applications in graph mining. Big graphs emerging from numerous application areas pose a significant challenge for the analysis and mining since these graphs consist of millions, or even billions, of nodes and edges. Graphs of such scale necessitate the development of efficient parallel algorithms. Existing distributed memory parallel algorithms for counting exact triangles are either Map-Reduce or message passing interface (MPI) based. Map-Reduce based algorithms generate prohibitively large intermediate data and do not demonstrate reasonably good runtime efficiency. The MPI based algorithms offer fast computation of the number of triangles. However, the partitioning and load balancing schemes these algorithms employ are static in nature - the partitions are precomputed based on some estimations. In this paper, we present an efficient MPI-based parallel algorithm for counting triangles in large graph. We consider the case where the main memory of each compute node is large enough to contain the entire graph. We observe that for such a case, computation load can be balanced dynamically and present a dynamic load balancing scheme which improves the performance of the algorithm significantly. Our algorithm demonstrates very good speedups and scales to a large number of processors. The algorithm computes the exact number of triangles in a network with 1 billion edges in 2 minutes with only 100 processors. Our results demonstrate that the algorithm is significantly faster than the related algorithms with static partitioning. In fact, for the real-world networks we experimented on, our algorithm achieves at least 2 times runtime efficiency over the fastest algorithm with static load balancing.
机译:在图表(网络)中找到三角形的数量是图分析中的一个重要问题。三角形的数量也在图形挖掘中具有重要应用。从许多应用领域出现的大图表对分析和采矿构成了重大挑战,因为这些图表由节点和边缘的数百万或甚至数十亿组成。这种规模的图需要开发有效的并行算法。用于计数精确三角形的现有分布式存储器并行算法是基于MAG-REANG-REANGY或消息传递接口(MPI)。 Map-Dealsul基础的算法生成过大的中间数据,并且不会展示合理的运行时间效率。基于MPI的算法提供了三角形数量的快速计算。但是,分区和负载均衡方案这些算法在自然界中是静态的 - 基于一些估计来预先计算分区。在本文中,我们介绍了一种基于MPI的基于MPI的并行算法,用于计算大图中的三角形。我们考虑每个计算节点的主存储器足以包含整个图形的情况。我们观察到,对于这种情况,可以动态地平衡计算负载并呈现动态负载平衡方案,其显着提高了算法的性能。我们的算法演示了大量处理器的非常好的加速和缩放。该算法在2分钟内使用10亿个边缘计算网络中的三角形的确切数量,只有100个处理器。我们的结果表明,该算法明显比具有静态分区的相关算法的速度更快。实际上,对于我们进行实验的真实网络,我们的算法通过静态负载平衡的最快算法实现至少2倍的运行时效率。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号