首页> 外文学位 >Irregular graph algorithms on modern multicore, manycore, and distributed processing systems.
【24h】

Irregular graph algorithms on modern multicore, manycore, and distributed processing systems.

机译:现代多核,众核和分布式处理系统上的不规则图算法。

获取原文
获取原文并翻译 | 示例

摘要

Graph analysis is the study of real-world interaction data, be it through biological or chemical interaction networks, human social or communication networks, or other graph-representable datasets pervasive throughout the social and physical sciences. Due to increasing data sizes and complexities, it is important to develop efficient and scalable approaches for the algorithms, tools, and techniques used to study such data. Efficient utilization of the increasing heterogeneity and complexity of modern high performance computing systems is another major consideration for these efforts.;The primary contributions of this thesis are as follows: First, parallel and scalable solutions to several basic graph analytics are presented. An implementation of the color-coding algorithm for subgraph isomorphism is introduced as FFASCIA (Fast Approximate Subgraph Counting and Enumeration). Using several optimizations for work avoidance, memory usage reduction, and cache/data movement efficiency, FASCIA demonstrates up to a five orders-of-magnitude per-core speedup relative to prior art. FASCIA is able to calculate the counts of subgraphs up to 10 vertices on multi-billion edge graphs in minutes on a modest 16 node cluster and use these counts for a variety of analytics. Using FASCIA 's baseline approach, FASTPATH is also introduced to find minimum weight paths in weighted networks. The MULTISTEP method is next introduced as an approach for graph connectivity, weak connectivity, and strong connectivity, with a generalization of MULTISTEP also presented for graph biconnectivity. The M ULTISTEP approaches are shown to demonstrate a 2-7x mean speedup relative to the prior state-of-the-art. A graph partitioner called P ULP (Partitioning using Label Propagation) is also introduced along with a general distributed graph layout strategy, DGL. PULP was specifically designed to partition small-world graphs having skewed degree distributions, such as social interaction networks and web graphs. P ULP is able to partition such graphs an order of magnitude faster and with a fraction of the memory of other comparable partitioners (ParMETIS, PT-Scotch) while giving comparable partitions in terms of cut quality and balance. Additionally, this thesis presents how using techniques derived from these efforts, a suite of distributed graph analytics could be implemented and applied to the largest publicly-available web crawl of 3.5 billion pages and 130 billion links. End-to-end execution of analysis using these implementations completing in 20 minutes on only 256 nodes of the Blue Waters supercomputing system.;Throughout this thesis, analyses of the algorithms and subroutines that comprise the MULTISTEP, FASCIA/FASTP ATH, and PULP/DGL implementations is undertaken. Common optimizations are then identified (e.g., multiple levels of queues to match the memory hierarchy, techniques for non-blocking and asynchronous updates to shared data, efficient distributed communication patterns, among others) and their effects on performance are quantified. It is demonstrated how the optimization techniques can be utilized when processing under the higher degree of parallelism available in modern manycores (GPUs, Intel Xeon Phis) as well as how the techniques can be extended for more general-purpose graph processing in both the shared- and distributed-memory spaces. Also under consideration is the state of current hardware trends, with the goal of identifying how to modify and extend these general optimizations for forthcoming high performance computing architectures. Additionally, new optimizations and potential further research areas are introduced which might also be applicable for accelerating graph processing on these future systems.
机译:图形分析是对现实世界中交互数据的研究,无论是通过生物或化学交互网络,人类社会或通信网络,还是在整个社会和物理科学中普遍使用的其他可表示图形的数据集。由于数据量和复杂性的增加,开发用于研究此类数据的算法,工具和技术的有效且可扩展的方法非常重要。这些努力的另一个主要考虑因素是有效利用现代高性能计算系统日益增加的异构性和复杂性。本论文的主要贡献如下:首先,提出了几种基本图分析的并行和可扩展解决方案。介绍了用于子图同构的颜色编码算法的一种实现方式,即FFASCIA(快速近似子图计数和枚举)。使用多个避免工作,减少内存使用以及缓存/数据移动效率的优化,相对于现有技术,FASCIA展示了每核最多五个数量级的加速。 FASCIA能够在几分钟内在一个适度的16节点群集上计算数十亿个边图上多达10个顶点的子图计数,并将这些计数用于各种分析。使用FASCIA的基线方法,还引入了FASTPATH来查找加权网络中的最小权重路径。接下来介绍MULTISTEP方法,将其作为图连通性,弱连通性和强连通性的一种方法,并针对图双连通性介绍MULTISTEP的概括。相对于现有技术,M ULTISTEP方法显示出平均加速2-7倍。还引入了称为P ULP(使用标签传播进行分区)的图形分区程序,以及一般的分布式图形布局策略DGL。 PULP专门用于划分具有倾斜度分布的小世界图,例如社交交互网络和网络图。 P ULP能够以比其他可比分区(ParMETIS,PT-Scotch)少的内存的速度快一个数量级对此类图形进行分割,同时在切割质量和平衡方面提供可比的分区。此外,本文还介绍了如何使用从这些努力中获得的技术,可以实施一套分布式图形分析,并将其应用于35亿个页面和1300亿个链接的最大的公共可用Web爬网。使用这些实现方式进行的端到端分析仅在20分钟内在Blue Waters超级计算系统的256个节点上完成。本文中,对MULTISTEP,FASCIA / FASTP ATH和PULP /组成的算法和子例程进行了分析。进行DGL实施。然后确定常见的优化(例如,多个级别的队列以匹配存储器层次结构,用于对共享数据进行非阻塞和异步更新的技术,有效的分布式通信模式等)及其对性能的影响。演示了如何在现代多核(GPU,英特尔至强菲斯)上以更高的并行度进行处理时如何利用优化技术,以及如何在共享的数据库中将这些技术扩展为更通用的图形处理。和分布式内存空间。还需要考虑的是当前硬件趋势的状态,其目的是确定如何修改和扩展这些通用优化以用于即将推出的高性能计算体系结构。此外,还引入了新的优化方法和潜在的进一步研究领域,这些领域也可能适用于在这些未来系统上加速图形处理。

著录项

  • 作者

    Slota, George M.;

  • 作者单位

    The Pennsylvania State University.;

  • 授予单位 The Pennsylvania State University.;
  • 学科 Computer science.
  • 学位 Ph.D.
  • 年度 2016
  • 页码 272 p.
  • 总页数 272
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号