首页> 外文学位 >Irregular graph algorithms on modern multicore, manycore, and distributed processing systems.

【24h】

Irregular graph algorithms on modern multicore, manycore, and distributed processing systems.

机译：现代多核，众核和分布式处理系统上的不规则图算法。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

Graph analysis is the study of real-world interaction data, be it through biological or chemical interaction networks, human social or communication networks, or other graph-representable datasets pervasive throughout the social and physical sciences. Due to increasing data sizes and complexities, it is important to develop efficient and scalable approaches for the algorithms, tools, and techniques used to study such data. Efficient utilization of the increasing heterogeneity and complexity of modern high performance computing systems is another major consideration for these efforts.;The primary contributions of this thesis are as follows: First, parallel and scalable solutions to several basic graph analytics are presented. An implementation of the color-coding algorithm for subgraph isomorphism is introduced as FFASCIA (Fast Approximate Subgraph Counting and Enumeration). Using several optimizations for work avoidance, memory usage reduction, and cache/data movement efficiency, FASCIA demonstrates up to a five orders-of-magnitude per-core speedup relative to prior art. FASCIA is able to calculate the counts of subgraphs up to 10 vertices on multi-billion edge graphs in minutes on a modest 16 node cluster and use these counts for a variety of analytics. Using FASCIA 's baseline approach, FASTPATH is also introduced to find minimum weight paths in weighted networks. The MULTISTEP method is next introduced as an approach for graph connectivity, weak connectivity, and strong connectivity, with a generalization of MULTISTEP also presented for graph biconnectivity. The M ULTISTEP approaches are shown to demonstrate a 2-7x mean speedup relative to the prior state-of-the-art. A graph partitioner called P ULP (Partitioning using Label Propagation) is also introduced along with a general distributed graph layout strategy, DGL. PULP was specifically designed to partition small-world graphs having skewed degree distributions, such as social interaction networks and web graphs. P ULP is able to partition such graphs an order of magnitude faster and with a fraction of the memory of other comparable partitioners (ParMETIS, PT-Scotch) while giving comparable partitions in terms of cut quality and balance. Additionally, this thesis presents how using techniques derived from these efforts, a suite of distributed graph analytics could be implemented and applied to the largest publicly-available web crawl of 3.5 billion pages and 130 billion links. End-to-end execution of analysis using these implementations completing in 20 minutes on only 256 nodes of the Blue Waters supercomputing system.;Throughout this thesis, analyses of the algorithms and subroutines that comprise the MULTISTEP, FASCIA/FASTP ATH, and PULP/DGL implementations is undertaken. Common optimizations are then identified (e.g., multiple levels of queues to match the memory hierarchy, techniques for non-blocking and asynchronous updates to shared data, efficient distributed communication patterns, among others) and their effects on performance are quantified. It is demonstrated how the optimization techniques can be utilized when processing under the higher degree of parallelism available in modern manycores (GPUs, Intel Xeon Phis) as well as how the techniques can be extended for more general-purpose graph processing in both the shared- and distributed-memory spaces. Also under consideration is the state of current hardware trends, with the goal of identifying how to modify and extend these general optimizations for forthcoming high performance computing architectures. Additionally, new optimizations and potential further research areas are introduced which might also be applicable for accelerating graph processing on these future systems.

机译：图形分析是对现实世界中交互数据的研究，无论是通过生物或化学交互网络，人类社会或通信网络，还是在整个社会和物理科学中普遍使用的其他可表示图形的数据集。由于数据量和复杂性的增加，开发用于研究此类数据的算法，工具和技术的有效且可扩展的方法非常重要。这些努力的另一个主要考虑因素是有效利用现代高性能计算系统日益增加的异构性和复杂性。本论文的主要贡献如下：首先，提出了几种基本图分析的并行和可扩展解决方案。介绍了用于子图同构的颜色编码算法的一种实现方式，即FFASCIA（快速近似子图计数和枚举）。使用多个避免工作，减少内存使用以及缓存/数据移动效率的优化，相对于现有技术，FASCIA展示了每核最多五个数量级的加速。 FASCIA能够在几分钟内在一个适度的16节点群集上计算数十亿个边图上多达10个顶点的子图计数，并将这些计数用于各种分析。使用FASCIA的基线方法，还引入了FASTPATH来查找加权网络中的最小权重路径。接下来介绍MULTISTEP方法，将其作为图连通性，弱连通性和强连通性的一种方法，并针对图双连通性介绍MULTISTEP的概括。相对于现有技术，M ULTISTEP方法显示出平均加速2-7倍。还引入了称为P ULP（使用标签传播进行分区）的图形分区程序，以及一般的分布式图形布局策略DGL。 PULP专门用于划分具有倾斜度分布的小世界图，例如社交交互网络和网络图。 P ULP能够以比其他可比分区（ParMETIS，PT-Scotch）少的内存的速度快一个数量级对此类图形进行分割，同时在切割质量和平衡方面提供可比的分区。此外，本文还介绍了如何使用从这些努力中获得的技术，可以实施一套分布式图形分析，并将其应用于35亿个页面和1300亿个链接的最大的公共可用Web爬网。使用这些实现方式进行的端到端分析仅在20分钟内在Blue Waters超级计算系统的256个节点上完成。本文中，对MULTISTEP，FASCIA / FASTP ATH和PULP /组成的算法和子例程进行了分析。进行DGL实施。然后确定常见的优化（例如，多个级别的队列以匹配存储器层次结构，用于对共享数据进行非阻塞和异步更新的技术，有效的分布式通信模式等）及其对性能的影响。演示了如何在现代多核（GPU，英特尔至强菲斯）上以更高的并行度进行处理时如何利用优化技术，以及如何在共享的数据库中将这些技术扩展为更通用的图形处理。和分布式内存空间。还需要考虑的是当前硬件趋势的状态，其目的是确定如何修改和扩展这些通用优化以用于即将推出的高性能计算体系结构。此外，还引入了新的优化方法和潜在的进一步研究领域，这些领域也可能适用于在这些未来系统上加速图形处理。

著录项

作者
Slota, George M.;
展开▼
作者单位

The Pennsylvania State University.;

展开▼
授予单位 The Pennsylvania State University.;
学科 Computer science.
学位 Ph.D.
年度 2016
页码 272 p.
总页数 272
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Optimizing Graph Algorithms in Asymmetric Multicore Processors [J] . Jyothi Krishna V S, Rupesh Nasre IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems . 2018,第11期

机译：非对称多核处理器中的图形算法优化
2. Towards An Optimal Multicore Processor Design for Cryptographic Algorithms - A Case Study on RSA [J] . MUTAZ Al-TARAWNEH, ASHRAF ALKHRESHEH WSEAS Transactions on Computers . 2014,第Pta1期

机译：面向密码算法的最佳多核处理器设计-以RSA为例
3. Three-dimensional thinning algorithms on graphics processing units and multicore CPUs [J] . J. Jimenez, J. Ruiz de Miras Concurrency and computation: practice and experience . 2012,第14期

机译：图形处理单元和多核CPU上的三维稀疏算法
4. A time-efficient image processing algorithm for multicore/manycore parallel computing [C] . Asaduzzaman Abu, Martinez Angel, Sepehri Aras IEEE Southeastcon . 2015

机译：一种用于多核/多核并行计算的省时图像处理算法
5. Parallel Algorithm Design and Implementation of Regular/Irregular Problems: An In-depth Performance Study on Graphics Processing Units. [D] . Solomon, Steven. 2012

机译：规则/不规则问题的并行算法设计和实现：图形处理单元的深入性能研究。
6. Graphics Processing Unit (GPU) implementation of image processing algorithms to improve system performance of the Control Acquisition Processing and Image Display System (CAPIDS) of the Micro-Angiographic Fluoroscope (MAF) [O] . S.N. Swetadri Vasan, Ciprian N. Ionita, A.H. Titus, -1

机译：图形处理单元（GpU）执行的图像处理算法以改善控制采集处理的系统的性能以及微造影荧光镜的图像显示系统（CapIDs）（maF）
7. A flexible simulation framework for processor scheduling algorithms in multicore systems. [O] . 2012

机译：用于多核系统中处理器调度算法的灵活仿真框架。

Irregular graph algorithms on modern multicore, manycore, and distributed processing systems.

摘要

著录项

相似文献

相关主题

期刊订阅