TOD-Tree: Task-Overlapped Direct Send Tree Image Compositing for Hybrid MPI Parallelism and GPUs

A. V. Pascal Grosset; Manasa Prasad; Cameron Christensen; Aaron Knoll; Charles Hansen

Abstract

Modern supercomputers have thousands of nodes, each with CPUs and/or GPUs capable of several teraflops. However, the network connecting these nodes is relatively slow, on the order of gigabits per second. For time-critical workloads such as interactive visualization, the bottleneck is no longer computation but communication. In this paper, we present an image compositing algorithm that works on both CPU-only and GPU-accelerated supercomputers and focuses on communication avoidance and overlapping communication with computation at the expense of evenly balancing the workload. The algorithm has three stages: a parallel direct send stage, followed by a tree compositing stage and a gather stage. We compare our algorithm with radix-k and binary-swap from the IceT library in a hybrid OpenMP/MPI setting on the Stampede and Edison supercomputers, show strong scaling results and explain how we generally achieve better performance than these two algorithms. We developed a GPU-based image compositing algorithm where we use CUDA kernels for computation and GPU Direct RDMA for inter-node GPU communication. We tested the algorithm on the Piz Daint GPU-accelerated supercomputer and show that we achieve performance on par with CPUs. Last, we introduce a workflow in which both rendering and compositing are done on the GPU.
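The three-stage structure described above (direct send, then tree compositing, then gather) can be illustrated with a small serial simulation. This is a hedged sketch, not the authors' implementation. It runs in a single Python process instead of on MPI ranks, and it does not model the communication/computation overlap that is central to the paper. It also makes several assumptions: pixels are premultiplied RGBA, ranks are already sorted front to back, groups have a hypothetical size `k`, and the second stage is a binary tree. The names `over`, `composite_stack`, `split`, and `tod_tree_sketch` are illustrative only.

```python
import math
import random
from typing import List, Tuple

# Premultiplied-alpha RGBA pixel (r, g, b, a) and a flat image of pixels.
Pixel = Tuple[float, float, float, float]
Image = List[Pixel]


def over(front: Pixel, back: Pixel) -> Pixel:
    """Porter-Duff 'over' for premultiplied RGBA: front + (1 - a_front) * back."""
    t = 1.0 - front[3]
    return tuple(f + t * b for f, b in zip(front, back))


def composite_stack(layers: List[Image]) -> Image:
    """Composite equally sized images ordered front (index 0) to back."""
    result = list(layers[0])
    for layer in layers[1:]:
        result = [over(f, b) for f, b in zip(result, layer)]
    return result


def split(n_pixels: int, k: int) -> List[Tuple[int, int]]:
    """Split [0, n_pixels) into k contiguous (start, end) regions."""
    bounds = [i * n_pixels // k for i in range(k + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def tod_tree_sketch(images: List[Image], k: int) -> Image:
    """Serial stand-in for direct send -> tree compositing -> gather.

    images[r] is "rank" r's partial image; ranks are assumed sorted front to
    back, and (for simplicity) the rank count must be a multiple of k.
    """
    p, n = len(images), len(images[0])
    assert p % k == 0, "sketch assumes the rank count is a multiple of k"
    regions = split(n, k)

    # Stage 1, direct send: in each group of k depth-adjacent ranks, member j
    # receives region j from every member and composites it front to back.
    groups = []
    for g in range(0, p, k):
        members = images[g:g + k]
        groups.append([composite_stack([img[s:e] for img in members])
                       for s, e in regions])

    # Stage 2, tree: region by region, merge depth-adjacent groups pairwise,
    # nearer group in front; an unpaired last group moves up unchanged.
    while len(groups) > 1:
        merged = []
        for i in range(0, len(groups) - 1, 2):
            front, back = groups[i], groups[i + 1]
            merged.append([composite_stack([f, b]) for f, b in zip(front, back)])
        if len(groups) % 2:
            merged.append(groups[-1])
        groups = merged

    # Stage 3, gather: the root concatenates the k finished regions.
    return [px for region in groups[0] for px in region]


if __name__ == "__main__":
    random.seed(0)
    # 12 hypothetical ranks, each holding a 10-pixel partial image.
    imgs = [[(0.5 * a, 0.25 * a, 0.75 * a, a)
             for a in (random.random() for _ in range(10))] for _ in range(12)]
    reference = composite_stack(imgs)  # plain serial front-to-back result
    result = tod_tree_sketch(imgs, k=4)
    print(all(math.isclose(x, y, abs_tol=1e-9)
              for pr, pq in zip(result, reference) for x, y in zip(pr, pq)))
```

The check against a plain front-to-back composite is the point of the exercise. The "over" operator is associative but not commutative. Each stage may therefore regroup the depth-ordered partial images, as direct send and the tree do, but it must never reorder them.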

Bibliographic Information

Source
IEEE Transactions on Visualization and Computer Graphics, 2017, no. 6, pp. 1677-1690 (14 pages)
Authors
A. V. Pascal Grosset; Manasa Prasad; Cameron Christensen; Aaron Knoll; Charles Hansen

Affiliations
Scientific Computing and Imaging Institute at the University of Utah, Salt Lake City, UT
Google, Mountain View, CA
Scientific Computing and Imaging Institute at the University of Utah, Salt Lake City, UT
Scientific Computing and Imaging Institute at the University of Utah, Salt Lake City, UT
Scientific Computing and Imaging Institute at the University of Utah, Salt Lake City, UT
Indexing Information
Original format: PDF
Language: English

Keywords
Graphics processing units; Rendering (computer graphics); Supercomputers; Parallel processing; Data visualization; Loading; Message systems

Similar Articles

1. Un-Hong Wong, Takayuki Aoki, Hon-Cheng Wong. Efficient magnetohydrodynamic simulations on distributed multi-GPU systems using a novel GPU Direct-MPI hybrid approach [Journal]. Computer Physics Communications, 2014, no. 7.
2. Beichuan Yan, Richard A. Regueiro. Comparison between pure MPI and hybrid MPI-OpenMP parallelism for Discrete Element Method (DEM) of ellipsoidal and poly-ellipsoidal particles [Journal]. Computational Particle Mechanics, 2019, no. 2.
3. Hsi-Yu Schive, Ui-Han Zhang, Tzihong Chiueh. Directionally unsplit hydrodynamic schemes with hybrid MPI/OpenMP/GPU parallelization in AMR [Journal]. International Journal of High Performance Computing Applications, 2012, no. 4.
4. Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri. MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling [Conference]. International Conference on Parallel Processing, 2017.
5. Xuan Huang. An MPI-CUDA implementation of a model for calcium induced calcium release in a three-dimensional heart cell on a hybrid CPU/GPU cluster [Dissertation]. 2015.
6. Kaibo Wang, Yin Huai, Rubao Lee. Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems [Other].
7. Ui-han Zhang, Tzihong Chiueh. Directionally unsplit hydrodynamic schemes with hybrid MPI/OpenMP/GPU parallelization in AMR [Other]. 2012.