IEEE Transactions on Parallel and Distributed Systems

MPI-ACC: Accelerator-Aware MPI for Scientific Applications

Abstract

Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data-movement standards, leaving applications with no direct mechanism for end-to-end data movement. We introduce MPI-ACC, an integrated and extensible framework that enables end-to-end data movement in accelerator-based systems. MPI-ACC provides productivity and performance benefits by integrating support for auxiliary memory spaces into MPI. It supports data transfer among CUDA, OpenCL, and CPU memory spaces and is extensible to other offload models as well. MPI-ACC's runtime system enables several key optimizations, including pipelining of data transfers, scalable memory-management techniques, and balancing of communication based on the accelerator and node architecture. MPI-ACC is designed to work concurrently with other GPU workloads with minimal contention. We describe how MPI-ACC can be used to design new communication-computation patterns in scientific applications from domains such as epidemiology simulation and seismology modeling, and we discuss the lessons learned. We present experimental results on a state-of-the-art cluster with hundreds of GPUs, and we compare the performance and productivity of MPI-ACC with MVAPICH, a popular CUDA-aware MPI solution. MPI-ACC encourages programmers to explore novel application-specific optimizations for improved overall cluster utilization.
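The pipelining optimization mentioned in the abstract can be illustrated schematically: rather than staging an entire GPU-resident buffer into host memory before sending it over the network, the runtime moves it in fixed-size chunks so the device-to-host copy of one chunk can overlap the network send of the previous one. The sketch below is a minimal, illustrative model of that chunking logic only; the function names and chunk size are assumptions for illustration, not MPI-ACC's actual API, and the "copy" and "send" stages are simulated rather than issued to a real device or network.

```python
# Schematic sketch of pipelined data staging: a large "device" buffer is
# moved in fixed-size chunks, so a real runtime could overlap the
# device-to-host copy of chunk i+1 with the network send of chunk i.
# All names here are illustrative, not MPI-ACC's real interface.

CHUNK_SIZE = 4  # bytes per pipeline stage (tiny, for illustration)

def pipelined_send(device_buffer: bytes, chunk_size: int = CHUNK_SIZE):
    """Split the buffer into chunks and 'stage' each to host memory;
    each yielded chunk stands in for a network send."""
    for offset in range(0, len(device_buffer), chunk_size):
        staged = bytes(device_buffer[offset:offset + chunk_size])  # simulated D2H copy
        yield staged  # simulated hand-off to the network layer

def pipelined_recv(chunks):
    """Reassemble the chunks on the receiving side."""
    return b"".join(chunks)

if __name__ == "__main__":
    payload = bytes(range(10))
    received = pipelined_recv(pipelined_send(payload))
    assert received == payload  # pipelining must preserve the data
```

In the real runtime, the chunking lets asynchronous copies (e.g., via CUDA streams) proceed while earlier chunks are already on the wire, which is where the overlap-driven latency hiding comes from.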
