IEEE Transactions on Parallel and Distributed Systems

MPI-ACC: Accelerator-Aware MPI for Scientific Applications

Abstract

Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data-movement standards, leaving applications with no direct mechanism for end-to-end data movement. We introduce MPI-ACC, an integrated and extensible framework that enables end-to-end data movement in accelerator-based systems. MPI-ACC provides productivity and performance benefits by integrating support for auxiliary memory spaces into MPI. It supports data transfer among CUDA, OpenCL, and CPU memory spaces and is extensible to other offload models as well. MPI-ACC's runtime system enables several key optimizations, including pipelining of data transfers, scalable memory-management techniques, and balancing of communication based on the accelerator and node architecture. MPI-ACC is designed to work concurrently with other GPU workloads with minimal contention. We describe how MPI-ACC can be used to design new communication-computation patterns in scientific applications from domains such as epidemiology simulation and seismology modeling, and we discuss the lessons learned. We present experimental results on a state-of-the-art cluster with hundreds of GPUs, and we compare the performance and productivity of MPI-ACC with MVAPICH, a popular CUDA-aware MPI solution. MPI-ACC encourages programmers to explore novel application-specific optimizations for improved overall cluster utilization.
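The pipelining optimization mentioned in the abstract can be illustrated schematically: rather than staging an entire GPU-resident buffer into host memory before sending it over the network, the runtime moves it in fixed-size chunks so the device-to-host copy of one chunk can overlap the network send of the previous one. The sketch below is a minimal, illustrative model of that chunking logic only; the function names and chunk size are assumptions for illustration, not MPI-ACC's actual API, and the "copy" and "send" stages are simulated rather than issued to a real device or network.

```python
# Schematic sketch of pipelined data staging: a large "device" buffer is
# moved in fixed-size chunks, so a real runtime could overlap the
# device-to-host copy of chunk i+1 with the network send of chunk i.
# All names here are illustrative, not MPI-ACC's real interface.

CHUNK_SIZE = 4  # bytes per pipeline stage (tiny, for illustration)

def pipelined_send(device_buffer: bytes, chunk_size: int = CHUNK_SIZE):
    """Split the buffer into chunks and 'stage' each to host memory;
    each yielded chunk stands in for a network send."""
    for offset in range(0, len(device_buffer), chunk_size):
        staged = bytes(device_buffer[offset:offset + chunk_size])  # simulated D2H copy
        yield staged  # simulated hand-off to the network layer

def pipelined_recv(chunks):
    """Reassemble the chunks on the receiving side."""
    return b"".join(chunks)

if __name__ == "__main__":
    payload = bytes(range(10))
    received = pipelined_recv(pipelined_send(payload))
    assert received == payload  # pipelining must preserve the data
```

In the real runtime, the chunking lets asynchronous copies (e.g., via CUDA streams) proceed while earlier chunks are already on the wire, which is where the overlap-driven latency hiding comes from.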
