Journal: Computational Particle Mechanics

Comparison between pure MPI and hybrid MPI-OpenMP parallelism for Discrete Element Method (DEM) of ellipsoidal and poly-ellipsoidal particles


Abstract

Parallel computing of 3D Discrete Element Method (DEM) simulations can be carried out in different modes, two of which are pure MPI and hybrid MPI-OpenMP. The hybrid MPI-OpenMP mode allows flexibly combined mapping schemes on contemporary multiprocessing supercomputers. This paper profiles the computational components and floating-point operation features of complex-shaped 3D DEM, develops a space-decomposition-based MPI parallelism and various thread-based OpenMP parallelisms, and carries out performance comparison and analysis from intranode to internode scales across four orders of magnitude of problem size (namely, number of particles). The influences of the memory/cache hierarchy, process/thread pinning, variation of the hybrid MPI-OpenMP mapping scheme, and ellipsoidal versus poly-ellipsoidal particle shape are carefully examined. It is found that OpenMP achieves high efficiency in interparticle contact detection, but the unparallelizable code prevents it from reaching the same efficiency in overall performance; pure MPI achieves not only lower computational granularity (and thus higher spatial locality of particles) but also lower communication granularity (and thus faster MPI transmission) than hybrid MPI-OpenMP using the same computational resources; the cache miss rate is sensitive to the shrinkage of memory consumption per processor, and among the three cache levels of modern microprocessors the last-level cache contributes most significantly to the strong superlinear speedup; in hybrid MPI-OpenMP mode, as the number of MPI processes increases (and the number of threads per MPI process decreases accordingly), the total execution time decreases, until maximum performance is obtained in pure MPI mode; process/thread pinning on NUMA architectures improves performance significantly when there are multiple threads per process, whereas the improvement becomes less pronounced as the number of threads per process decreases; and both communication time and computation time increase substantially from ellipsoids to poly-ellipsoids. Overall, pure MPI outperforms hybrid MPI-OpenMP in 3D DEM modeling of ellipsoidal and poly-ellipsoidal particles.
