...
Geoscientific Model Development Discussions

Simulation of the performance and scalability of message passing interface (MPI) communications of atmospheric models running on exascale supercomputers


Abstract

In this study, we identify the key message passing interface (MPI) operations required in atmospheric modelling; we then use a skeleton program and a simulation framework (based on the SST/macro simulation package) to simulate these MPI operations (transposition, halo exchange, and allreduce) with future exascale machines in mind. The experimental results show that the choice of collective algorithm has a great impact on the performance of communications; in particular, we find that the generalized ring-k algorithm for the alltoallv operation and the generalized recursive-k algorithm for the allreduce operation perform best. In addition, we observe that the interconnect topology and the routing algorithm have a significant impact on the performance and scalability of the transposition, halo exchange, and allreduce operations. However, the routing algorithm has a negligible impact on the performance of the allreduce operation because of its small message size. Because of hardware limitations, bandwidth cannot grow and latency cannot shrink indefinitely; congestion may therefore occur and limit the continuous improvement of communication performance. The experiments show that communication performance can be improved when congestion is mitigated by a proper configuration of the topology and routing algorithm, which distributes the congestion uniformly over the interconnect network and avoids the hotspots and bottlenecks that congestion causes. It is generally believed that the transpositions seriously limit the scalability of spectral models. The experiments show that, below 2×10^5 MPI processes, the communication time of the transposition is larger than that of the wide halo exchange used by the semi-Lagrangian method and that of the allreduce in the generalized conjugate residual (GCR) iterative solver used by the semi-implicit method. The transposition, whose communication time decreases quickly with an increasing number of MPI processes, demonstrates strong scalability in the case of very large grids and moderate latencies. The halo exchange, whose communication time decreases more slowly than that of the transposition as the number of MPI processes increases, shows weaker scalability. In contrast, the allreduce, whose communication time increases with an increasing number of MPI processes, does not scale well. From this point of view, the scalability of spectral models could still be acceptable. Therefore, it seems premature to conclude that the scalability of grid-point models is better than that of spectral models at exascale, unless innovative methods are exploited to mitigate the scalability problems present in the grid-point models.
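
To make the three communication patterns concrete, the sketch below is a minimal illustration (not the authors' SST/macro-based skeleton program) of how they typically appear in plain MPI code: the spectral transposition as an MPI_Alltoallv, a ring-style halo exchange with nonblocking point-to-point calls, and the GCR dot products as a small MPI_Allreduce. The buffer sizes, the one-dimensional neighbour layout, and the timing with MPI_Wtime are assumptions made for the example only.

```
/*
 * Illustrative sketch only: the three MPI operations discussed in the
 * abstract, written against standard MPI calls. Block and halo sizes are
 * assumed values, not taken from the paper.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* 1. Spectral transposition: every rank exchanges one block with
     *    every other rank, expressed as MPI_Alltoallv. */
    int block = 1024;                               /* assumed block size per peer */
    double *sendbuf = calloc((size_t)nprocs * block, sizeof(double));
    double *recvbuf = calloc((size_t)nprocs * block, sizeof(double));
    int *counts = malloc(nprocs * sizeof(int));
    int *displs = malloc(nprocs * sizeof(int));
    for (int p = 0; p < nprocs; ++p) {
        counts[p] = block;
        displs[p] = p * block;
    }
    double t0 = MPI_Wtime();
    MPI_Alltoallv(sendbuf, counts, displs, MPI_DOUBLE,
                  recvbuf, counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);
    double t_transpose = MPI_Wtime() - t0;

    /* 2. (Wide) halo exchange for the semi-Lagrangian scheme: nonblocking
     *    sends/receives with two ring neighbours; a real model would use
     *    the full two-dimensional domain decomposition. */
    int halo = 4096;                                /* assumed halo size in values */
    double *halo_send = calloc(2 * (size_t)halo, sizeof(double));
    double *halo_recv = calloc(2 * (size_t)halo, sizeof(double));
    int left  = (rank - 1 + nprocs) % nprocs;
    int right = (rank + 1) % nprocs;
    MPI_Request req[4];
    t0 = MPI_Wtime();
    MPI_Irecv(halo_recv,        halo, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(halo_recv + halo, halo, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(halo_send,        halo, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(halo_send + halo, halo, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[3]);
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    double t_halo = MPI_Wtime() - t0;

    /* 3. Allreduce of a few scalars, as in the dot products of the GCR
     *    iterative solver for the semi-implicit scheme. */
    double local_dots[2] = { 1.0, 2.0 }, global_dots[2];
    t0 = MPI_Wtime();
    MPI_Allreduce(local_dots, global_dots, 2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t_allreduce = MPI_Wtime() - t0;

    if (rank == 0)
        printf("transpose %.6fs  halo %.6fs  allreduce %.6fs\n",
               t_transpose, t_halo, t_allreduce);

    free(sendbuf); free(recvbuf); free(counts); free(displs);
    free(halo_send); free(halo_recv);
    MPI_Finalize();
    return 0;
}
```

The message sizes motivate the abstract's observation about routing: the transposition and the wide halo exchange move large buffers and are sensitive to the network, whereas the allreduce carries only a few scalars and is dominated by latency rather than routing.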
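For readers unfamiliar with the algorithm family named in the abstract, the following is a minimal sketch of recursive doubling, i.e. the k = 2 special case of the generalized recursive-k allreduce algorithm. It assumes a power-of-two number of processes and a small assumed vector length; the generalized algorithm studied in the paper extends the same idea to radix k and arbitrary process counts.

```
/*
 * Recursive-doubling allreduce (k = 2 case of recursive-k), sketch only.
 * Assumes the number of processes is a power of two.
 */
#include <mpi.h>
#include <stdio.h>

static void allreduce_recursive_doubling(double *vals, double *tmp, int n,
                                         MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    /* The exchange distance doubles each step: 1, 2, 4, ...  After log2(p)
     * steps every rank holds the full sum. */
    for (int dist = 1; dist < nprocs; dist <<= 1) {
        int peer = rank ^ dist;       /* partner at this distance */
        MPI_Sendrecv(vals, n, MPI_DOUBLE, peer, 0,
                     tmp,  n, MPI_DOUBLE, peer, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < n; ++i)
            vals[i] += tmp[i];        /* fold in the partner's contribution */
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double vals[2] = { (double)rank, 1.0 }, tmp[2];
    allreduce_recursive_doubling(vals, tmp, 2, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks = %g, process count = %g\n", vals[0], vals[1]);

    MPI_Finalize();
    return 0;
}
```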
