Journal: IEEE Transactions on Parallel and Distributed Systems

Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks



Abstract

We present a new method for reducing parallel applications' communication time by mapping their MPI tasks to processors in a way that lowers both the distance messages travel and the amount of congestion in the network. Assuming geometric proximity among the tasks is a good approximation of their communication interdependence, we use a geometric partitioning algorithm to order both the tasks and the processors, assigning task parts to the corresponding processor parts. In this way, interdependent tasks are assigned to "nearby" cores in the network. We also present a number of algorithmic optimizations that exploit specific features of the network or application to further improve the quality of the mapping. We specifically address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block or in close proximity to each other in the network. However, our methods generalize to contiguous allocations as well, and results are shown for both contiguous and non-contiguous allocations. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time by up to 75 percent relative to MiniGhost's default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time by up to 31 percent on 16K cores of an IBM BlueGene/Q with contiguous allocation.
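
The core idea in the abstract (order tasks and processors with the same geometric partitioner, then pair up corresponding parts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes one task per core, uses a plain recursive coordinate bisection with median cuts as a stand-in for the geometric partitioner, and measures distance as Manhattan hops on a mesh without torus wraparound. The names `rcb_order`, `geometric_map`, and `total_hops` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def rcb_order(points: Sequence[Point], ids: Optional[List[int]] = None) -> List[int]:
    """Order point indices by recursive coordinate bisection (illustrative)."""
    if ids is None:
        ids = list(range(len(points)))
    if len(ids) <= 1:
        return list(ids)
    dims = len(points[ids[0]])

    def extent(d: int) -> float:
        vals = [points[i][d] for i in ids]
        return max(vals) - min(vals)

    # Cut along the dimension with the largest spatial extent.
    cut_dim = max(range(dims), key=extent)
    # Sort by the cut coordinate; break ties deterministically.
    ordered = sorted(ids, key=lambda i: (points[i][cut_dim], points[i], i))
    half = len(ordered) // 2
    # Recurse on each half so nearby points end up adjacent in the order.
    return rcb_order(points, ordered[:half]) + rcb_order(points, ordered[half:])


def geometric_map(task_coords: Sequence[Point],
                  proc_coords: Sequence[Point]) -> Dict[int, int]:
    """Pair the k-th task in RCB order with the k-th core in RCB order."""
    if len(task_coords) != len(proc_coords):
        raise ValueError("this sketch assumes one task per core")
    return dict(zip(rcb_order(task_coords), rcb_order(proc_coords)))


def total_hops(mapping: Dict[int, int],
               proc_coords: Sequence[Point],
               edges: Sequence[Tuple[int, int]]) -> float:
    """Sum of Manhattan distances between the cores of communicating tasks."""
    return sum(
        sum(abs(a - b) for a, b in zip(proc_coords[mapping[u]], proc_coords[mapping[v]]))
        for u, v in edges
    )


# Example: a 4x4 grid of tasks with nearest-neighbor communication,
# mapped onto 16 cores scattered over an 8x8 mesh (a sparse allocation).
tasks = [(float(x), float(y)) for x in range(4) for y in range(4)]
edges = [(4 * x + y, 4 * x + y + 1) for x in range(4) for y in range(3)]
edges += [(4 * x + y, 4 * (x + 1) + y) for x in range(3) for y in range(4)]
sparse_cores = [(float((3 * k) % 8), float((5 * k) // 8)) for k in range(16)]

mapping = geometric_map(tasks, sparse_cores)
print("hops with geometric mapping:", total_hops(mapping, sparse_cores, edges))
print("hops with rank-order mapping:",
      total_hops({i: i for i in range(16)}, sparse_cores, edges))
```

The sketch only compares hop counts; it does not model congestion, multiple cores per node, or the network-specific optimizations the abstract mentions.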
