Journal: Simulation Modelling Practice and Theory: International Journal of the Federation of European Simulation Societies

Performance improvement of parallel programs on a broadcast-based distributed shared memory multiprocessor by simulation



Abstract

Advances in fiber optics and VLSI technology are making interconnection networks that allow simultaneous broadcasts feasible. Distributed shared memory (DSM) implementations on such networks promise high performance even for small, fine-grained applications. After summarizing the architecture of one such implementation, the Simultaneous Multiprocessor Optical Exchange Bus (SOME-Bus), this paper presents simple algorithms for improving the performance of parallel programs running on a SOME-Bus multiprocessor that implements cache-coherent DSM. The algorithms redistribute data at run time through a dynamic page migration protocol. To make correct decisions about the placement of shared data, they combine memory access references with information on average channel utilization, average channel waiting time, the number of messages in the channel queue, or short-term average channel waiting time, as reported by each node and gathered by hardware monitors. Simulations of four parallel codes on a 64-processor SOME-Bus show that the algorithms yield significant performance improvements: shorter execution times, fewer remote memory accesses, lower average channel waiting times and average network latencies, and higher average channel utilization. (C) 2007 Elsevier B.V. All rights reserved.
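The abstract does not spell out the migration rules, so the following is only a minimal sketch of the general idea under stated assumptions: a page-migration decision that combines per-node memory access counts with a per-node channel metric. Everything here is hypothetical and not the paper's algorithm. This includes the names `PageStats`, `choose_new_home` and `ewma`, the thresholds `min_ratio` and `max_wait`, and the smoothing factor `alpha`. It uses average channel waiting time as the metric; channel utilization or queue length could be substituted in the same place.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PageStats:
    """Hypothetical per-page bookkeeping kept by the hardware monitors."""
    home: int  # node that currently owns the page
    # accesses[n] = number of references to this page issued by node n
    accesses: dict[int, int] = field(default_factory=dict)


def ewma(prev: float, sample: float, alpha: float = 0.25) -> float:
    """Assumed form of a short-term average: exponentially weighted moving average."""
    return alpha * sample + (1.0 - alpha) * prev


def choose_new_home(page: PageStats, wait_time: dict[int, float],
                    min_ratio: float = 2.0, max_wait: float = 10.0) -> int:
    """Return the node that should own `page` (possibly its current home).

    Hypothetical policy: a remote node qualifies if it references the page at
    least `min_ratio` times as often as the home node does and its reported
    channel waiting time is below `max_wait`. Among qualifying nodes, pick the
    one with the most accesses, breaking ties by lower waiting time.
    """
    home_refs = page.accesses.get(page.home, 0)
    candidates = [
        n for n, refs in page.accesses.items()
        if n != page.home
        and refs >= min_ratio * max(home_refs, 1)
        and wait_time.get(n, float("inf")) < max_wait
    ]
    if not candidates:
        return page.home  # no migration
    return max(candidates, key=lambda n: (page.accesses[n], -wait_time[n]))


if __name__ == "__main__":
    page = PageStats(home=0, accesses={0: 3, 5: 40, 9: 35})
    waits = {0: 12.0, 5: 15.0, 9: 4.0}
    # Node 5 references the page most, but its channel is congested (15.0 >= 10.0),
    # so the page goes to node 9.
    print(choose_new_home(page, waits))  # 9
    # Fold a new waiting-time sample into node 9's short-term average.
    print(ewma(waits[9], 8.0))  # 5.0
```

The channel check is what separates this from purely reference-count-driven migration. Moving a page to its heaviest user can backfire if that node's output channel is already saturated, because every remote request for the page would then queue behind it.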
