Journal: Operating Systems Review

An Integrated Compile-Time/Run-Time Software Distributed Shared Memory System


Abstract

On a distributed memory machine, hand-coded message passing leads to the most efficient execution, but it is difficult to use. Parallelizing compilers can approach the performance of hand-coded message passing by translating data-parallel programs into message passing programs, but efficient execution is limited to those programs for which precise analysis can be carried out. Shared memory is easier to program than message passing and its domain is not constrained by the limitations of parallelizing compilers, but it lags in performance. Our goal is to close that performance gap while retaining the benefits of shared memory. In other words, our goal is (1) to make shared memory as efficient as message passing, whether hand-coded or compiler-generated, (2) to retain its ease of programming, and (3) to retain the broader class of applications it supports. To this end we have designed and implemented an integrated compile-time and run-time software DSM system. The programming model remains identical to that of the original pure run-time DSM system. No user intervention is required to obtain the benefits of our system. The compiler computes data access patterns for the individual processors. It then performs a source-to-source transformation, inserting calls into the program that inform the run-time system of the computed data access patterns. The run-time system uses this information to aggregate communication, to aggregate data and synchronization into a single message, to eliminate consistency overhead, and to replace global synchronization with point-to-point synchronization wherever possible. We extended the Parascope programming environment to perform the required analysis, and we augmented the TreadMarks run-time DSM library to take advantage of the analysis. We used six Fortran programs to assess the performance benefits: Jacobi, 3D-FFT, Integer Sort (IS), Shallow, Gauss, and Modified Gram-Schmidt, each with two different data set sizes.
The experiments were run on an 8-node IBM SP/2 using user-space communication. Compiler optimization in conjunction with the augmented run-time system achieves substantial execution time improvements over the base TreadMarks, ranging from 4% to 59% on 8 processors. Relative to message passing implementations of the same applications, the integrated compile-time/run-time system is 0-29% slower, while the base run-time system is 5-212% slower. For the five programs that XHPF could parallelize (all except IS), the execution times achieved by the compiler-optimized shared memory programs are within 9% of XHPF.
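
The communication aggregation described in the abstract can be illustrated with a small message-count model. This is only a sketch under stated assumptions, not the TreadMarks interface or the authors' implementation: the page size, the block layout of pages across nodes, and the names `pages_touched`, `demand_fetch_messages`, `aggregated_fetch_messages`, and `owner_of` are all hypothetical. The model also ignores lazy release consistency, diffs, and multiple writers. It compares one request/reply pair per page fault (pure run-time DSM) with one request/reply pair per owning node when the access range is known in advance.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # bytes; assumed page size, for illustration only


def pages_touched(start, length, page_size=PAGE_SIZE):
    """Return the page numbers covered by the byte range [start, start + length)."""
    if length <= 0:
        return []
    first = start // page_size
    last = (start + length - 1) // page_size
    return list(range(first, last + 1))


def demand_fetch_messages(pages):
    """Pure run-time model: every faulting page costs one request and one reply."""
    return 2 * len(pages)


def aggregated_fetch_messages(pages, owner_of):
    """Model with a known access range: one request and one reply per owner node."""
    by_owner = defaultdict(list)
    for page in pages:
        by_owner[owner_of(page)].append(page)
    return 2 * len(by_owner)


def owner_of(page):
    """Hypothetical layout: pages assigned in blocks of 4 across 8 nodes."""
    return (page // 4) % 8


pages = pages_touched(start=0, length=16 * PAGE_SIZE)
print(demand_fetch_messages(pages))                # 32
print(aggregated_fetch_messages(pages, owner_of))  # 8
```

In this toy layout the 16 pages live on 4 nodes, so knowing the access range up front reduces 32 messages to 8. The actual savings reported in the abstract depend on the application and on the other optimizations it lists.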
