Published in: Languages and Compilers for Parallel Computing (conference)

Hardware Support for OpenMP Collective Operations

Abstract

Efficient implementation of OpenMP collective operations (e.g. barriers and reductions) is essential for good performance from OpenMP programs. State-of-the-art on-chip networks and block-based cache coherence protocols used in shared memory Chip Multiprocessors (CMPs) are inefficient for implementing these collective operations. The performance of CMPs can be seriously degraded by the multitude of memory requests and coherence messages required to implement collective operations. To provide efficient support for OpenMP collective operations, this paper presents a CMP-AFN architecture and Instruction Set Architecture (ISA) extensions that augment a conventional shared-memory CMP with a tightly-integrated Aggregate Function Network (AFN) that implements low-latency collectives without using or interfering with the memory hierarchy. For a modest increase in circuit complexity, traffic within a CMP's internal network is dramatically reduced, improving the performance of caches and reducing power consumption. Full system simulations of 16-core CMPs show a CMP-AFN outperforms the reference design significantly, eliminating more than 60% of memory accesses and more than 70% of private L1 data cache misses in both the EPCC OpenMP microbenchmarks and SPEC OMP benchmarks.
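
For readers unfamiliar with the two collectives the abstract names, the following is a minimal sketch of how a reduction and a barrier appear in ordinary OpenMP C code. It only illustrates the software-level constructs; it does not model the CMP-AFN hardware or the ISA extensions the paper proposes. The function names `sum_with_reduction` and `two_phase_reverse` are hypothetical and chosen for this example; the pragmas are standard OpenMP directives.

```c
#include <stddef.h>

/* Reduction: each thread accumulates a private partial sum, and the
   OpenMP runtime combines the partial sums when the loop finishes. */
long sum_with_reduction(const int *data, size_t n)
{
    long len = (long)n;
    long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}

/* Barrier: no thread may begin phase 2 until every thread has finished
   phase 1, because phase 2 reads elements that other threads wrote. */
void two_phase_reverse(const int *in, int *stage, int *out, size_t n)
{
    long len = (long)n;
    #pragma omp parallel
    {
        /* Phase 1: double every element. "nowait" removes the implicit
           barrier at the end of the loop so the explicit one below is
           the only synchronization point. */
        #pragma omp for nowait
        for (long i = 0; i < len; i++) {
            stage[i] = in[i] * 2;
        }

        #pragma omp barrier

        /* Phase 2: write the doubled values in reverse order. */
        #pragma omp for
        for (long i = 0; i < len; i++) {
            out[i] = stage[len - 1 - i];
        }
    }
}
```

With GCC or Clang the file would typically be compiled with `-fopenmp`; without that flag the pragmas are ignored and both functions run sequentially with the same results. In a conventional software runtime, both the barrier and the final combine step of the reduction are typically implemented through shared-memory accesses, which is the memory and coherence traffic the abstract describes the AFN as eliminating.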