Hardware Support for OpenMP Collective Operations

Abstract

Efficient implementation of OpenMP collective operations (e.g. barriers and reductions) is essential for good performance from OpenMP programs. State-of-the-art on-chip networks and block-based cache coherence protocols used in shared memory Chip Multiprocessors (CMPs) are inefficient for implementing these collective operations. The performance of CMPs can be seriously degraded by the multitude of memory requests and coherence messages required to implement collective operations. To provide efficient support for OpenMP collective operations, this paper presents a CMP-AFN architecture and Instruction Set Architecture (ISA) extensions that augment a conventional shared-memory CMP with a tightly-integrated Aggregate Function Network (AFN) that implements low-latency collectives without using or interfering with the memory hierarchy. For a modest increase in circuit complexity, traffic within a CMP's internal network is dramatically reduced, improving the performance of caches and reducing power consumption. Full system simulations of 16-core CMPs show a CMP-AFN outperforms the reference design significantly, eliminating more than 60% of memory accesses and more than 70% of private L1 data cache misses in both the EPCC OpenMP microbenchmarks and SPEC OMP benchmarks.
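
For readers unfamiliar with the two collectives named in the abstract, the following is a minimal sketch of what a software reduction and a barrier do when they are built on shared memory. It is not taken from the paper: Python's `threading` module stands in for OpenMP threads, and the function name `parallel_sum_of_squares` and its parameters are hypothetical. The point is only that every thread updates one shared accumulator and then waits at a common barrier, which is the kind of contended memory traffic the abstract says the AFN avoids.

```python
import threading


def parallel_sum_of_squares(values, num_threads=4):
    """Sum x*x over `values` using a shared-memory reduction and a barrier.

    Illustrative only: a stand-in for an OpenMP reduction followed by a
    barrier, not the mechanism proposed in the paper.
    """
    total = 0
    lock = threading.Lock()                    # guards the shared accumulator
    barrier = threading.Barrier(num_threads)   # all threads must arrive
    seen_by_thread = [None] * num_threads

    def worker(tid):
        nonlocal total
        # Each thread first reduces its own strided slice privately.
        partial = sum(x * x for x in values[tid::num_threads])
        # Combine step: every thread updates the same shared location.
        with lock:
            total += partial
        # Barrier: no thread proceeds until all partial sums are combined.
        barrier.wait()
        # After the barrier, every thread observes the complete result.
        seen_by_thread[tid] = total

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total, seen_by_thread


if __name__ == "__main__":
    result, seen = parallel_sum_of_squares(list(range(1, 11)), num_threads=4)
    print(result, seen)
```

In this sketch the shared accumulator and the barrier state both live in ordinary memory, so each participating thread touches the same locations. On a real CMP that pattern would translate into coherence messages between cores, which is the overhead the abstract attributes to conventional implementations.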

Bibliographic details

Source
Languages and Compilers for Parallel Computing | 2009 | pp. 31-49 | 19 pages
Conference location: Newark, DE (US)
Authors
Soohong P. Kim; Samuel P. Midkiff; Henry G. Dietz
Author affiliations

School of ECE, Purdue University;

School of ECE, Purdue University;

Department of ECE, University of Kentucky;

Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
