
Compiler Directed Parallelization of Loops in Scale for Shared-Memory Multiprocessors



Abstract

Effective utilization of symmetric shared-memory multiprocessors (SMPs) is predicated on the development of efficient parallel code. Unfortunately, efficient parallelism is not always easy for the programmer to identify. Worse, exploiting such parallelism may directly conflict with optimizations affecting per-processor utilization (e.g., loop reordering to improve data locality). Here, we present our experience with a loop-level parallel compiler optimization for SMPs proposed by McKinley. The algorithm uses dependence analysis and a simple model of the target machine to transform nested loops. The goal of the approach is to promote efficient execution of parallel loops by exposing sources of large-grain parallel work while maintaining per-processor locality. We implement the optimization within the Scale compiler framework, and analyze the performance of the multiprocessor code produced for three microbenchmarks.
