Published in: International Journal of Parallel Programming

Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons



Abstract

We propose a framework, based on an original way of generating and using algorithmic skeletons, dedicated to the speculative parallelization of scientific nested-loop kernels. It applies polyhedral transformations to the target code at run-time in order to expose parallelism and data locality. Parallel code generation comes at almost no cost thanks to binary algorithmic skeletons that are generated at compile-time and that embed the original code together with the operations needed to instantiate a polyhedral parallelizing transformation and to verify the speculations on dependences. The skeletons are patched at run-time to generate the executable code. The run-time process includes a transformation selection guided by online profiling phases on short samples, using an instrumented version of the code. During this phase, the accessed memory addresses are used to compute dependence distance vectors on the fly, and are also interpolated to build a predictor of the forthcoming accesses. The interpolating functions and distance vectors are then used in a dependence analysis that selects a parallelizing transformation which, if the prediction is correct, does not induce any rollback during execution. To keep the rollback time overhead low, the code is executed in successive slices of the outermost original loop of the nest. Each slice can be a parallel version that instantiates a skeleton, a sequential original version, or an instrumented version. Moreover, slicing the execution makes it possible to transform the code differently to adapt to the observed execution phases, by patching one of the pre-built skeletons differently. The framework has been implemented with extensions of the LLVM compiler and an x86-64 runtime system. Significant speed-ups are shown on a set of benchmarks that could not have been handled efficiently by a compiler.
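The profiling step described in the abstract (interpolating the observed addresses into a predictor and computing dependence distance vectors from them) can be illustrated with a small sketch. This is not the framework's implementation, which works on instrumented binary code inside LLVM and an x86-64 runtime; it is a minimal Python model under stated assumptions: accesses are affine in the loop indices, the sample contains the origin and unit iterations, and every name here (`fit_affine`, `predict`, `distance_vectors`, `parallel_loops`, the toy trace) is hypothetical.

```python
from itertools import product


def fit_affine(samples):
    """Fit addr = base + sum(c_k * i_k) to (iteration, address) samples.

    Assumption: the sample contains the origin and every unit iteration.
    Returns (base, coeffs), or None if the accesses are not affine.
    """
    table = dict(samples)
    depth = len(samples[0][0])
    origin = (0,) * depth
    if origin not in table:
        return None
    base = table[origin]
    coeffs = []
    for k in range(depth):
        unit = tuple(1 if d == k else 0 for d in range(depth))
        if unit not in table:
            return None
        coeffs.append(table[unit] - base)
    # Keep the model only if it reproduces every profiled address.
    for iteration, address in samples:
        if predict((base, tuple(coeffs)), iteration) != address:
            return None
    return base, tuple(coeffs)


def predict(model, iteration):
    """Predict the address touched at a future iteration."""
    base, coeffs = model
    return base + sum(c * x for c, x in zip(coeffs, iteration))


def distance_vectors(trace):
    """Collect cross-iteration distance vectors from a profiled trace.

    trace: (iteration, address, is_write) tuples in execution order.
    Two accesses to one address conflict if at least one is a write.
    """
    history = {}
    distances = set()
    for iteration, address, is_write in trace:
        for prev_iteration, prev_write in history.get(address, []):
            if (is_write or prev_write) and prev_iteration != iteration:
                distances.add(
                    tuple(a - b for a, b in zip(iteration, prev_iteration))
                )
        history.setdefault(address, []).append((iteration, is_write))
    return distances


def parallel_loops(distances, depth):
    """Loop levels that carry no observed dependence in the original order."""
    carried = set()
    for d in distances:
        for level, component in enumerate(d):
            if component != 0:
                carried.add(level)  # first non-zero component carries it
                break
    return [level for level in range(depth) if level not in carried]


# Toy profile of: for i in 0..2: for j in 0..3: A[i+1][j] = A[i][j] + 1
N, ELEM, BASE = 4, 8, 0x1000  # hypothetical array layout


def addr(row, col):
    return BASE + ELEM * (row * N + col)


trace = []
for i, j in product(range(3), range(N)):
    trace.append(((i, j), addr(i, j), False))     # read  A[i][j]
    trace.append(((i, j), addr(i + 1, j), True))  # write A[i+1][j]

write_model = fit_affine([(it, a) for it, a, w in trace if w])
distances = distance_vectors(trace)
print(write_model, distances, parallel_loops(distances, 2))
```

In this toy case the only dependence is carried by the outer loop, so the inner loop could run in parallel as long as later accesses keep matching `predict`; a mismatch is the situation in which a speculative system would have to roll back. The sketch only checks the original loop order and does not attempt the polyhedral transformation selection the abstract describes.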
