Conference: IEEE International Symposium on Computer Architecture and High Performance Computing

Automatic Insertion of Copy Annotation in Data-Parallel Programs


Abstract

Directive-based programming models, such as OpenACC and OpenMP, have emerged as promising techniques to support the development of parallel applications. These systems allow developers to convert a sequential program into a parallel one with minimal human intervention. However, inserting pragmas into production code is a difficult and error-prone task that often requires familiarity with the target program. This difficulty restricts the ability of developers to annotate code that they have not written themselves. This paper provides one fundamental component of a solution to this problem. We introduce a static program analysis that infers the bounds of the memory regions referenced in source code. Such bounds allow us to automatically insert data-transfer primitives, which are needed when the parallelized code is meant to be executed on an accelerator device, such as a GPU. To validate our ideas, we have applied them to Polybench, using two different architectures: one Nvidia-based and one Qualcomm-based. We have successfully analyzed 98% of all the memory accesses in Polybench. This result has enabled us to automatically insert annotations into those benchmarks, leading to speedups of over 100x.
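To make the idea of a copy annotation concrete, the following is a minimal sketch in C with OpenACC directives. It is not taken from the paper or from Polybench: the kernel `smooth`, its parameters, and the placement of the pragmas are hypothetical, chosen only to show how the bounds of the accessed memory regions translate into `copyin` and `copyout` clauses. The comments state the bounds that an analysis of this kind would be expected to infer for this particular loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not from the paper): for i in [0, n),
 * reads b[i] and b[i + 1] and writes a[i]. */
void smooth(size_t n, const float *b, float *a) {
    assert(n > 0);

    /* Memory regions touched by the loop below:
     *   a: indices 0 .. n-1  ->  a[0:n]    (written only -> copyout)
     *   b: indices 0 .. n    ->  b[0:n+1]  (read only    -> copyin)
     * OpenACC array sections in C are written [start:length]. */
    #pragma acc data copyin(b[0:n+1]) copyout(a[0:n])
    {
        #pragma acc parallel loop
        for (size_t i = 0; i < n; ++i) {
            a[i] = 0.5f * (b[i] + b[i + 1]);
        }
    }
}
```

Note that `b` needs one more element than `a` because of the `b[i + 1]` access. This is exactly the kind of detail that is easy to get wrong when writing the annotation by hand. When the file is built without OpenACC support, compilers such as GCC and Clang ignore the unknown pragmas and the loop runs sequentially on the host with the same result.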
