《软件学报》 (Journal of Software)

ParaC: A Programming Framework for Image Processing on GPU Platforms

         

Abstract

GPGPU accelerators are the mainstream platform for speeding up image processing algorithms. On such a platform, however, a version of a program that fully exploits the hardware architecture and the software's characteristics often outperforms a naive implementation of the same program by an order of magnitude. GPGPU accelerators provide a large number of threads organized in multiple dimensions and levels, together with a hierarchical memory system whose levels differ in capacity, bandwidth, latency, and access permissions. Image processing applications, in turn, have complex computational operations, border handling rules, and data access patterns. As a result, the parallel execution model of tasks, the organization of threads, and the mapping of parallel tasks to the device affect not only the program's concurrency, scheduling, communication, and synchronization, but also memory access bandwidth and latency. Program optimization on GPGPU platforms is therefore difficult, complex, and inefficient. This paper proposes ParaC, a domain-specific programming model based on language extensions. The ParaC programming environment uses the program semantics expressed through its high-level language extensions to automatically extract program characteristics such as operation information, data reuse among parallel tasks, and memory access information. It combines these with the characteristics of the hardware platform and applies a compiler optimization model driven by domain prior knowledge to automatically generate optimized code for GPGPU platforms. Finally, a source-to-source compiler emits standard OpenCL programs. Experimental results on the test cases show that the optimized versions ParaC generates automatically achieve a speedup of up to 3.22x over hand-optimized versions, while their line counts are only 1.2% to 39.68% of the latter's.
