ParaC: A Programming Framework for Image Processing on GPU Platforms

Lu Xingjing (卢兴敬); Liu Lei (刘雷); Jia Haipeng (贾海鹏); Feng Xiaobing (冯晓兵); Wu Chenggang (武成岗)

Abstract

GPGPU accelerators are the mainstream platform for speeding up image processing algorithms. On these platforms, however, a version of a program that fully exploits the hardware architecture and the program's own characteristics often outperforms a naive implementation by an order of magnitude or more. A GPGPU exposes a large number of threads organized in multiple dimensions and levels, together with a hierarchical memory system whose levels differ in capacity, bandwidth, latency and access permissions. Image processing applications, in turn, have complex computations, border-handling rules and data access patterns. As a result, the concurrent execution model of tasks, the organization of threads and the mapping of concurrent tasks to devices affect not only the program's degree of parallelism, scheduling, communication and synchronization, but also its memory access bandwidth and latency. Optimizing programs for GPGPU platforms is therefore a difficult, complex and inefficient process. This paper proposes ParaC, a domain-specific programming model based on language extensions. The ParaC programming environment uses the program semantics expressed through high-level language extensions to automatically analyze and extract program characteristics such as operation information, data reuse among concurrent tasks and memory access information. It combines these with the characteristics of the hardware platform and applies a compiler optimization model driven by domain prior knowledge to automatically generate optimized code for the GPGPU platform. Finally, a source-to-source compiler emits standard OpenCL programs. Experimental results on the test cases show that the optimized versions generated automatically by ParaC achieve a speedup of up to 3.22x over hand-optimized versions, while the number of lines of code is only 1.2% to 39.68% of the latter.
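The abstract names three things an optimizer has to reason about together: border-handling rules, data reuse among concurrent tasks, and the memory level each access hits. The sketch below illustrates them with a 3x3 box filter. It is not ParaC syntax and not the OpenCL that ParaC generates. It is plain Python that runs on the CPU, and the function names, the clamp-to-edge border rule, and the tile size are all assumptions chosen for illustration. The naive version reads every neighbour directly from the image. The tiled version stages each tile plus a one-pixel halo into a small buffer once, which stands in for what a GPU work-group might do with local memory.

```python
def clamp(i, lo, hi):
    """Clamp index i into [lo, hi] (clamp-to-edge border rule, assumed here)."""
    return max(lo, min(i, hi))


def box3x3_naive(img):
    """Each output pixel reads its 3x3 neighbourhood straight from `img`."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    s += img[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]
            out[y][x] = s / 9.0
    return out


def box3x3_tiled(img, tile=4):
    """Same filter, but each tile first copies itself plus a 1-pixel halo
    into `local`, so neighbouring pixels in the tile reuse staged data."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            th = min(tile, h - ty)  # the last tile in a row/column may be smaller
            tw = min(tile, w - tx)
            # Stage tile + halo once; border handling happens only here.
            local = [
                [img[clamp(ty + ly - 1, 0, h - 1)][clamp(tx + lx - 1, 0, w - 1)]
                 for lx in range(tw + 2)]
                for ly in range(th + 2)
            ]
            # Compute from the staged copy; local[ly + dy][lx + dx] is the
            # neighbour at offset (dy - 1, dx - 1) of pixel (ty + ly, tx + lx).
            for ly in range(th):
                for lx in range(tw):
                    s = 0.0
                    for dy in range(3):
                        for dx in range(3):
                            s += local[ly + dy][lx + dx]
                    out[ty + ly][tx + lx] = s / 9.0
    return out
```

Both functions compute the same result. On a real GPU, the differences would lie in how many reads go to slow global memory and how the tile size maps onto work-groups, which are the kinds of decisions the abstract says ParaC makes automatically.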

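Bibliographic record

Source
Journal of Software (《软件学报》) | 2017, No. 7 | pp. 1655-1675 | 21 pages
Authors
Lu Xingjing; Liu Lei; Jia Haipeng; Feng Xiaobing; Wu Chenggang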
Author affiliations

State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190;

University of Chinese Academy of Sciences, Beijing 100049

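Original format: PDF
Text language: Chinese
CLC classification: Compilers, interpreters
Keywords
image processing; general-purpose GPU accelerator; domain-specific programming language; compiler optimization; source-to-source transformation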
