Journal: ACM Transactions on Reconfigurable Technology and Systems
Optimizing OpenCL-Based CNN Design on FPGA with Comprehensive Design Space Exploration and Collaborative Performance Modeling

Abstract

Recent success in applying convolutional neural networks (CNNs) to object detection and classification has sparked great interest in accelerating CNNs using hardware such as field-programmable gate arrays (FPGAs). However, finding an efficient FPGA design for a given CNN model and FPGA board is not trivial, since it requires a strong background in hardware design and detailed knowledge of the target board. In this work, we try to solve this problem through design space exploration with a collaborative framework. Our framework consists of three main parts: FPGA design generation, coarse-grained modeling, and fine-grained modeling. In FPGA design generation, we propose a novel data structure, LoopTree, to capture the details of the FPGA design for CNN applications without writing down the source code. Different LoopTrees, which represent different FPGA designs, are automatically generated in this process. A coarse-grained model evaluates LoopTrees at the operation level (e.g., add, mult, and so on), so that the most efficient LoopTrees can be selected. A fine-grained model, which is based on the source code, then refines the selected design in a cycle-accurate manner. A comprehensive set of OpenCL-based designs has been implemented on board to verify our framework. Average estimation errors of 8.87% and 4.8% have been observed for our coarse-grained model and fine-grained model, respectively. These errors are much lower than those of the prevalent operation-statistics-based estimation, which is obtained from a predefined formula for specific loop schedules.
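
The two-stage flow described in the abstract (generate candidate designs, rank them with a cheap model, refine the best few with a more detailed model) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration, not the paper's method: a flat loop nest stands in for a LoopTree, whose actual structure the abstract does not describe; `coarse_cycles` simply counts iterations after unrolling; `fine_cycles` uses the common approximation that a pipelined loop with N iterations and depth D takes N + D - 1 cycles; and the names `Loop`, `lanes`, `explore`, and `lane_budget` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil, prod


@dataclass(frozen=True)
class Loop:
    """One loop of a CNN layer's loop nest (hypothetical stand-in for a LoopTree node)."""
    name: str
    trip_count: int
    unroll: int = 1  # assumed design knob: how many iterations run in parallel


def iterations(loop):
    """Sequential iterations left after unrolling."""
    return ceil(loop.trip_count / loop.unroll)


def lanes(nest):
    """Parallel multiply-accumulate lanes implied by the unroll factors."""
    return prod(loop.unroll for loop in nest)


def coarse_cycles(nest):
    """Coarse estimate (assumption): one cycle per remaining iteration."""
    return prod(iterations(loop) for loop in nest)


def fine_cycles(nest, pipeline_depth=8):
    """Finer estimate (assumption): innermost loop is pipelined and pays
    a fill latency of pipeline_depth - 1 cycles on every outer iteration."""
    *outer, inner = nest
    outer_iters = prod(iterations(loop) for loop in outer)
    return outer_iters * (iterations(inner) + pipeline_depth - 1)


def enumerate_designs(trip_counts, unroll_options, lane_budget):
    """Yield every unroll assignment that fits the resource budget."""
    names = list(trip_counts)
    for factors in product(unroll_options, repeat=len(names)):
        nest = tuple(Loop(n, trip_counts[n], u) for n, u in zip(names, factors))
        if lanes(nest) <= lane_budget:
            yield nest


def explore(trip_counts, unroll_options, lane_budget, keep=3):
    """Rank all candidates with the coarse model, then pick the best of
    the top `keep` according to the finer model."""
    ranked = sorted(
        enumerate_designs(trip_counts, unroll_options, lane_budget),
        key=coarse_cycles,
    )
    return min(ranked[:keep], key=fine_cycles)


if __name__ == "__main__":
    best = explore({"out_ch": 8, "in_ch": 4, "kernel": 3}, (1, 2, 4), lane_budget=8)
    print([(l.name, l.unroll) for l in best], coarse_cycles(best), fine_cycles(best))
```

The point of the split is cost: the coarse estimate is cheap enough to run on every candidate, so the more expensive estimate only needs to run on the few designs that survive the first ranking.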