IEEE Transactions on Parallel and Distributed Systems

Improving HW/SW Adaptability for Accelerating CNNs on FPGAs Through A Dynamic/Static Co-Reconfiguration Approach


Abstract

With the continuous evolution of Convolutional Neural Networks (CNNs) and the growing computing capability of FPGAs, FPGA-based CNN accelerators have become increasingly popular across a variety of computing scenarios. The key to implementing these accelerators is to take full advantage of the underlying hardware characteristics so that they match the computational features of the software-level CNN model. Previous designs, however, mainly rely on a static hardware reconfiguration pattern, which lacks flexibility and can hardly make the accelerator architecture fully fit the features of the CNN, resulting in inefficient computation and data communication. In this article, by leveraging the dynamic partial reconfiguration technology available in modern FPGA devices, we propose a new accelerator architecture for implementing CNNs on FPGAs in which the static and dynamic reconfigurability of the hardware are used cooperatively to maximize acceleration efficiency. Based on this architecture, we further present a systematic design and optimization methodology for implementing a specific CNN model in a particular computing scenario, in which a static design space exploration method and a reinforcement learning-based decision method are proposed to obtain the optimal static hardware configuration and the run-time reconfiguration strategy, respectively. We evaluate our proposal by implementing three widely used CNN models, AlexNet, VGG16C, and ResNet34, on the Xilinx ZCU102 FPGA platform. Experimental results show that our implementations achieve, on average, 683 GOPS with a 16-bit fixed-point data type and 1.37 TOPS with an 8-bit fixed-point data type across the three targeted CNN models, and improve computational density by 1.1x to 1.91x compared with previous implementations on the same type of FPGA platform.
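To illustrate the flavor of the static design space exploration step described above, the sketch below enumerates candidate accelerator configurations and keeps the one with the best estimated average throughput over a CNN's layers. The parameter names (`pe_rows`, `pe_cols`, `freq_mhz`), the roofline-style cost model, and the DSP budget and bandwidth constants are all illustrative assumptions, not the paper's actual exploration method or performance model.

```python
from itertools import product

def estimate_gops(pe_rows, pe_cols, freq_mhz, layer):
    """Hypothetical roofline-style throughput estimate for one layer.

    `layer` is a (MAC count, off-chip bytes moved) pair; the configuration
    is compute-bound or memory-bound depending on which time dominates.
    """
    macs, bytes_moved = layer
    compute_time = macs / (pe_rows * pe_cols * freq_mhz * 1e6)  # seconds
    bandwidth = 19.2e9  # bytes/s, an assumed DDR bandwidth
    memory_time = bytes_moved / bandwidth
    # 2 ops (multiply + add) per MAC, reported in GOPS.
    return 2 * macs / max(compute_time, memory_time) / 1e9

def explore(layers, dsp_budget=2520):
    """Exhaustively search a small static configuration space."""
    best = None
    for pe_rows, pe_cols, freq in product([8, 16, 32], [8, 16, 32], [200, 300]):
        if pe_rows * pe_cols > dsp_budget:
            continue  # configuration exceeds the FPGA's DSP budget
        # Score a configuration by its average throughput over all layers.
        score = sum(estimate_gops(pe_rows, pe_cols, freq, l)
                    for l in layers) / len(layers)
        if best is None or score > best[0]:
            best = (score, pe_rows, pe_cols, freq)
    return best

# Example: two layers given as (MAC count, bytes moved off-chip).
layers = [(105e6, 1.2e6), (450e6, 3.3e6)]
print(explore(layers))
```

In the paper's framework this static search would be complemented by a run-time decision policy; here the sketch stops at the static choice, since the learned reconfiguration strategy depends on details not given in the abstract.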

