IEEE Transactions on Parallel and Distributed Systems

Improving HW/SW Adaptability for Accelerating CNNs on FPGAs Through A Dynamic/Static Co-Reconfiguration Approach


Abstract

With the continuous evolution of Convolutional Neural Networks (CNNs) and the growing computing capability of FPGAs, FPGA-based CNN accelerators have become increasingly popular across a variety of computing scenarios. The key to implementing these accelerators is to take full advantage of the underlying hardware characteristics so that they match the computational features of the software-level CNN model. Previous designs, however, mainly rely on a static hardware reconfiguration pattern, which lacks flexibility and can hardly make the accelerator architecture fully fit the features of the CNN, resulting in inefficient computation and data communication. In this article, by leveraging the dynamic partial reconfiguration technology available in modern FPGA devices, we propose a new accelerator architecture for implementing CNNs on FPGAs in which the static and dynamic reconfigurability of the hardware are used cooperatively to maximize acceleration efficiency. Based on this architecture, we further present a systematic design and optimization methodology for implementing a specific CNN model in a particular computing scenario, in which a static design space exploration method and a reinforcement learning-based decision method are proposed to obtain the optimal static hardware configuration and the run-time reconfiguration strategy, respectively. We evaluate our proposal by implementing three widely used CNN models, AlexNet, VGG16C, and ResNet34, on the Xilinx ZCU102 FPGA platform. Experimental results show that our implementations achieve, on average, 683 GOPS with a 16-bit fixed-point data type and 1.37 TOPS with an 8-bit fixed-point data type across the three targeted CNN models, and improve computational density by 1.1x to 1.91x compared with previous implementations on the same type of FPGA platform.
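To illustrate the flavor of the static design space exploration step described above, the sketch below enumerates candidate accelerator configurations and keeps the one with the best estimated average throughput over a CNN's layers. The parameter names (`pe_rows`, `pe_cols`, `freq_mhz`), the roofline-style cost model, and the DSP budget and bandwidth constants are all illustrative assumptions, not the paper's actual exploration method or performance model.

```python
from itertools import product

def estimate_gops(pe_rows, pe_cols, freq_mhz, layer):
    """Hypothetical roofline-style throughput estimate for one layer.

    `layer` is a (MAC count, off-chip bytes moved) pair; the configuration
    is compute-bound or memory-bound depending on which time dominates.
    """
    macs, bytes_moved = layer
    compute_time = macs / (pe_rows * pe_cols * freq_mhz * 1e6)  # seconds
    bandwidth = 19.2e9  # bytes/s, an assumed DDR bandwidth
    memory_time = bytes_moved / bandwidth
    # 2 ops (multiply + add) per MAC, reported in GOPS.
    return 2 * macs / max(compute_time, memory_time) / 1e9

def explore(layers, dsp_budget=2520):
    """Exhaustively search a small static configuration space."""
    best = None
    for pe_rows, pe_cols, freq in product([8, 16, 32], [8, 16, 32], [200, 300]):
        if pe_rows * pe_cols > dsp_budget:
            continue  # configuration exceeds the FPGA's DSP budget
        # Score a configuration by its average throughput over all layers.
        score = sum(estimate_gops(pe_rows, pe_cols, freq, l)
                    for l in layers) / len(layers)
        if best is None or score > best[0]:
            best = (score, pe_rows, pe_cols, freq)
    return best

# Example: two layers given as (MAC count, bytes moved off-chip).
layers = [(105e6, 1.2e6), (450e6, 3.3e6)]
print(explore(layers))
```

In the paper's framework this static search would be complemented by a run-time decision policy; here the sketch stops at the static choice, since the learned reconfiguration strategy depends on details not given in the abstract.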

