IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Toward an Efficient Deep Pipelined Template-Based Architecture for Accelerating the Entire 2-D and 3-D CNNs on FPGA

Abstract

3-D convolutional neural networks (3-D CNNs) are used efficiently in many computer vision applications. Most previous work in this area has concentrated only on the design and optimization of accelerators for 2-D CNNs; few attempts have been made to accelerate 3-D CNNs on FPGA. We find the acceleration of 3-D CNNs on FPGA to be challenging due to their high computational complexity and storage demands. More importantly, although the computational patterns of 2-D and 3-D CNNs are analogous, the conventional approaches adopted for the acceleration of 2-D CNNs may be unfit for 3-D CNN acceleration. In this paper, in order to accelerate 2-D and 3-D CNNs within a uniform framework, we first propose a uniform template-based architecture that uses templates based on the Winograd algorithm to enable the rapid development of 2-D and 3-D CNN accelerators. Then, with the aim of efficiently mapping all layers of 2-D/3-D CNNs onto a pipelined accelerator, techniques are developed to improve the throughput and computational efficiency of the accelerator, including layer fusion, layer clustering, and a workload-balancing scheme. Finally, we demonstrate the effectiveness of the deep pipelined architecture by accelerating real-life 2-D and 3-D CNNs on a state-of-the-art FPGA platform. On the VCU118, we achieve 3.7 TOPS for VGG-16, which outperforms state-of-the-art FPGA-based CNN accelerators. Comparisons with CPU and GPU solutions show that our 3-D CNN implementation achieves performance and energy-efficiency gains of up to 17.8x and 64.2x, respectively, over a CPU solution, and a 5.0x energy-efficiency gain over a GPU solution.
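The templates in the proposed architecture are built on the Winograd algorithm, which trades multiplications for additions. As a minimal illustration of the idea (not the authors' actual templates), the 1-D case F(2,3) below computes two outputs of a 3-tap convolution with 4 multiplications instead of the 6 a direct sliding window needs; the filter-side transform values are hypothetical names chosen here for clarity.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 1-D, 3-tap 'valid' convolution
    of input d (4 elements) with filter g (3 elements), using only
    4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    G0 = g0
    G1 = (g0 + g1 + g2) / 2
    G2 = (g0 - g1 + g2) / 2
    G3 = g2
    # Input transform followed by 4 element-wise multiplications.
    m0 = (d0 - d2) * G0
    m1 = (d1 + d2) * G1
    m2 = (d2 - d1) * G2
    m3 = (d1 - d3) * G3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]

def direct_conv(d, g):
    """Reference: direct sliding-window convolution, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

In the 2-D and 3-D cases the same nesting applies tile by tile, which is why a single templated datapath can serve both kinds of CNN layers.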

