CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks

Abstract

Large-scale deep convolutional neural networks (CNNs) are widely used in machine learning applications. Because CNNs are computationally intensive, VLSI (ASIC and FPGA) chips, which integrate computational resources at high density, are regarded as a promising platform for CNN implementation. With massively parallel computational units, however, the external memory bandwidth, which is constrained by the pin count of the VLSI chip, becomes the system bottleneck. Moreover, VLSI solutions are usually regarded as lacking the flexibility to be reconfigured for the various parameters of CNNs. This paper presents CNN-MERP to address these issues. CNN-MERP incorporates an efficient memory hierarchy that significantly reduces bandwidth requirements through multiple optimizations, including on/off-chip data allocation, data flow optimization, and data reuse. A proposed 2-level reconfigurability, based on the control logic and the multiboot feature of the FPGA, enables fast and efficient reconfiguration. As a result, an external memory bandwidth requirement of 1.94 MB/GFlop is achieved, which is 55% lower than prior art. Under limited DRAM bandwidth, a system throughput of 1244 GFlop/s is achieved on the Virtex UltraScale platform, which is 5.48 times higher than state-of-the-art FPGA implementations.
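The two headline figures in the abstract can be combined into a rough estimate of the DRAM traffic they imply. The sketch below is only back-of-the-envelope arithmetic on the quoted numbers. The function names are hypothetical, and it assumes that "55% lower" means the reported requirement is 45% of the prior figure; neither the derived bandwidth nor the derived prior requirement is stated in the abstract itself.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Function names are illustrative; the derived values are not from the paper.

MB_PER_GFLOP = 1.94          # external memory requirement quoted in the abstract
THROUGHPUT_GFLOP_S = 1244.0  # system throughput quoted in the abstract
REDUCTION_VS_PRIOR = 0.55    # "55% lower than prior art"


def implied_dram_bandwidth_mb_s(mb_per_gflop: float, gflop_per_s: float) -> float:
    """Sustained DRAM traffic (MB/s) = traffic per GFlop * GFlop per second."""
    return mb_per_gflop * gflop_per_s


def implied_prior_requirement(mb_per_gflop: float, reduction: float) -> float:
    """Prior requirement, assuming the new value equals (1 - reduction) * prior."""
    return mb_per_gflop / (1.0 - reduction)


if __name__ == "__main__":
    bw = implied_dram_bandwidth_mb_s(MB_PER_GFLOP, THROUGHPUT_GFLOP_S)
    prior = implied_prior_requirement(MB_PER_GFLOP, REDUCTION_VS_PRIOR)
    print(f"Implied DRAM bandwidth: {bw:.2f} MB/s")
    print(f"Implied prior requirement: {prior:.2f} MB/GFlop")
```

Under these assumptions, the quoted throughput corresponds to roughly 2.4 GB/s of external memory traffic, and the prior requirement works out to about 4.3 MB/GFlop.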

Bibliographic details

Source: International Conference on Computer Design | 2016 | 690p | 8 pages
Authors: Xushen Han; Dajiang Zhou; Shihao Wang; Shinji Kimura
Original format: PDF
Chinese Library Classification: TP36-532
Keywords: Kernel; Bandwidth; Memory management; Throughput; Field programmable gate arrays; Backpropagation; Neural networks

Similar literature

1. MERP-CNN: A memory-efficient reconfigurable processor for convolutional neural networks based on FPGA [J]. Xushen Han, Dajiang Zhou, Shinji Kimura. 電子情報通信学会技術研究報告. VLSI設計技術. VLSI Design Technologies. 2016, No. 21
2. OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks [J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2020, No. 1
3. An FPGA-Based Convolutional Neural Network Coprocessor [J]. Changpei Qiu, Xin’an Wang, Tianxia Zhao. Wireless Communications & Mobile Computing. 2021, No. a
4. CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks [C]. Xushen Han, Dajiang Zhou, Shihao Wang. International Conference on Computer Design. 2016
5. FPGA-based Accelerators for Convolutional Neural Networks on Embedded Devices [D]. Perera Miro, Jordi. 2020
6. ReStoCNet: Residual Stochastic Binary Convolutional Spiking Neural Network for Memory-Efficient Neuromorphic Computing [O]. Gopalakrishnan Srinivasan, Kaushik Roy. 2010
7. CNN-MERP: An FPGA-Based Memory-Efficient Reconfigurable Processor for Forward and Backward Propagation of Convolutional Neural Networks [O]. Han, Xushen; Zhou, Dajiang; Wang, Shihao. 2017
