Embedded Systems Letters, IEEE

Improving Memory Utilization in Convolutional Neural Network Accelerators


Abstract

While the accuracy of convolutional neural networks (CNNs) has improved vastly through larger and deeper network architectures, the memory footprint for storing their parameters and activations has grown accordingly. This trend especially challenges power- and resource-limited accelerator designs, which are often restricted to storing all network data in on-chip memory to avoid interfacing energy-hungry external memories. Maximizing the network size that fits on a given accelerator therefore requires maximizing its memory utilization. Whereas the traditional ping-pong buffering technique maps consecutive activation layers to disjoint memory regions, we propose a mapping method that allows these regions to overlap and thus uses the memory more efficiently. This letter presents a mathematical model to compute the maximum activation memory overlap and hence the lower bound of on-chip memory needed for layer-by-layer processing of CNNs on memory-limited accelerators. Our experiments with various real-world object detector networks show that the proposed mapping technique reduces activation memory by up to 32.9%, cutting the overall memory for the entire network by up to 23.9% compared to traditional ping-pong buffering. For higher-resolution denoising networks, we achieve activation memory savings of 48.8%. Additionally, we implement a face detector network on a field-programmable gate array-based camera to validate these memory savings on a complete end-to-end system.
