Embedded Systems Letters, IEEE

Improving Memory Utilization in Convolutional Neural Network Accelerators


Abstract

While the accuracy of convolutional neural networks (CNNs) has improved vastly through larger and deeper network architectures, the memory footprint for storing their parameters and activations has grown accordingly. This trend especially challenges power- and resource-limited accelerator designs, which are often restricted to storing all network data in on-chip memory to avoid interfacing energy-hungry external memories. Maximizing the network size that fits on a given accelerator therefore requires maximizing its memory utilization. Whereas the traditional ping-pong buffering technique maps consecutive activation layers to disjoint memory regions, we propose a mapping method that allows these regions to overlap and thus uses the memory more efficiently. This letter presents a mathematical model to compute the maximum activation memory overlap and hence the lower bound of on-chip memory needed for layer-by-layer processing of CNNs on memory-limited accelerators. Our experiments with various real-world object detector networks show that the proposed mapping technique reduces activation memory by up to 32.9%, cutting the overall memory for the entire network by up to 23.9% compared to traditional ping-pong buffering. For higher-resolution denoising networks, we achieve activation memory savings of 48.8%. Additionally, we implement a face detector network on a field-programmable gate array-based camera to validate these memory savings on a complete end-to-end system.
