...
IEEE Journal of Solid-State Circuits

An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for In Situ Personalization on Smart Devices



Abstract

A scalable deep-learning accelerator that supports the training process is implemented for device personalization of deep convolutional neural networks (CNNs). It consists of three processor cores, each operating with a distinct energy-efficient dataflow for a different type of computation in CNN training. Unlike previous works, which apply design techniques that exploit the same characteristics as in inference, we analyze the major issues that arise when training in a resource-constrained system and resolve the resulting bottlenecks. A masking scheme in the propagation core greatly reduces the storage required for intermediate activation data, eliminating the frequent off-chip memory accesses otherwise needed to hold the generated activations until the backward pass. A distinct dataflow architecture is implemented for the weight-gradient computation to enhance processing-element (PE) utilization while maximally reusing the input data. Furthermore, a modified weight-update system enables an 8-bit fixed-point computing datapath. The processor is implemented in 65-nm CMOS technology and occupies a core area of 10.24 mm². It operates with supply voltages from 0.63 to 1.0 V, and the computing engine runs at a near-threshold voltage of 0.5 V. At its most efficient operating point, the chip consumes 40.7 mW at 50 MHz and achieves a training efficiency of 47.4 μJ/epoch for the customized CNN model.
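The abstract does not describe how the masking scheme works. As a hedged illustration of why a mask can stand in for stored activations, the sketch below shows one well-known case, assuming a ReLU layer. Its backward pass only needs to know which inputs were positive, so a packed 1-bit mask per element can replace the full activation value. The function names `relu_forward` and `relu_backward` are hypothetical, and this is not claimed to be the paper's actual scheme.

```python
# Hypothetical sketch: store a packed 1-bit ReLU mask instead of full activations.
# Illustrates a generic idea only; not the accelerator's documented masking scheme.


def relu_forward(x):
    """Apply ReLU and record which inputs were positive, one bit per element."""
    out = [v if v > 0 else 0 for v in x]
    mask = bytearray((len(x) + 7) // 8)  # 8 elements packed into each byte
    for i, v in enumerate(x):
        if v > 0:
            mask[i >> 3] |= 1 << (i & 7)  # set bit i of the packed mask
    return out, mask


def relu_backward(grad_out, mask):
    """Pass the upstream gradient only where the forward input was positive."""
    return [g if (mask[i >> 3] >> (i & 7)) & 1 else 0
            for i, g in enumerate(grad_out)]


if __name__ == "__main__":
    acts, bits = relu_forward([3, -1, 0, 5, -2, 7, -4, 1, 2])
    print(len(bits), "mask bytes for", len(acts), "activations")  # 2 mask bytes for 9 activations
    print(relu_backward([10] * 9, bits))  # [10, 0, 0, 10, 0, 10, 0, 10, 10]
```

With 8-bit activations, the nine values above would need 9 bytes, while the mask needs 2. This only covers the ReLU's own backward step. A layer whose weight gradient depends on the actual activation values still needs those values (or a way to recompute them).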
