IEEE International Conference on Artificial Intelligence Circuits and Systems

End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet?

Abstract

In-Memory Acceleration (IMA) promises major efficiency improvements in deep neural network (DNN) inference, but challenges remain in the integration of IMA within a digital system. We propose a heterogeneous architecture coupling 8 RISC-V cores with an IMA in a shared-memory cluster, analyzing the benefits and trade-offs of in-memory computing on the realistic use case of a MobileNetV2 bottleneck layer. We explore several IMA integration strategies, analyzing performance, area, and energy efficiency. We show that while pointwise layers achieve significant speed-ups over a software implementation, on depthwise layers the inability to efficiently map parameters onto the accelerator leads to a significant trade-off between throughput and area. We propose a hybrid solution where pointwise convolutions are executed on the IMA while depthwise convolutions run on the cluster cores, achieving a 3x speed-up over software execution while saving 50% of area compared to an all-in IMA solution with similar performance.
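
The trade-off the abstract describes stems from how convolution weights map onto an analog crossbar: a pointwise (1x1) layer is a dense C_in x C_out matrix that fills the array, whereas a depthwise layer's weights are block-diagonal (each output channel reads only its own input channel), so only roughly 1/C of the crossbar cells would hold useful parameters. The Python sketch below illustrates the hybrid mapping under these assumptions; it is not the paper's code, and the Layer/map_layer names and channel counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    kind: str     # "pointwise" (1x1 conv) or "depthwise" (3x3 per-channel conv)
    in_ch: int
    out_ch: int

def map_layer(layer: Layer) -> str:
    # Hybrid policy from the abstract: dense pointwise weight matrices
    # fill the IMA crossbar efficiently, while depthwise weights would
    # occupy only a block-diagonal sliver of it, so they stay in software
    # on the RISC-V cluster cores.
    return "IMA" if layer.kind == "pointwise" else "RISC-V cores"

# A MobileNetV2 bottleneck (expansion factor 6, hypothetical channel counts):
# expand (1x1) -> depthwise (3x3) -> project (1x1).
bottleneck = [
    Layer("pointwise", 32, 192),
    Layer("depthwise", 192, 192),
    Layer("pointwise", 192, 64),
]

for layer in bottleneck:
    print(f"{layer.kind:9} {layer.in_ch:3}->{layer.out_ch:<3} on {map_layer(layer)}")
```

Under this policy, only the two pointwise stages occupy crossbar area, which is consistent with the reported 50% area saving relative to mapping the whole bottleneck onto the IMA.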
