
Logarithm-approximate floating-point multiplier is applicable to power-efficient neural network training



Abstract

Recently, emerging "edge computing" moves data and services from the cloud to nearby edge servers to achieve short latency and wide bandwidth and to address privacy concerns. However, edge servers, often embedded with GPU processors, strongly demand a solution for power-efficient neural network (NN) training because of their power and size constraints. Moreover, because the gradient values computed in NN training span a broad dynamic range, floating-point representation is more suitable. This paper proposes adopting a logarithm-approximate multiplier (LAM) for the multiply-accumulate (MAC) computation in NN training engines, where LAM approximates a floating-point multiplication as a fixed-point addition, resulting in smaller delay, fewer gates, and lower power consumption. We demonstrate the efficiency of LAM on two platforms: dedicated NN training hardware and an open-source GPU design. Compared with NN training using the exact multiplier, our implementation of the NN training engine for a 2-D classification dataset achieves a 10% speed-up and 2.3X improvements in power and area efficiency, respectively. LAM is also highly compatible with conventional bit-width scaling (BWS). When BWS is applied together with LAM on five test datasets, the implemented training engines achieve more than a 4.9X power-efficiency improvement with at most 1% accuracy degradation, of which a 2.2X improvement originates from LAM. The advantage of LAM can also be exploited in processors: a GPU design embedded with LAM, implemented on an FPGA and executing an NN-training workload, shows a 1.32X power-efficiency improvement, which reaches 1.54X with LAM + BWS. Finally, LAM-based training in deeper NNs is evaluated: up to a 4-hidden-layer NN, LAM-based training achieves accuracy highly comparable to that of the exact multiplier, even with aggressive BWS.
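The core idea stated in the abstract, approximating a floating-point multiplication by a fixed-point addition on the operands' bit patterns, can be illustrated with a short sketch. The C code below is a minimal Mitchell-style approximation for IEEE-754 single precision; it is not the paper's LAM hardware design, and the function name lam_mul and the test values are illustrative assumptions only.

/* Minimal sketch (assumption, not the paper's exact LAM circuit):
 * interpreting an IEEE-754 float's bit pattern as an integer gives a
 * piecewise-linear approximation of its base-2 logarithm, so adding two
 * bit patterns and re-centering the exponent bias approximates the
 * product. Zeros, denormals, infinities, and NaNs are not handled here. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static float lam_mul(float a, float b)
{
    uint32_t ia, ib, sign, prod;
    memcpy(&ia, &a, sizeof ia);               /* view floats as raw bits   */
    memcpy(&ib, &b, sizeof ib);

    sign = (ia ^ ib) & 0x80000000u;           /* sign of the product       */
    ia &= 0x7FFFFFFFu;                        /* work on magnitudes only   */
    ib &= 0x7FFFFFFFu;

    /* Fixed-point addition of the approximate logarithms: add the two
     * bit patterns and subtract one exponent bias (127 << 23).            */
    prod = (ia + ib - 0x3F800000u) | sign;

    float r;
    memcpy(&r, &prod, sizeof r);
    return r;
}

int main(void)
{
    printf("exact: %f  approx: %f\n", 1.7f * 2.3f, lam_mul(1.7f, 2.3f));
    return 0;
}

Because the whole multiplication reduces to one integer addition (plus sign handling), a hardware realization needs no mantissa multiplier array, which is the source of the delay, gate-count, and power savings reported above.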
