Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs

Li Ang; Su Simon


Abstract

Despite the tremendous speedups they promise over conventional deep neural networks, the performance advantage of binarized neural networks (BNNs) has barely been demonstrated on general-purpose processors such as CPUs and GPUs. In fact, because a word-based architecture cannot exploit bit-level parallelism, GPUs have been criticized for extremely low utilization (1 percent) when executing BNNs. Consequently, the latest tensorcores in NVIDIA Turing GPUs have begun to offer experimental support for bit computation. In this article, we look into this brand-new bit computation capability and characterize its unique features. We show that the stride of memory access can significantly affect delivered performance, and that a data-format co-design is highly desirable if the tensorcores are to outperform existing software solutions that do not use them. We realize the tensorcore-accelerated BNN design, in particular the major functions for fully connected and convolution layers: bit matrix multiplication and bit convolution. Evaluations on two NVIDIA Turing GPUs show that, with ResNet-18, our BTC-BNN design can process ImageNet at a rate of 5.6K images per second, 77 percent faster than the state of the art. Our BNN approach is released at https://github.com/pnnl/TCBNN.
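The bit matrix multiplication named in the abstract can be illustrated with the XOR-and-popcount formulation commonly used for binarized networks. The sketch below is a plain Python illustration under stated assumptions, not the authors' tensorcore kernel: values are assumed to be +1/-1, each row is packed into one integer, and the helper names `pack_signs`, `bit_dot`, and `bit_matmul` are hypothetical.

```python
def pack_signs(row):
    """Pack a sequence of +1/-1 values into an int, one bit per value (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(row):
        if v > 0:
            word |= 1 << i
    return word


def bit_dot(a_word, b_word, k):
    """Dot product of two packed +1/-1 vectors of length k."""
    # XOR sets a bit wherever the two signs differ; count those bits (popcount).
    mismatches = bin(a_word ^ b_word).count("1")
    # matches - mismatches = (k - mismatches) - mismatches
    return k - 2 * mismatches


def bit_matmul(a_rows, b_rows, k):
    """C[i][j] = dot(A[i], B[j]); B is supplied row-wise, i.e. already transposed."""
    return [[bit_dot(a, b, k) for b in b_rows] for a in a_rows]


# Small example (assumed data, chosen only for illustration).
A = [[1, -1, 1, 1], [-1, -1, 1, -1]]
B = [[1, 1, -1, 1], [-1, -1, -1, -1]]
C = bit_matmul([pack_signs(r) for r in A], [pack_signs(r) for r in B], k=4)
print(C)  # [[0, -2], [-4, 2]]
```

Packing one value per bit is what lets a single word-wide XOR and population count stand in for many multiply-accumulate operations; how the packed data is laid out in memory is a separate design choice, which the abstract identifies as significant for performance.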


Bibliographic record

Source
IEEE Transactions on Parallel and Distributed Systems, 2021, Issue 7, pp. 1878-1891 (14 pages)
Authors
Li Ang; Su Simon
Author affiliations

Pacific Northwest National Laboratory (PNNL), High Performance Computing Group, Richland, WA 99354, USA;

US Army Research Laboratory (ARL), DoD Supercomputing Resource Center, Aberdeen Proving Ground, MD 21005, USA;

Original format: PDF
Language of text: English
Keywords
Graphics processing units; Hardware; Convolution; Synchronization; Tensors; Libraries; Field programmable gate arrays
