DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator

Abstract

Many hardware vendors have introduced specialized deep neural network (DNN) accelerators owing to their superior performance and efficiency. As such, how to generate and optimize code for these hardware accelerators has become an important yet less explored problem. In this paper, we perform a compiler-stage optimization study using a novel and representative Cambricon DNN accelerator and demonstrate that code optimization knobs play an important role in unleashing the hardware's computational horsepower. However, even the two code optimization knobs studied here, namely the number of cores and the layer fusion scheme, present an enormous search space that rules out naive brute-force search. This work introduces a joint, auto-tuning optimization framework to address this challenge. We first use a set of synthesized DNN layers to study the interplay between hardware performance and layer characteristics. Based on the insights, we extract the operation count and feature map channel size as each layer's characteristics and derive a joint optimization strategy to decide the performance-optimal core number and fusion scheme. We evaluate the proposed approach using a set of representative DNN models and show that it achieves a minimum speedup of 3.6x and a maximum of 7.9x compared to the no-optimization baseline. We also show that the achieved speedup is close to that of the oracle case, which is based on a reduced brute-force search, while requiring much less search time.
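
To give a rough sense of why the two knobs alone can defeat brute-force search, the sketch below counts configurations under simplifying assumptions that are not taken from the paper: the network is a linear chain of layers, a fusion scheme is any split of that chain into contiguous fused blocks, and each block independently picks one of k core counts. The function names and the example values (50 layers, 6 core-count options) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def count_configurations(num_layers: int, num_core_options: int) -> int:
    """Closed-form count of (fusion scheme, core assignment) pairs.

    Assumption (illustrative, not from the paper): a chain of n layers is
    split into contiguous fused blocks, and each block picks one of k core
    counts. Summing k**m over all splits into m blocks gives k * (k + 1)**(n - 1).
    """
    if num_layers < 1 or num_core_options < 1:
        raise ValueError("both arguments must be positive")
    return num_core_options * (num_core_options + 1) ** (num_layers - 1)


def count_by_enumeration(num_layers: int, num_core_options: int) -> int:
    """Cross-check the closed form by enumerating every set of cut points."""
    if num_layers < 1 or num_core_options < 1:
        raise ValueError("both arguments must be positive")
    total = 0
    boundaries = range(1, num_layers)  # possible cuts between adjacent layers
    for num_cuts in range(num_layers):
        for _ in combinations(boundaries, num_cuts):
            # num_cuts cuts produce num_cuts + 1 fused blocks,
            # each with an independent choice of core count.
            total += num_core_options ** (num_cuts + 1)
    return total


if __name__ == "__main__":
    # Small case where enumeration is cheap: both counts agree.
    print(count_configurations(3, 2), count_by_enumeration(3, 2))
    # Hypothetical 50-layer chain with 6 core-count options.
    print(f"{count_configurations(50, 6):.3e}")
```

Under these assumptions the count grows exponentially with network depth, which is consistent with the abstract's point that exhaustive search is impractical and a guided strategy is needed.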

Bibliographic record

Source
International Conference on Big Data and Cloud Computing; IEEE International Symposium on Parallel and Distributed Processing with Applications; International Symposium on Social Computing and Networking; International Conference on Sustainable Computing and Communications | 2020 | pp. 118-127 | 10 pages
Authors
Zihan Liu; Jingwen Leng; Quan Chen; Chao Li; Wenli Zheng; Li Li; Minyi Guo
Original format
PDF
Keywords
Fuses; Computational modeling; Neural networks; C++ languages; Feature extraction; Hardware; Generators

Similar literature

1. Deep neural networks compiler for a trace-based accelerator [J]. Chang Andre Xian Ming, Zaidy Aliasger, Vitez Marko. Journal of systems architecture. 2020
2. Deep Neural Networks Compiler for a Trace-Based Accelerator (Short WIP Paper) [J]. Andre Xian Ming Chang, Aliasger Zaidy, Lukasz Burzawa. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2018, no. 6
3. An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators [J]. Nabavinejad Seyed Morteza, Baharloo Mohammad, Chen Kun-Chih. IEEE Journal on Emerging and Selected Topics in Circuits and Systems. 2020, no. 3
4. A Compiler for Deep Neural Network Accelerators to Generate Optimized Code for a Wide Range of Data Parameters from a Hand-crafted Computation Kernel [C]. Eri Ogawa, Kazuaki Ishizaki, Hiroshi Inoue. IEEE Symposium in Low-Power and High-Speed Chips. 2019
5. Reducing Off-chip Memory Accesses in Deep Neural Network Accelerators [D]. Siu, Kevin. 2019
6. Neural Circuits: Spontaneous dynamics of neural networks in deep layers of prefrontal cortex [O]. Andrew S. Blaeser, Barry W. Connors, Arto V. Nurmikko
7. DNNZip: Selective Layers Compression Technique in Deep Neural Network Accelerators [O]. Habiba Lahdhiri, Maurizio Palesi, Salvatore Monteleone. 2020
