DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator

Abstract

Many hardware vendors have introduced specialized deep neural network (DNN) accelerators because of their superior performance and efficiency. Consequently, how to generate and optimize code for these accelerators has become an important yet underexplored problem. In this paper, we study compiler-stage optimization on the Cambricon DNN accelerator, a novel and representative design, and demonstrate that code optimization knobs play an important role in unleashing the hardware's computational horsepower. However, even the two knobs studied here, the number of cores and the layer fusion scheme, together present an enormous search space that rules out naive brute-force search. To address this challenge, this work introduces a joint, auto-tuning optimization framework. We first use a set of synthesized DNN layers to study the interplay between hardware performance and layer characteristics. Based on these insights, we extract each layer's operation count and feature map channel size as its characteristics and derive a joint optimization strategy that decides the performance-optimal core number and fusion scheme. We evaluate the proposed approach on a set of representative DNN models and show that it achieves speedups ranging from 3.6x to 7.9x over the unoptimized baseline. We also show that the achieved speedup is close to that of an oracle based on a reduced brute-force search, while requiring much less search time.
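To illustrate why the joint search space quickly outgrows brute force, the sketch below counts configurations under simplifying assumptions that are ours, not the paper's. We treat the network as a linear chain of N layers and take a fusion scheme to be a split of that chain into contiguous fusion groups. Each group then independently picks one of C candidate core counts. Under these assumptions the total is C(C+1)^(N-1). The function names `closed_form` and `brute_force_count` are hypothetical helpers for illustration only.

```python
from itertools import product


def closed_form(num_layers: int, num_core_options: int) -> int:
    """Count configurations, assuming (hypothetically) that a chain of layers is split
    into contiguous fusion groups and each group picks one of num_core_options core counts."""
    return num_core_options * (num_core_options + 1) ** (num_layers - 1)


def brute_force_count(num_layers: int, num_core_options: int) -> int:
    """Explicitly enumerate every (fusion scheme, core assignment) pair.
    Only feasible for tiny inputs, which is the point."""
    count = 0
    # Each of the num_layers - 1 boundaries between adjacent layers is either
    # a cut (True: start a new fusion group) or fused (False).
    for cuts in product((False, True), repeat=num_layers - 1):
        num_groups = 1 + sum(cuts)
        # Each fusion group independently chooses one candidate core count.
        for _cores in product(range(num_core_options), repeat=num_groups):
            count += 1
    return count


if __name__ == "__main__":
    # Sanity-check the closed form against explicit enumeration on small inputs.
    for n in range(1, 7):
        for c in range(1, 5):
            assert brute_force_count(n, c) == closed_form(n, c)
    # Hypothetical example: a 50-layer chain with 4 candidate core counts.
    print(f"{closed_form(50, 4):.3e}")
```

For instance, three layers with two core options already give 2 x 3^2 = 18 configurations. The count then grows exponentially with depth, so explicit enumeration is hopeless for realistic models. This is the motivation the abstract gives for a characteristic-driven strategy.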