Optimizing deep learning inference on mobile devices with neural network accelerators

曾惜; Xu Yunlong; Zhi Tian

Abstract

Deep learning is now widely used in intelligent apps on mobile devices. In pursuit of ultra-low power consumption and latency, integrating neural network accelerators (NNAs) into mobile phones has become a trend. However, conventional deep learning programming frameworks do not support such devices well, leading to low computing efficiency and high memory occupation. To address this problem, a 2-stage pipeline is proposed for optimizing deep learning model inference on mobile devices with NNAs in terms of both speed and memory footprint. The 1st stage reduces the computation workload via graph optimization, including splitting and merging nodes. The 2nd stage goes further by optimizing at the compilation level, including kernel fusion and in-advance compilation. The proposed optimizations are evaluated on a commercial mobile phone with an NNA. The experimental results show that the proposed approaches achieve a 2.8× to 26× speedup and reduce the memory footprint by up to 75%.
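The abstract names node merging and kernel fusion without giving implementation details. The following is only a toy sketch of the two ideas, not the authors' method or any NNA vendor API: the `Affine` node type and the functions `merge_affine_nodes`, `run_unfused`, and `run_fused` are hypothetical names introduced here for illustration, and plain Python lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Affine:
    """Hypothetical graph node computing y = scale * x + shift elementwise."""
    scale: float
    shift: float


def merge_affine_nodes(nodes: List[Affine]) -> List[Affine]:
    """Graph-level step (assumed form): fold a chain of affine nodes into one."""
    if not nodes:
        return []
    scale, shift = 1.0, 0.0
    for n in nodes:
        # n(scale*x + shift) = (n.scale*scale)*x + (n.scale*shift + n.shift)
        scale, shift = n.scale * scale, n.scale * shift + n.shift
    return [Affine(scale, shift)]


def run_unfused(nodes: List[Affine], xs: List[float]) -> Tuple[List[float], int]:
    """One pass over the data per node; each pass writes a new buffer."""
    buffers = 0
    for n in nodes:
        xs = [n.scale * x + n.shift for x in xs]
        buffers += 1
    xs = [max(x, 0.0) for x in xs]  # ReLU executed as a separate kernel
    return xs, buffers + 1


def run_fused(nodes: List[Affine], xs: List[float]) -> Tuple[List[float], int]:
    """Kernel-level step (assumed form): affine chain and ReLU in a single pass."""
    def kernel(x: float) -> float:
        for n in nodes:
            x = n.scale * x + n.shift
        return max(x, 0.0)
    return [kernel(x) for x in xs], 1


if __name__ == "__main__":
    graph = [Affine(2.0, 1.0), Affine(0.5, -3.0)]
    merged = merge_affine_nodes(graph)
    print(merged)
    print(run_unfused(graph, [0.0, 2.0, 4.0]))
    print(run_fused(merged, [0.0, 2.0, 4.0]))
```

In this sketch the merged graph has one node instead of two, and the fused kernel writes a single output buffer where the unfused version writes three, while both produce the same values. It only illustrates why such transformations can reduce work and memory traffic; it says nothing about the speedups reported in the abstract.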

Bibliographic record

Source
《高技术通讯：英文版》 (High Technology Letters, English edition) | 2019, No. 4 | pp. 417-425 | 9 pages
Authors
曾惜; Xu Yunlong; Zhi Tian
Affiliations

Intelligent Processor Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, P.R. China
University of Chinese Academy of Sciences, Beijing 100049, P.R. China
Cambricon Technologies Corporation Limited, Beijing 100191, P.R. China

Original format: PDF
Text language: chi
Chinese Library Classification: TN9
Keywords
machine learning inference; neural network accelerator (NNA); low latency; kernel fusion; in-advance compilation
