IEEE Embedded Systems Letters

Cross Hardware-Software Boundary Exploration for Scalable and Optimized Deep Learning Platform Design

Abstract

Deep learning systems composed of multiple layers are increasingly deployed in diverse application areas. To achieve good performance, multicore CPUs and accelerators are widely used in real systems. Previous studies show that GPUs can significantly speed up computation in deep neural networks, while performance does not scale as well on multicore CPUs. In this letter, we run Caffe on various hardware platforms with different computation setups to train LeNet-5 on the MNIST dataset and measure the individual durations of the forward and backward passes for each layer. We find that the speedups vary widely and that the scalability of multicore CPUs differs across the stages of the network. Based on these observations, we show that it is worthwhile to apply a different policy to each layer separately to achieve the best overall performance. In addition, our benchmarking results can serve as a reference for developing dedicated acceleration methods for individual layers of the network.
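The per-layer timing methodology described above can be reproduced with Caffe's built-in benchmark (`caffe time -model examples/mnist/lenet_train_test.prototxt -iterations 50 -gpu 0` prints average forward and backward times for every layer), or programmatically through pycaffe. Below is a minimal illustrative sketch of the pycaffe route, assuming the stock LeNet prototxt and MNIST LMDB from the Caffe examples are available locally; it uses host-side wall-clock timing, which is only approximate for GPU runs because it includes synchronization overhead.

```python
import time
import caffe

# Illustrative path: the standard LeNet definition shipped with Caffe's
# MNIST example; adjust to your local checkout and dataset location.
MODEL = 'examples/mnist/lenet_train_test.prototxt'

caffe.set_mode_gpu()  # use caffe.set_mode_cpu() for the multicore CPU runs
net = caffe.Net(MODEL, caffe.TRAIN)

def time_layer(net, layer, iters=50):
    """Average wall-clock seconds of one layer's forward and backward pass."""
    fwd = time.time()
    for _ in range(iters):
        # start=end=layer runs just this one layer's forward pass
        net.forward(start=layer, end=layer)
    fwd = (time.time() - fwd) / iters

    bwd = time.time()
    for _ in range(iters):
        # for backward, start is the later layer; start=end=layer does one layer
        net.backward(start=layer, end=layer)
    bwd = (time.time() - bwd) / iters
    return fwd, bwd

# Warm up once so lazy initialization does not skew the first measurement.
net.forward()
net.backward()

for name in net._layer_names:
    fwd, bwd = time_layer(net, name)
    print('%-12s forward: %8.3f ms   backward: %8.3f ms'
          % (name, fwd * 1e3, bwd * 1e3))
```

Running the same script once in CPU mode (varying the BLAS thread count, e.g. via OMP_NUM_THREADS) and once in GPU mode yields the per-layer speedup and multicore scalability comparison the letter analyzes.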
