IEEE Embedded Systems Letters

Cross Hardware-Software Boundary Exploration for Scalable and Optimized Deep Learning Platform Design

Abstract

Deep learning systems composed of multiple layers are increasingly deployed in diverse application areas. To achieve good performance, multicore CPUs and accelerators are widely used in real systems. Previous studies show that GPUs can significantly speed up computation in deep neural networks, while performance does not scale as well on multicore CPUs. In this letter, we run Caffe on various hardware platforms with different computation setups to train LeNet-5 on the MNIST dataset and measure the individual durations of the forward and backward passes for each layer. We find that the speedups vary widely and that the scalability of multicore CPUs differs across the stages of the network. Based on these observations, we show that it is worthwhile to apply a different policy to each layer separately to achieve the best overall performance. In addition, our benchmarking results can serve as a reference for developing dedicated acceleration methods for individual layers of the network.
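The per-layer timing methodology described above can be reproduced with Caffe's built-in benchmark (`caffe time -model examples/mnist/lenet_train_test.prototxt -iterations 50 -gpu 0` prints average forward and backward times for every layer), or programmatically through pycaffe. Below is a minimal illustrative sketch of the pycaffe route, assuming the stock LeNet prototxt and MNIST LMDB from the Caffe examples are available locally; it uses host-side wall-clock timing, which is only approximate for GPU runs because it includes synchronization overhead.

```python
import time
import caffe

# Illustrative path: the standard LeNet definition shipped with Caffe's
# MNIST example; adjust to your local checkout and dataset location.
MODEL = 'examples/mnist/lenet_train_test.prototxt'

caffe.set_mode_gpu()  # use caffe.set_mode_cpu() for the multicore CPU runs
net = caffe.Net(MODEL, caffe.TRAIN)

def time_layer(net, layer, iters=50):
    """Average wall-clock seconds of one layer's forward and backward pass."""
    fwd = time.time()
    for _ in range(iters):
        # start=end=layer runs just this one layer's forward pass
        net.forward(start=layer, end=layer)
    fwd = (time.time() - fwd) / iters

    bwd = time.time()
    for _ in range(iters):
        # for backward, start is the later layer; start=end=layer does one layer
        net.backward(start=layer, end=layer)
    bwd = (time.time() - bwd) / iters
    return fwd, bwd

# Warm up once so lazy initialization does not skew the first measurement.
net.forward()
net.backward()

for name in net._layer_names:
    fwd, bwd = time_layer(net, name)
    print('%-12s forward: %8.3f ms   backward: %8.3f ms'
          % (name, fwd * 1e3, bwd * 1e3))
```

Running the same script once in CPU mode (varying the BLAS thread count, e.g. via OMP_NUM_THREADS) and once in GPU mode yields the per-layer speedup and multicore scalability comparison the letter analyzes.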
