Workload-aware Automatic Parallelization for Multi-GPU DNN Training

IEEE International Conference on Acoustics, Speech and Signal Processing

Abstract

Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular option for accelerating the demanding computations in DNN training, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the frameworks' implementations, but also apply parallelization in a straightforward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, in which the work is automatically distributed to multiple GPUs based on workload characteristics. We evaluate WAP in TensorFlow with popular DNN benchmarks (AlexNet and VGG-16), show training throughput competitive with state-of-the-art frameworks, and demonstrate that WAP automatically optimizes GPU assignment based on the workload's compute requirements, thereby improving energy efficiency.
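The abstract does not spell out how WAP estimates workload or decides GPU assignment, so the following is only a minimal, self-contained sketch of the general idea it describes: estimate the compute cost of a training step from per-layer costs and the batch size, then use only as many GPUs as that cost warrants. All layer names, FLOP figures, budgets, and helper functions below are illustrative assumptions, not the paper's actual method or API.

```python
# Illustrative sketch of workload-aware GPU assignment (not the WAP algorithm
# from the paper). Per-layer FLOP figures and the per-GPU budget are made up.
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    flops_per_sample: float  # estimated forward+backward FLOPs per sample


def estimate_batch_flops(layers: List[Layer], batch_size: int) -> float:
    """Total estimated FLOPs for one training step at the given batch size."""
    return batch_size * sum(layer.flops_per_sample for layer in layers)


def choose_gpu_count(batch_flops: float,
                     available_gpus: int,
                     per_gpu_flops_budget: float) -> int:
    """Pick the smallest GPU count whose combined budget covers the step.

    Light workloads stay on fewer GPUs (saving energy); heavy workloads
    spill onto more GPUs to sustain throughput.
    """
    needed = math.ceil(batch_flops / per_gpu_flops_budget)
    return max(1, min(needed, available_gpus))


def split_batch(batch_size: int, num_gpus: int) -> List[int]:
    """Even data-parallel split of the global batch across the chosen GPUs."""
    base, rem = divmod(batch_size, num_gpus)
    return [base + (1 if i < rem else 0) for i in range(num_gpus)]


if __name__ == "__main__":
    # Rough, hypothetical per-sample costs for a few layers of an
    # AlexNet-like model.
    model = [
        Layer("conv1", 2.0e8),
        Layer("conv2", 4.5e8),
        Layer("fc6", 0.8e8),
    ]
    batch = 256
    flops = estimate_batch_flops(model, batch)
    gpus = choose_gpu_count(flops, available_gpus=4,
                            per_gpu_flops_budget=6.0e10)
    print(f"step cost ~{flops:.2e} FLOPs -> use {gpus} GPU(s), "
          f"per-GPU batches {split_batch(batch, gpus)}")
```

With a smaller batch or a lighter model, the same logic keeps the work on a single GPU, which is the kind of workload-dependent assignment the abstract credits with the energy-efficiency improvement.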
