IEEE Transactions on Parallel and Distributed Systems

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism

Abstract

We present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Emerging deep learning-based scientific workflows often require model training with large, high-dimensional samples, which can make training much more costly, and even infeasible, due to excessive memory usage. We solve these challenges by extensively applying hybrid parallelism throughout the end-to-end training pipeline, including both computation and I/O. Our hybrid-parallel algorithm extends standard data parallelism with spatial parallelism, which partitions a single sample in the spatial domain, realizing strong scaling beyond the mini-batch dimension with a larger aggregated memory capacity. We evaluate our proposed training algorithms with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive performance studies show that good weak and strong scaling can be achieved for both networks using up to 2K GPUs. More importantly, we enable training of CosmoFlow with much larger samples than previously possible, realizing an order-of-magnitude improvement in prediction accuracy.
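
The core idea in the abstract, splitting one large sample across devices along a spatial axis and exchanging halo regions so each device can still compute valid convolutions at shard boundaries, can be illustrated with a small sketch. The snippet below is a minimal single-process PyTorch illustration, not the authors' implementation; the function conv_on_shards and its halo handling are hypothetical stand-ins for the distributed halo exchange the paper describes.

```python
# Minimal single-process sketch of spatial parallelism for 3D convolution:
# partition one sample along the depth axis, exchange halo slices between
# neighboring shards, convolve each shard, and check that the stitched
# result matches a convolution over the whole volume.
import torch
import torch.nn.functional as F

def conv_on_shards(volume, weight, num_shards):
    """Partition `volume` along depth (D), exchange halos, convolve shards."""
    k = weight.shape[-1]          # cubic kernel, e.g. 3
    halo = k // 2                 # one voxel of overlap per side for k=3
    shards = torch.chunk(volume, num_shards, dim=2)   # split along D
    outputs = []
    for i, shard in enumerate(shards):
        # "Halo exchange": borrow boundary slices from neighboring shards.
        lo = shards[i - 1][:, :, -halo:] if i > 0 else None
        hi = shards[i + 1][:, :, :halo] if i < num_shards - 1 else None
        pieces = [p for p in (lo, shard, hi) if p is not None]
        padded = torch.cat(pieces, dim=2)
        # Zero-pad only at the physical volume boundary, never between shards.
        pad_front = halo if i == 0 else 0
        pad_back = halo if i == num_shards - 1 else 0
        padded = F.pad(padded, (halo, halo, halo, halo, pad_front, pad_back))
        outputs.append(F.conv3d(padded, weight))
    return torch.cat(outputs, dim=2)

torch.manual_seed(0)
x = torch.randn(1, 1, 16, 16, 16)          # one large 3D sample
w = torch.randn(4, 1, 3, 3, 3)             # 3x3x3 convolution kernel
full = F.conv3d(F.pad(x, (1,) * 6), w)     # reference: conv on whole volume
sharded = conv_on_shards(x, w, num_shards=4)
print(torch.allclose(full, sharded, atol=1e-5))   # True
```

In a real multi-GPU run, each shard would live on a different GPU and the halo slices would travel over point-to-point messages (e.g., torch.distributed send/recv), while data parallelism across mini-batch replicas adds the usual gradient allreduce on top. This is how spatial partitioning enables strong scaling beyond the mini-batch dimension: adding GPUs shrinks both the compute and the activation memory per device for a single sample.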