IEEE Transactions on Parallel and Distributed Systems

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism

Abstract

We present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Emerging deep learning-based scientific workflows often require model training with large, high-dimensional samples, which can make training much more costly, and even infeasible, due to excessive memory usage. We solve these challenges by extensively applying hybrid parallelism throughout the end-to-end training pipeline, including both computation and I/O. Our hybrid-parallel algorithm extends standard data parallelism with spatial parallelism, which partitions a single sample in the spatial domain, realizing strong scaling beyond the mini-batch dimension with a larger aggregated memory capacity. We evaluate our proposed training algorithms with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive performance studies show that good weak and strong scaling can be achieved for both networks using up to 2K GPUs. More importantly, we enable training of CosmoFlow with much larger samples than previously possible, realizing an order-of-magnitude improvement in prediction accuracy.
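
The core idea in the abstract, splitting one large sample across devices along a spatial axis and exchanging halo regions so each device can still compute valid convolutions at shard boundaries, can be illustrated with a small sketch. The snippet below is a minimal single-process PyTorch illustration, not the authors' implementation; the function conv_on_shards and its halo handling are hypothetical stand-ins for the distributed halo exchange the paper describes.

```python
# Minimal single-process sketch of spatial parallelism for 3D convolution:
# partition one sample along the depth axis, exchange halo slices between
# neighboring shards, convolve each shard, and check that the stitched
# result matches a convolution over the whole volume.
import torch
import torch.nn.functional as F

def conv_on_shards(volume, weight, num_shards):
    """Partition `volume` along depth (D), exchange halos, convolve shards."""
    k = weight.shape[-1]          # cubic kernel, e.g. 3
    halo = k // 2                 # one voxel of overlap per side for k=3
    shards = torch.chunk(volume, num_shards, dim=2)   # split along D
    outputs = []
    for i, shard in enumerate(shards):
        # "Halo exchange": borrow boundary slices from neighboring shards.
        lo = shards[i - 1][:, :, -halo:] if i > 0 else None
        hi = shards[i + 1][:, :, :halo] if i < num_shards - 1 else None
        pieces = [p for p in (lo, shard, hi) if p is not None]
        padded = torch.cat(pieces, dim=2)
        # Zero-pad only at the physical volume boundary, never between shards.
        pad_front = halo if i == 0 else 0
        pad_back = halo if i == num_shards - 1 else 0
        padded = F.pad(padded, (halo, halo, halo, halo, pad_front, pad_back))
        outputs.append(F.conv3d(padded, weight))
    return torch.cat(outputs, dim=2)

torch.manual_seed(0)
x = torch.randn(1, 1, 16, 16, 16)          # one large 3D sample
w = torch.randn(4, 1, 3, 3, 3)             # 3x3x3 convolution kernel
full = F.conv3d(F.pad(x, (1,) * 6), w)     # reference: conv on whole volume
sharded = conv_on_shards(x, w, num_shards=4)
print(torch.allclose(full, sharded, atol=1e-5))   # True
```

In a real multi-GPU run, each shard would live on a different GPU and the halo slices would travel over point-to-point messages (e.g., torch.distributed send/recv), while data parallelism across mini-batch replicas adds the usual gradient allreduce on top. This is how spatial partitioning enables strong scaling beyond the mini-batch dimension: adding GPUs shrinks both the compute and the activation memory per device for a single sample.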