IEEE International Conference on Parallel and Distributed Systems

A Quick Survey on Large Scale Distributed Deep Learning Systems

Abstract

Deep learning has been widely used in various fields and plays a major role in them. As it gradually penetrates these fields, the data quantity of each application is increasing tremendously, and so are the computational complexity and the number of model parameters. As an obvious result, training and inference are time consuming. For example, training a classic ResNet-50 classification model on the ImageNet data set takes 14 days on an NVIDIA M40 GPU. Thus, distributed acceleration is a very useful way to dispatch the computation of training, and even inference, to many nodes in parallel and accelerate the whole process. Facebook's work and UC Berkeley's acceleration can train the ResNet-50 model within an hour and within minutes, respectively, through distributed deep learning algorithms and systems. Like other distributed accelerations, this makes it possible to shorten the training of large models on large data sets from weeks to minutes, which gives researchers and developers more room to explore and search. However, besides acceleration, what other issues will a distributed deep learning system confront? Where is the upper limit of acceleration? What applications will acceleration be used for? What is the price and cost of acceleration? In this paper, we take a simple and quick survey of distributed deep learning systems from the algorithm perspective, the distributed system perspective, and the applications perspective. We present several recent excellent works and analyze the restrictions and prospects of distributed methods.
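To make the data-parallel acceleration described in the abstract concrete, below is a minimal sketch, not taken from any of the surveyed systems, of synchronous data-parallel SGD. It uses a toy linear-regression model and simulated workers in NumPy; the worker count, learning rate, and problem sizes are illustrative assumptions. Each simulated worker computes a gradient on its shard of the global batch, and the gradients are averaged (as an all-reduce would do) before a single shared model update, which is the basic pattern behind the Facebook and UC Berkeley ResNet-50 scaling results mentioned above.

```python
# Sketch of synchronous data-parallel SGD with simulated workers (NumPy).
# Assumptions for illustration: a toy linear-regression task, 4 workers,
# and a fixed learning rate; real systems shard data across machines and
# aggregate gradients with an all-reduce or a parameter server.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = X @ w_true + noise
n_samples, n_features, n_workers = 512, 16, 4
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)

w = np.zeros(n_features)   # model replica shared by every worker
lr = 0.1

def local_gradient(X_shard, y_shard, w):
    """Mean-squared-error gradient computed on one worker's data shard."""
    residual = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ residual / len(y_shard)

for step in range(100):
    # Data parallelism: split the global batch across the workers.
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    grads = [local_gradient(Xs, ys, w) for Xs, ys in shards]
    # Synchronous aggregation: average the per-worker gradients,
    # then apply one update to the shared parameters.
    g = np.mean(grads, axis=0)
    w -= lr * g

print("parameter error:", np.linalg.norm(w - w_true))
```

Because the averaged gradient equals the gradient over the full global batch, adding workers leaves the update unchanged while dividing the per-worker compute, which is the source of the speedups, and also of the communication and large-batch issues the survey discusses.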
