IEEE Transactions on Parallel and Distributed Systems

Error-Compensated Sparsification for Communication-Efficient Decentralized Training in Edge Environment



Abstract

Communication has been considered a major bottleneck in large-scale decentralized training systems, since participating nodes iteratively exchange large amounts of intermediate data with their neighbors. Although compression techniques like sparsification can significantly reduce the communication overhead in each iteration, the errors caused by compression accumulate, resulting in a severely degraded convergence rate. Recently, an error compensation method for sparsification has been proposed in centralized training to tolerate the accumulated compression errors. However, the analogous technique for decentralized training, and the corresponding convergence theory, are still unknown. To fill this gap, we design a method named ECSD-SGD that significantly accelerates decentralized training via error-compensated sparsification. The novelty lies in that we identify the component of the information exchanged in each iteration (i.e., the sparsified model update) and apply targeted error compensation to that component. Our thorough theoretical analysis shows that ECSD-SGD supports an arbitrary sparsification ratio and achieves the same convergence rate as non-sparsified decentralized training methods. We also conduct extensive experiments on multiple deep learning models to validate our theoretical findings. The results show that ECSD-SGD outperforms all the state-of-the-art sparsification methods in terms of both convergence speed and final generalization accuracy.
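The mechanism the abstract describes, sparsifying the exchanged model update while folding the discarded residual back into the next iteration, can be sketched as follows. This is a minimal illustration of error-compensated top-k sparsification under our own naming (topk_sparsify, ErrorCompensatedUpdate, residual are all hypothetical), not the paper's actual ECSD-SGD implementation.

import numpy as np

def topk_sparsify(x, ratio):
    # Keep only the `ratio` fraction of entries with largest magnitude;
    # everything else is zeroed out and need not be transmitted.
    flat = x.ravel()
    k = max(1, int(ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(x.shape)

class ErrorCompensatedUpdate:
    # Carries the compression residual across iterations (error feedback).
    def __init__(self, shape, ratio=0.01):
        self.residual = np.zeros(shape)
        self.ratio = ratio

    def compress(self, update):
        corrected = update + self.residual   # compensate the accumulated error
        sparse = topk_sparsify(corrected, self.ratio)
        self.residual = corrected - sparse   # error carried to the next round
        return sparse                        # what a node sends to its neighbors

In a decentralized setting, each node would apply compress() to its local model update before averaging with its neighbors' sparse updates; the paper's contribution is identifying the model update as the component to compensate and proving that this preserves the non-sparsified convergence rate.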
