IEEE International Symposium on Information Theory

Fitting ReLUs via SGD and Quantized SGD

Abstract

In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks, consisting of a single Rectified Linear Unit (ReLU). These functions are of the form x → max(0, 〈w, x〉), with w ∈ ℝ^d denoting the weight vector. We focus on a planted model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We first show that mini-batch stochastic gradient descent, when suitably initialized, converges at a geometric rate to the planted model with a number of samples that is optimal up to numerical constants. Next we focus on a parallel implementation where in each iteration the mini-batch gradient is calculated in a distributed manner across multiple processors and then broadcast to a master or to all other processors. To reduce the communication cost in this setting we utilize a Quantized Stochastic Gradient Descent (QSGD) scheme in which the partial gradients are quantized. Perhaps unexpectedly, we show that QSGD maintains the fast convergence of SGD to a globally optimal model while significantly reducing the communication cost. We corroborate our findings via various numerical experiments, including distributed implementations over Amazon EC2.
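
To make the planted model and the quantized-gradient step concrete, the following is a minimal sketch, not the authors' implementation: it fits a single ReLU x → max(0, 〈w, x〉) on synthetic Gaussian data with mini-batch SGD, passing each mini-batch gradient through a simple QSGD-style unbiased stochastic quantizer. The dimensions, step size, batch size, number of quantization levels, and the warm start near the planted vector are illustrative assumptions rather than the paper's choices.

import numpy as np

rng = np.random.default_rng(0)

# Planted single-ReLU model with i.i.d. Gaussian inputs (sizes are assumptions).
d, n = 50, 2000
w_star = rng.normal(size=d)              # planted weight vector
X = rng.normal(size=(n, d))              # i.i.d. Gaussian inputs
y = np.maximum(0.0, X @ w_star)          # labels generated by the planted ReLU


def minibatch_grad(w, idx):
    """Gradient of the squared loss over the mini-batch indexed by idx."""
    Xb, yb = X[idx], y[idx]
    pred = np.maximum(0.0, Xb @ w)
    active = (Xb @ w > 0).astype(float)  # ReLU derivative (subgradient at 0 taken as 0)
    return Xb.T @ ((pred - yb) * active) / len(idx)


def qsgd_quantize(g, levels=4):
    """QSGD-style stochastic quantization of a gradient vector.

    Each coordinate of |g| / ||g|| is rounded stochastically to one of `levels`
    uniform levels, so the quantized vector is an unbiased estimate of g."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return g
    scaled = np.abs(g) / norm * levels
    lower = np.floor(scaled)
    q = lower + (rng.random(g.shape) < scaled - lower)  # round up with prob = fractional part
    return np.sign(g) * q * norm / levels


# Mini-batch SGD; the paper analyzes a suitable initialization, while this sketch
# simply starts near the planted vector to keep the example short.
w = w_star + 0.5 * rng.normal(size=d)
step, batch = 0.1, 64
for t in range(500):
    idx = rng.choice(n, size=batch, replace=False)
    g = qsgd_quantize(minibatch_grad(w, idx))           # quantized mini-batch gradient
    w -= step * g

print("relative error:", np.linalg.norm(w - w_star) / np.linalg.norm(w_star))

In this toy setup the iterates should contract toward the planted weights up to a small quantization-noise floor, which is the behavior the abstract describes: quantizing the transmitted gradients to a few levels per coordinate while retaining fast convergence.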
