IEEE International Symposium on Information Theory

Fitting ReLUs via SGD and Quantized SGD

Abstract

In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks, consisting of a single Rectified Linear Unit (ReLU). These functions are of the form x → max(0, 〈w, x〉), with w ∈ ℝ^d denoting the weight vector. We focus on a planted model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We first show that mini-batch stochastic gradient descent, when suitably initialized, converges at a geometric rate to the planted model with a number of samples that is optimal up to numerical constants. Next we focus on a parallel implementation where in each iteration the mini-batch gradient is calculated in a distributed manner across multiple processors and then broadcast to a master or to all other processors. To reduce the communication cost in this setting we utilize a Quantized Stochastic Gradient Descent (QSGD) scheme in which the partial gradients are quantized. Perhaps unexpectedly, we show that QSGD maintains the fast convergence of SGD to a globally optimal model while significantly reducing the communication cost. We corroborate our findings via various numerical experiments, including distributed implementations over Amazon EC2.
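
To make the planted model and the quantized-gradient step concrete, the following is a minimal sketch, not the authors' implementation: it fits a single ReLU x → max(0, 〈w, x〉) on synthetic Gaussian data with mini-batch SGD, passing each mini-batch gradient through a simple QSGD-style unbiased stochastic quantizer. The dimensions, step size, batch size, number of quantization levels, and the warm start near the planted vector are illustrative assumptions rather than the paper's choices.

import numpy as np

rng = np.random.default_rng(0)

# Planted single-ReLU model with i.i.d. Gaussian inputs (sizes are assumptions).
d, n = 50, 2000
w_star = rng.normal(size=d)              # planted weight vector
X = rng.normal(size=(n, d))              # i.i.d. Gaussian inputs
y = np.maximum(0.0, X @ w_star)          # labels generated by the planted ReLU


def minibatch_grad(w, idx):
    """Gradient of the squared loss over the mini-batch indexed by idx."""
    Xb, yb = X[idx], y[idx]
    pred = np.maximum(0.0, Xb @ w)
    active = (Xb @ w > 0).astype(float)  # ReLU derivative (subgradient at 0 taken as 0)
    return Xb.T @ ((pred - yb) * active) / len(idx)


def qsgd_quantize(g, levels=4):
    """QSGD-style stochastic quantization of a gradient vector.

    Each coordinate of |g| / ||g|| is rounded stochastically to one of `levels`
    uniform levels, so the quantized vector is an unbiased estimate of g."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return g
    scaled = np.abs(g) / norm * levels
    lower = np.floor(scaled)
    q = lower + (rng.random(g.shape) < scaled - lower)  # round up with prob = fractional part
    return np.sign(g) * q * norm / levels


# Mini-batch SGD; the paper analyzes a suitable initialization, while this sketch
# simply starts near the planted vector to keep the example short.
w = w_star + 0.5 * rng.normal(size=d)
step, batch = 0.1, 64
for t in range(500):
    idx = rng.choice(n, size=batch, replace=False)
    g = qsgd_quantize(minibatch_grad(w, idx))           # quantized mini-batch gradient
    w -= step * g

print("relative error:", np.linalg.norm(w - w_star) / np.linalg.norm(w_star))

In this toy setup the iterates should contract toward the planted weights up to a small quantization-noise floor, which is the behavior the abstract describes: quantizing the transmitted gradients to a few levels per coordinate while retaining fast convergence.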
