International Joint Conference on Neural Networks

A Novel Stochastic Stratified Average Gradient Method: Convergence Rate and Its Complexity



Abstract

SGD (Stochastic Gradient Descent) is a popular algorithm for large-scale optimization problems due to its low per-iteration cost. However, SGD cannot achieve the linear convergence rate of FGD (Full Gradient Descent) because of its inherent gradient variance. To address this problem, mini-batch SGD was proposed as a trade-off between convergence rate and iteration cost. In this paper, a general CVI (Convergence-Variance Inequality) equation is presented to state formally the interaction between convergence rate and gradient variance. A novel algorithm named SSAG (Stochastic Stratified Average Gradient) is then introduced to reduce gradient variance based on two techniques: stratified sampling, and averaging over iterations, which is the key idea in SAG (Stochastic Average Gradient). Furthermore, SSAG achieves a linear convergence rate of O((1 - μ/(8CL))^k) at smaller storage and iterative costs, where C ≥ 2 is the number of classes in the training data. This convergence rate depends mainly on the variance between classes, not on the variance within classes. When C ≪ N (N is the training data size), SSAG's convergence rate is much better than SAG's rate of O((1 - μ/(8NL))^k). Our experimental results show that SSAG outperforms SAG and many other algorithms.
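The abstract describes SSAG only at a high level, so the following is a minimal, illustrative sketch of how stratified sampling can be combined with SAG-style stored-gradient averaging. It is not the authors' exact update rule: the squared loss on a linear model, the function name ssag_sketch, and parameters such as lr, epochs, and seed are assumptions made for illustration.

```python
import numpy as np

def ssag_sketch(X, y, labels, C, lr=0.01, epochs=10, seed=0):
    """SAG-style update with one stored gradient per class stratum.

    Illustrative only: assumes squared loss on a linear model, uniform
    sampling over strata, then uniform sampling within the chosen stratum.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # Partition training indices into C strata by class label (stratified sampling).
    strata = [np.where(labels == c)[0] for c in range(C)]
    # SAG stores one gradient per example (N entries); here we store one per stratum (C entries).
    stored = np.zeros((C, d))
    for _ in range(epochs * n):
        c = rng.integers(C)                      # choose a stratum
        i = rng.choice(strata[c])                # sample a training point inside that stratum
        stored[c] = (X[i] @ w - y[i]) * X[i]     # refresh the stratum's stored gradient
        w -= lr * stored.mean(axis=0)            # step along the average of all stored gradients
    return w

# Example usage on synthetic two-class data (C = 2).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    true_w = rng.normal(size=5)
    y = X @ true_w + 0.1 * rng.normal(size=200)
    labels = (y > 0).astype(int)
    w_hat = ssag_sketch(X, y, labels, C=2)
    print(np.linalg.norm(w_hat - true_w))
```

The design point mirrored here is the storage cost: the table of stored gradients has C rows (one per class) rather than N rows (one per example) as in SAG, which is where the claimed reduction in storage and the dependence of the rate on C rather than N come from.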
