
Non-convergence and Limit Cycles in the Adam Optimizer



Abstract

One of the most popular training algorithms for deep neural networks is Adaptive Moment Estimation (Adam), introduced by Kingma and Ba. Despite its success in many applications, there is no satisfactory convergence analysis: only local convergence can be shown for batch mode under some restrictions on the hyperparameters, and counterexamples exist for incremental mode. Recent results show that for simple quadratic objective functions, limit cycles of period 2 exist in batch mode, but only for atypical hyperparameters and only for the algorithm without bias correction. We extend the convergence analysis to all choices of the hyperparameters for quadratic functions. This finally answers the question of convergence for Adam in batch mode in the negative. We analyze the stability of these limit cycles and relate our analysis to other results where approximate convergence was shown, but under the additional assumption of bounded gradients, which does not hold for quadratic functions. The investigation relies heavily on computer algebra due to the complexity of the equations.
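To make the batch-mode setting concrete, the following minimal Python sketch runs Adam without bias correction on the quadratic f(x) = 0.5*x^2, whose gradient is g = x, and prints the tail of the iterates. The step size, the beta values, and the objective are illustrative assumptions, not values taken from the paper. If the iterates had converged, the last printed values would all agree; a period-2 limit cycle would show up as two alternating values.

import numpy as np

# Minimal sketch of batch-mode Adam without bias correction on the
# quadratic f(x) = 0.5 * x**2 (gradient g = x). Hyperparameter values
# and the objective are illustrative assumptions, not from the paper.
def adam_batch(x0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=20000):
    x, m, v = x0, 0.0, 0.0
    trajectory = []
    for _ in range(steps):
        g = x                                   # gradient of 0.5 * x**2
        m = beta1 * m + (1 - beta1) * g         # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g     # second-moment estimate
        x = x - alpha * m / (np.sqrt(v) + eps)  # update, no bias correction
        trajectory.append(x)
    return np.array(trajectory)

traj = adam_batch(x0=1.0)
print(traj[-6:])  # inspect the tail of the iterates for oscillation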
