Second-order Optimization for Non-convex Machine Learning: an Empirical Study

Abstract

The resurgence of deep learning, as a highly effective machine learning paradigm, has brought back to life the old optimization question of non-convexity. Indeed, the challenges related to the large-scale nature of many modern machine learning applications are severely exacerbated by the inherent non-convexity in the underlying models. In this light, efficient optimization algorithms which can be effectively applied to such large-scale and non-convex learning problems are highly desired. In doing so, however, the bulk of research has been almost completely restricted to the class of 1st-order algorithms. This is despite the fact that employing curvature information, e.g., in the form of the Hessian, can indeed help with obtaining effective methods with desirable convergence properties for non-convex problems, e.g., avoiding saddle-points and convergence to local minima. The conventional wisdom in the machine learning community is that the application of 2nd-order methods, i.e., those that employ the Hessian as well as gradient information, can be highly inefficient. Consequently, 1st-order algorithms, such as stochastic gradient descent (SGD), have been at center-stage for solving such machine learning problems. Here, we aim at addressing this misconception by considering efficient and stochastic variants of Newton's method, namely, sub-sampled trust-region and cubic regularization, whose theoretical convergence properties have recently been established in [Xu 2017]. Using a variety of experiments, we empirically evaluate the performance of these methods for solving non-convex machine learning applications. In doing so, we highlight the shortcomings of 1st-order methods, e.g., high sensitivity to hyper-parameters such as step-size and undesirable behavior near saddle-points, and showcase the advantages of employing curvature information as an effective remedy.
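The two Newton-type variants named in the abstract replace the exact Hessian with an estimate computed on a random subsample of the data, then choose a step by approximately minimizing a local quadratic model, either constrained to a trust region or penalized by a cubic term. Below is a minimal sketch of the cubic-regularization variant on a toy non-convex problem. The fixed penalty weight `sigma`, the finite-difference sub-sampled Hessian, and the crude gradient-descent subproblem solver are all simplifying assumptions for illustration; the methods evaluated in the paper adapt the regularization per iteration and solve the subproblem with more careful (e.g., Krylov-type) solvers.

```python
import numpy as np


def loss_grad(w, X, y):
    # Toy non-convex problem: least-squares fit of a single tanh unit.
    z = np.tanh(X @ w)
    r = z - y
    loss = 0.5 * np.mean(r ** 2)
    grad = X.T @ (r * (1.0 - z ** 2)) / len(y)
    return loss, grad


def subsampled_hessian(w, X, y, batch, rng, eps=1e-5):
    # Hessian estimated on a random subsample, via finite differences
    # of the gradient (an analytic Hessian would be used in practice).
    idx = rng.choice(len(y), size=batch, replace=False)
    Xs, ys = X[idx], y[idx]
    d = len(w)
    _, g0 = loss_grad(w, Xs, ys)
    H = np.empty((d, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        _, gi = loss_grad(w + e, Xs, ys)
        H[:, i] = (gi - g0) / eps
    return 0.5 * (H + H.T)  # symmetrize


def cubic_step(g, H, sigma, iters=100):
    # Approximately minimize the cubic model
    #   m(s) = g^T s + 0.5 s^T H s + (sigma / 3) ||s||^3
    # by plain gradient descent on m (a deliberately crude solver).
    s = np.zeros_like(g)
    h_norm = np.linalg.norm(H, 2)
    for _ in range(iters):
        m_grad = g + H @ s + sigma * np.linalg.norm(s) * s
        s -= m_grad / (h_norm + 2.0 * sigma * np.linalg.norm(s) + 1e-8)
    return s


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.tanh(X @ rng.normal(size=5)) + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
sigma = 10.0  # fixed penalty weight; adapted per iteration in practice
for t in range(20):
    loss, g = loss_grad(w, X, y)
    H = subsampled_hessian(w, X, y, batch=100, rng=rng)
    w = w + cubic_step(g, H, sigma)
    print(f"iter {t:2d}  full-batch loss {loss:.5f}")
```

Replacing the cubic penalty with a hard constraint ||s|| <= Delta on the same quadratic model, and solving that constrained subproblem instead, would give the analogous sub-sampled trust-region step.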
