
Approximate Policy Iteration Algorithms for Continuous, Multidimensional Applications and Convergence Analysis.


Abstract

The purpose of this dissertation is to present parametric and non-parametric policy iteration algorithms that handle Markov decision process problems with high-dimensional, continuous state and action spaces, and to conduct convergence analysis of these algorithms under a variety of technical conditions. An online, on-policy least-squares policy iteration (LSPI) algorithm is proposed, which can be applied to infinite horizon problems where states and controls are vector-valued and continuous. No special problem structure, such as linear, additive noise, is assumed, and the expectation is assumed to be uncomputable. The concept of the post-decision state variable is used to eliminate the expectation inside the optimization problem, and a formal convergence analysis of the algorithm is provided under the assumption that value functions are spanned by finitely many known basis functions. Furthermore, the convergence result extends to the more general case of unknown value function form using orthogonal polynomial approximation. Using kernel smoothing techniques, this dissertation presents three different online, on-policy approximate policy iteration algorithms which can be applied to infinite horizon problems with continuous and high-dimensional state and action spaces. They are kernel-based least-squares approximate policy iteration; approximate policy iteration with kernel smoothing; and policy iteration with finite-horizon approximation and kernel estimators. The use of Monte Carlo sampling to estimate the value function around the post-decision state reduces the problem to a sequence of deterministic, nonlinear programming problems that allow the algorithms to handle continuous, vector-valued states and actions. Again, a formal convergence analysis of the algorithms under a variety of technical assumptions is presented.
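The parametric setting described above — fitting a value function spanned by finitely many known basis functions from observed transitions — can be illustrated with a minimal least-squares temporal-difference sketch. This is a generic LSTD-style fit under assumed names (`phi`, `gamma`, a toy deterministic chain), not the dissertation's exact algorithm:

```python
import numpy as np

# Hypothetical sketch: least-squares value fitting when the value function
# is spanned by known basis functions phi, as in LSPI/LSTD-style methods.
# The basis, discount factor, and sample chain below are illustrative.

def phi(s):
    """Polynomial basis features for a scalar (post-decision) state."""
    return np.array([1.0, s, s ** 2])

def lstd_weights(samples, gamma=0.9):
    """Solve A w = b for the least-squares fixed point of
    Phi w ~= r + gamma * Phi' w over transitions (s, r, s_next)."""
    k = len(phi(0.0))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in samples:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # accumulate Phi^T (Phi - gamma Phi')
        b += f * r                            # accumulate Phi^T r
    return np.linalg.solve(A, b)

# Toy deterministic chain: the state halves each step, reward equals the
# current state, so V(s) = s / (1 - 0.9 * 0.5) = s / 0.55 exactly.
samples = [(s, s, 0.5 * s) for s in np.linspace(0.1, 1.0, 10)]
w = lstd_weights(samples, gamma=0.9)
V = lambda s: phi(s) @ w
```

Because the true value function of the toy chain is linear in `s`, it lies in the span of the basis and the fit recovers it exactly; with richer dynamics the solution is only the least-squares projection onto the basis.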
The algorithms are applied to different numerical applications including linear quadratic regulation, wind energy allocation and battery storage problems to demonstrate their effectiveness and convergence properties.
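The non-parametric, kernel-smoothing side of the approach can be sketched as a Nadaraya–Watson regression of sampled returns on post-decision states, which yields a value estimate without committing to a fixed basis. The Gaussian kernel, bandwidth `h`, and noise-free returns below are assumptions for illustration, not the dissertation's specific estimators:

```python
import numpy as np

# Illustrative sketch (not the dissertation's exact estimator): kernel
# regression of Monte Carlo returns on sampled post-decision states.

def kernel_value_estimate(states, returns, h=0.2):
    """Return V_hat(s): a Gaussian-kernel weighted average of observed
    returns, with bandwidth h controlling the degree of smoothing."""
    states = np.asarray(states, dtype=float)
    returns = np.asarray(returns, dtype=float)

    def V_hat(s):
        w = np.exp(-0.5 * ((s - states) / h) ** 2)  # Gaussian kernel weights
        return float(w @ returns / w.sum())

    return V_hat

# Noise-free returns from a linear value function V(s) = 2s on a grid;
# the smoothed estimate reproduces it away from the grid boundaries.
grid = np.linspace(0.0, 1.0, 21)
V_hat = kernel_value_estimate(grid, 2.0 * grid, h=0.1)
```

With noisy Monte Carlo returns the same estimator averages out simulation noise, at the cost of the usual bandwidth-dependent bias near the edges of the sampled region.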

Bibliographic Record

  • Author: Ma, Jun
  • Affiliation: Princeton University
  • Degree-granting institution: Princeton University
  • Subjects: Business Administration Management; Operations Research; Engineering System Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 161 p.
  • Format: PDF
  • Language: English
