Conference: International Conference on Artificial Intelligence Planning and Scheduling; 2000 Apr 14-17; Breckenridge, CO (US)

Approximate Solutions to Factored Markov Decision Processes via Greedy Search in the Space of Finite State Controllers

Abstract

In stochastic planning problems formulated as factored Markov decision processes (MDPs), also called dynamic belief network MDPs (DBN-MDPs) (Boutilier, Dean, & Hanks 1999), finding the best policy (or conditional plan) is NP-hard. One source of difficulty is that the number of conditionals required to specify the policy can grow exponentially in the size of the MDP's representation. Several recent algorithms find an approximate policy by restricting the representation of conditionals using decision trees. We propose an alternative policy representation for factored MDPs in terms of finite-state machine (FSM) controllers. Since in practice we are forced to limit the number of conditionals, we claim there is a benefit to using FSM controllers: they can use their internal state to maintain context information that might otherwise require a large conditional table or decision tree. Although the optimal policy might not be representable as a finite-state controller with a fixed amount of memory, we will be satisfied with finding a "good" policy; to that end, we derive a stochastic greedy-search algorithm based on recent developments in reinforcement learning (Baird & Moore 1999) and then demonstrate its performance in some example domains.
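The controller-plus-greedy-search idea in the abstract can be sketched in a few lines of Python. This is a simplified stand-in, not the paper's algorithm: the paper derives a stochastic gradient-based search from Baird & Moore's (1999) framework, whereas the sketch below uses plain hill-climbing over a deterministic controller's tables. The `AlternatingEnv` toy domain and all class and function names are illustrative assumptions, chosen only to show why internal controller state can substitute for a large conditional table.

```python
import random


class FSMController:
    """A finite-state controller: a policy with bounded internal memory.

    Rather than a conditional table over the full (factored) MDP state,
    the controller keeps a small set of internal nodes; the action taken
    and the next internal node depend only on the current node and the
    latest observation.
    """

    def __init__(self, n_nodes, actions, observations, seed=0):
        rng = random.Random(seed)
        self.n_nodes = n_nodes
        self.actions = list(actions)
        # Action emitted at each internal node.
        self.action_of = {q: rng.choice(self.actions) for q in range(n_nodes)}
        # Deterministic node transition: (node, observation) -> next node.
        self.next_node = {(q, o): rng.randrange(n_nodes)
                          for q in range(n_nodes) for o in observations}

    def rollout(self, env, horizon):
        """Total reward from running the controller in env for `horizon` steps."""
        q, obs, total = 0, env.reset(), 0.0
        for _ in range(horizon):
            a = self.action_of[q]
            obs, r = env.step(a)
            total += r
            q = self.next_node[(q, obs)]
        return total


class AlternatingEnv:
    """Hypothetical toy domain with aliased observations: the agent sees the
    same observation 'o' every step, but reward requires alternating actions
    a, b, a, b, ...  A memoryless policy scores at most half the reward;
    a 2-node controller can track the phase in its internal state."""

    def reset(self):
        self.expect = "a"
        return "o"

    def step(self, action):
        r = 1.0 if action == self.expect else 0.0
        self.expect = "b" if self.expect == "a" else "a"
        return "o", r


def greedy_search(ctrl, env, horizon, iters=1000, seed=1):
    """Greedy local search: propose one change to the controller's tables and
    keep it iff the rollout value does not decrease (ties are accepted so the
    search can drift across plateaus)."""
    rng = random.Random(seed)
    best = ctrl.rollout(env, horizon)
    for _ in range(iters):
        if rng.random() < 0.5:  # mutate an action entry
            q = rng.randrange(ctrl.n_nodes)
            old, ctrl.action_of[q] = ctrl.action_of[q], rng.choice(ctrl.actions)
            val = ctrl.rollout(env, horizon)
            if val >= best:
                best = val
            else:
                ctrl.action_of[q] = old  # revert
        else:  # mutate a node-transition entry
            key = rng.choice(list(ctrl.next_node))
            old, ctrl.next_node[key] = ctrl.next_node[key], rng.randrange(ctrl.n_nodes)
            val = ctrl.rollout(env, horizon)
            if val >= best:
                best = val
            else:
                ctrl.next_node[key] = old  # revert
    return best


if __name__ == "__main__":
    env = AlternatingEnv()
    ctrl = FSMController(n_nodes=2, actions=["a", "b"], observations=["o"])
    print(greedy_search(ctrl, env, horizon=20))
```

Over a horizon of 20 steps, any memoryless policy in this domain earns at most 10.0, while a 2-node controller that alternates its internal node can in principle collect the full 20.0; the search typically recovers such a controller, which is the memory advantage the abstract claims for FSM policies.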
