Frontiers in Neurorobotics

Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play

Abstract

Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes, and they constitute a significant challenge for conventional single-objective reinforcement learning methods, especially when an optimal compromise cannot be determined beforehand. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user preference for solving the problem. However, this comes at the cost of computational complexity, long training times, and a lack of adaptability to non-stationary environment dynamics. Addressing these limitations requires adaptive methods that can solve the problem in an online and robust manner. In this paper, we propose a novel developmental method that exploits adversarial self-play between an intrinsically motivated preference exploration component and a policy coverage set optimization component; the latter robustly evolves a convex coverage set of policies that solve the problem under the preferences proposed by the former. We experimentally demonstrate the effectiveness of the proposed method in comparison to state-of-the-art multi-objective reinforcement learning methods in both stationary and non-stationary environments.
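The abstract describes two interacting components: an intrinsically motivated explorer that proposes preference vectors, and an optimizer that evolves a convex coverage set (CCS) of policies, i.e. the set of policies that are optimal for at least one linear scalarization w·V of the vector-valued return. The sketch below is a minimal, hypothetical illustration of that adversarial loop on a toy multi-objective bandit rather than a full MDP; the names `PreferenceExplorer` and `CoverageSetOptimizer` and the learning-progress intrinsic reward are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the two-component self-play loop on a toy
# multi-objective bandit (not the paper's actual implementation).
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: 5 actions, each with a 2-objective expected reward vector.
# The last action is Pareto-dominated and should drop out of the CCS.
TRUE_RETURNS = np.array([[1.0, 0.0], [0.8, 0.5], [0.5, 0.8],
                         [0.0, 1.0], [0.3, 0.3]])

def sample_reward(action):
    """Noisy vector-valued reward for an action."""
    return TRUE_RETURNS[action] + rng.normal(0.0, 0.1, size=2)

class CoverageSetOptimizer:
    """Estimates vector returns and keeps the policies (here: actions)
    that are optimal for at least one linear preference: a CCS estimate."""
    def __init__(self, n_actions, n_objectives):
        self.means = np.zeros((n_actions, n_objectives))
        self.counts = np.zeros(n_actions)

    def improve(self, w, episodes=20):
        """Epsilon-greedy learning under the scalarized reward w . r."""
        best = 0.0
        for _ in range(episodes):
            if rng.random() < 0.2:
                a = int(rng.integers(len(self.counts)))
            else:
                a = int(np.argmax(self.means @ w))
            r = sample_reward(a)
            self.counts[a] += 1
            self.means[a] += (r - self.means[a]) / self.counts[a]
            best = max(best, float(w @ self.means[a]))
        return best

    def coverage_set(self, weight_grid):
        """Actions maximizing w . V for some w on a grid."""
        return sorted({int(np.argmax(self.means @ w)) for w in weight_grid})

class PreferenceExplorer:
    """Intrinsically motivated proposer: favours preference regions where
    the optimizer's scalarized value is still improving."""
    def __init__(self, n_candidates=11):
        w1 = np.linspace(0.0, 1.0, n_candidates)
        self.candidates = np.stack([w1, 1.0 - w1], axis=1)
        self.progress = np.ones(n_candidates)   # optimistic initialization
        self.last_value = np.zeros(n_candidates)

    def propose(self):
        # Sample a preference proportionally to recent learning progress.
        p = self.progress / self.progress.sum()
        self.idx = rng.choice(len(self.candidates), p=p)
        return self.candidates[self.idx]

    def feedback(self, value):
        # Intrinsic reward = improvement of the scalarized value.
        gain = max(value - self.last_value[self.idx], 1e-3)
        self.progress[self.idx] = 0.9 * self.progress[self.idx] + 0.1 * gain
        self.last_value[self.idx] = value

explorer = PreferenceExplorer()
optimizer = CoverageSetOptimizer(n_actions=5, n_objectives=2)
for _ in range(300):                      # adversarial self-play loop
    w = explorer.propose()                # explorer proposes a preference
    value = optimizer.improve(w)          # optimizer adapts policies to it
    explorer.feedback(value)              # explorer rewarded by progress
print("estimated coverage set (actions):",
      optimizer.coverage_set(explorer.candidates))
```

In the paper's setting the optimizer would maintain actual policies over a multi-objective MDP rather than per-action estimates, and non-stationarity would be handled by running this loop continually so the coverage set keeps adapting.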
