A robust policy bootstrapping algorithm for multi-objective reinforcement learning in non-stationary environments

Abdelfattah Sherif; Kasmarik Kathryn; Hu Jiankun

Abstract

Multi-objective Markov decision processes are a special kind of multi-objective optimization problem that involves sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address this kind of problem by fusing the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity in order to evolve a coverage set of policies that can solve the problem. This article introduces a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. Results showed that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.
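The abstract's notion of a convex coverage set (CCS) can be illustrated with a minimal sketch. It assumes linear scalarization of two objectives with preference weights (w, 1 - w) and uses made-up value vectors for five hypothetical policies named A to E. It is not the authors' robust policy bootstrapping algorithm, which evolves the set online. Here the set is found offline by sweeping a grid of weights over the preference space and keeping every policy that is optimal for at least one weight.

```python
from typing import Dict, Sequence, Set


def scalarize(values: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: the weighted sum w . V of a policy's per-objective returns."""
    return sum(w * v for w, v in zip(weights, values))


def approximate_ccs(policy_values: Dict[str, Sequence[float]], steps: int = 100) -> Set[str]:
    """Approximate a convex coverage set for a two-objective problem.

    Sweeps preference weights (w, 1 - w) over a uniform grid and keeps every
    policy that maximizes the scalarized value for at least one sampled weight.
    A policy that is optimal only on a weight interval narrower than the grid
    spacing can be missed, so this is an approximation.
    """
    ccs: Set[str] = set()
    for i in range(steps + 1):
        w = i / steps
        weights = (w, 1.0 - w)
        best = max(policy_values, key=lambda name: scalarize(policy_values[name], weights))
        ccs.add(best)
    return ccs


# Hypothetical value vectors (objective 1, objective 2) for five policies.
policies = {
    "A": (10.0, 0.0),
    "B": (0.0, 10.0),
    "C": (6.0, 6.0),
    "D": (4.0, 4.0),  # dominated by C
    "E": (7.0, 4.0),  # Pareto-optimal, but never optimal under any linear weighting
}

print(sorted(approximate_ccs(policies)))  # ['A', 'B', 'C']
```

Policy E shows why a CCS can be smaller than the Pareto front: no policy beats it on both objectives, yet for every weight vector some other policy has a higher weighted sum. If the environment's dynamics changed, the value vectors would shift and a set computed this way would have to be recomputed, which is the stationarity assumption the abstract criticizes.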

Bibliographic details

Source: Adaptive Behavior, 2020, Issue 4, pp. 273-292 (20 pages)
Authors: Abdelfattah Sherif; Kasmarik Kathryn; Hu Jiankun
Affiliation (all three authors): UNSW Canberra, School of Engineering and Information Technology, Northcott Drive, Canberra ACT 2612, Australia
Original format: PDF
Language: English
Keywords: Multi-objective optimization; reinforcement learning; non-stationary environment dynamics; policy bootstrapping; Markov decision processes

Similar articles

1. Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems [J]. Koulouriotis DE, Xanthopoulos A. Applied Mathematics and Computation, 2008, no. 2.

2. Learning adversarial attack policies through multi-objective reinforcement learning [J]. Javier Garcia, Ruben Majadas, Fernando Fernandez. Engineering Applications of Artificial Intelligence, 2020.

3. Prediction-Based Multi-Agent Reinforcement Learning in Inherently Non-Stationary Environments [J]. Marinescu Andrei, Dusparic Ivana, Clarke Siobhan. ACM Transactions on Autonomous and Adaptive Systems, 2017, no. 2.

4. A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation [C]. Runzhe Yang, Xingyuan Sun, Karthik Narasimhan. Conference on Neural Information Processing Systems, 2020.

5. On the convergence of model-free policy iteration algorithms for reinforcement learning: Stochastic approximation under discontinuous mean dynamics [D]. Williams, John Kevin. 2000.

6. Multi-Objective Control Optimization for Greenhouse Environment Using Evolutionary Algorithms [O]. Haigen Hu, Lihong Xu, Ruihua Wei. 2011.

7. Reinforcement learning algorithm for non-stationary environments [O]. Sindhu Padakandla, Prabuchandran K. J., Shalabh Bhatnagar. 2020.
