Directed Exploration in Black-Box Optimization for Multi-Objective Reinforcement Learning

Garcia Javier; Iglesias Roberto; Rodriguez Miguel A.; Regueiro Carlos V



Abstract

Real-world problems usually involve optimizing multiple, possibly conflicting, objectives. Such problems may be addressed with Multi-Objective Reinforcement Learning (MORL) techniques. MORL generalizes standard Reinforcement Learning (RL) by extending the single reward signal to multiple signals, one for each objective, and aims to learn policies that optimize all objectives simultaneously. In these problems, directional (gradient) information can help guide exploration toward progressively better behaviors. However, traditional policy-gradient approaches have two main drawbacks: they require a batch of episodes to estimate the gradient properly, which reduces learning speed, and they use stochastic policies, which can have a disastrous impact on the safety of the learning system. In this paper, we present a novel population-based MORL algorithm for problems in which the underlying objectives are reasonably smooth. It has two main characteristics: fast computation of the gradient information for each objective using neighboring solutions, and the use of this information to carry out a geometric partition of the search space and thus direct exploration to promising areas. Finally, the algorithm is evaluated and compared with policy-gradient MORL algorithms on different multi-objective problems: the water reservoir and the biped walking problem (the latter both in simulation and on a real robot).
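The abstract mentions estimating per-objective gradient information from neighboring solutions. The abstract does not say how the authors do this, so the following is only a minimal sketch of one common way to do it: a least-squares fit of a local linear model through nearby population members. The function name `estimate_gradients`, the toy objectives, and the neighborhood size are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def estimate_gradients(center, f_center, neighbors, f_neighbors):
    """Least-squares estimate of one gradient per objective (illustrative only).

    center:      (d,) parameters of the solution of interest
    f_center:    (m,) objective values at `center`
    neighbors:   (k, d) parameters of nearby solutions
    f_neighbors: (k, m) objective values at those neighbors
    Returns an (m, d) array; row j approximates the gradient of objective j.
    """
    dX = np.asarray(neighbors, dtype=float) - np.asarray(center, dtype=float)
    dF = np.asarray(f_neighbors, dtype=float) - np.asarray(f_center, dtype=float)
    # Fit dX @ G.T ~= dF in the least-squares sense (first-order Taylor model).
    G_T, *_ = np.linalg.lstsq(dX, dF, rcond=None)
    return G_T.T


def toy_objectives(theta):
    """Two hypothetical smooth (here linear) objectives of a 2-D parameter vector."""
    x, y = theta
    return np.array([2.0 * x + 3.0 * y, -x + 0.5 * y])


rng = np.random.default_rng(0)
center = np.array([0.5, -1.0])
# Six hypothetical neighbors, standing in for nearby population members.
neighbors = center + 0.1 * rng.standard_normal((6, 2))
f_neighbors = np.array([toy_objectives(p) for p in neighbors])

grads = estimate_gradients(center, toy_objectives(center), neighbors, f_neighbors)
print(grads)
```

Because the toy objectives are linear, the fitted rows match their true gradients up to rounding. For objectives that are only "reasonably smooth", the estimate is a local approximation whose quality depends on how close and how well spread the neighbors are. No extra episodes are needed for the estimate itself, since it reuses evaluations the population already has.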


Bibliographic details

Source
International Journal of Information Technology & Decision Making, 2019, Issue 3, 38 pages
Authors
Garcia Javier; Iglesias Roberto; Rodriguez Miguel A.; Regueiro Carlos V
Author affiliations

Univ Santiago de Compostela CiTIUS Santiago De Compostela Spain;

Univ Santiago de Compostela CiTIUS Santiago De Compostela Spain;

Univ Santiago de Compostela CiTIUS Santiago De Compostela Spain;

Univ A Coruna Dept Elect & Syst La Coruna Spain;
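Indexing information
Original format: PDF
Text language: English
Chinese Library Classification: Information and communication theory
Keywords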
Reinforcement learning; multi-objective optimization; robotic tasks; policy search; black-box optimization;


Similar literature


1. Directed Exploration in Black-Box Optimization for Multi-Objective Reinforcement Learning [J]. Garcia Javier, Iglesias Roberto, Rodriguez Miguel A., International Journal of Information Technology & Decision Making. 2019, Issue 3
2. Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework [J]. Mohamed A. Khamis, Walid Gomaa, Engineering Applications of Artificial Intelligence. 2014, Issue mara
3. Directed Exploration in Reinforcement Learning with Transferred Knowledge [J]. Timothy A. Mann, Yoonsuck Choe, JMLR: Workshop and Conference Proceedings. 2012, Issue 2012
4. Deep Reinforcement Learning via Past-Success Directed Exploration [C]. Xiaoming Liu, Zhixiong Xu, Lei Cao, AAAI Conference on Artificial Intelligence; Innovative Applications of Artificial Intelligence Conference; AAAI Symposium on Educational Advances in Artificial Intelligence. 2019
5. Multi-objective black-box optimization and optimal design of CUSUM [D]. Ryu, Jong-hyun. 2010
6. A Multi-Objective Approach for Optimal Energy Management in Smart Home Using the Reinforcement Learning [O]. Muhammad Diyan, Bhagya Nathali Silva, Kijun Han. 2020
7. Robust Model-free Reinforcement Learning with Multi-objective Bayesian Optimization [O]. Matteo Turchetta, Andreas Krause, Sebastian Trimpe. 2020
