...
Autonomous agents and multi-agent systems

Exploration in policy optimization through multiple paths

Abstract

Recent years have witnessed tremendous improvements in deep reinforcement learning. A persistent challenge, however, is that an agent may suffer from inefficient exploration, particularly with on-policy methods. Previous exploration methods either rely on complex structures to estimate the novelty of states, or introduce sensitive hyper-parameters that cause instability. We propose an efficient exploration method, Multi-Path Policy Optimization (MP-PO), which does not incur high computational cost and ensures stability. MP-PO maintains a population of diverse policies and an efficient mechanism to exploit them for better exploration, especially in sparse-reward environments. We also give a theoretical guarantee of stable performance. We build our scheme upon two widely adopted on-policy methods, the Trust-Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) algorithms. We conduct extensive experiments on several MuJoCo tasks and their sparsified variants to fairly evaluate the proposed method. Results show that MP-PO significantly outperforms state-of-the-art exploration methods in terms of both sample efficiency and final performance.
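The abstract describes MP-PO only at a high level, so the sketch below is a minimal, toy illustration of the general idea rather than the paper's actual algorithm: it keeps a small population of 1-D Gaussian policies ("paths"), improves each with a PPO-style clipped surrogate, and periodically restarts the weakest member near the best one to combine diverse exploration with progress sharing. The sparse-reward task, the population-mixing rule, and all hyper-parameters here are assumptions made purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_reward(action):
    # Toy 1-D task: reward only inside a narrow target region (assumed for illustration).
    return 1.0 if 2.0 < action < 2.5 else 0.0

def rollout(mean, std, episodes=64):
    # Sample actions from a Gaussian policy and collect sparse rewards.
    actions = rng.normal(mean, std, size=episodes)
    rewards = np.array([sparse_reward(a) for a in actions])
    return actions, rewards

def ppo_style_update(old_mean, std, actions, rewards, epochs=4, lr=0.1, eps=0.2):
    # PPO-style clipped surrogate for a 1-D Gaussian policy with fixed std.
    adv = rewards - rewards.mean()
    mean = old_mean
    for _ in range(epochs):
        log_ratio = (-(actions - mean) ** 2 + (actions - old_mean) ** 2) / (2 * std ** 2)
        ratio = np.exp(log_ratio)
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
        # Gradient flows only where the unclipped term attains the min.
        active = (ratio * adv) <= (clipped * adv)
        grad = np.where(active, ratio * adv * (actions - mean) / std ** 2, 0.0)
        mean = mean + lr * grad.mean()  # gradient ascent on the surrogate
    return mean

# A small population of policies ("paths"), each starting from a different mean.
population = list(rng.normal(0.0, 1.0, size=4))
std = 0.5

for it in range(100):
    scores = []
    for i, mean in enumerate(population):
        actions, rewards = rollout(mean, std)
        scores.append(rewards.mean())
        population[i] = ppo_style_update(mean, std, actions, rewards)
    best, worst = int(np.argmax(scores)), int(np.argmin(scores))
    if best != worst:
        # Share progress: restart the weakest member near the best one, with
        # noise to keep the population diverse (assumed mechanism, not the paper's).
        population[worst] = population[best] + rng.normal(0.0, 0.5)

print("final policy means:", np.round(population, 2))
```

The fixed standard deviation and pure-NumPy gradient keep the sketch self-contained; a faithful implementation would use neural-network policies, full MuJoCo environments, and the update rule specified in the paper.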