...
Autonomous agents and multi-agent systems

Exploration in policy optimization through multiple paths

Abstract

Recent years have witnessed tremendous improvements in deep reinforcement learning. A persistent challenge, however, is that an agent may suffer from inefficient exploration, particularly with on-policy methods. Previous exploration methods either rely on complex structures to estimate the novelty of states, or introduce sensitive hyper-parameters that cause instability. We propose an efficient exploration method, Multi-Path Policy Optimization (MP-PO), which does not incur high computational cost and ensures stability. MP-PO maintains a population of diverse policies and an efficient mechanism to exploit them for better exploration, especially in sparse-reward environments. We also give a theoretical guarantee of stable performance. We build our scheme upon two widely adopted on-policy methods, the Trust-Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) algorithms. We conduct extensive experiments on several MuJoCo tasks and their sparsified variants to fairly evaluate the proposed method. Results show that MP-PO significantly outperforms state-of-the-art exploration methods in terms of both sample efficiency and final performance.
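The abstract describes MP-PO only at a high level, so the sketch below is a minimal, toy illustration of the general idea rather than the paper's actual algorithm: it keeps a small population of 1-D Gaussian policies ("paths"), improves each with a PPO-style clipped surrogate, and periodically restarts the weakest member near the best one to combine diverse exploration with progress sharing. The sparse-reward task, the population-mixing rule, and all hyper-parameters here are assumptions made purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_reward(action):
    # Toy 1-D task: reward only inside a narrow target region (assumed for illustration).
    return 1.0 if 2.0 < action < 2.5 else 0.0

def rollout(mean, std, episodes=64):
    # Sample actions from a Gaussian policy and collect sparse rewards.
    actions = rng.normal(mean, std, size=episodes)
    rewards = np.array([sparse_reward(a) for a in actions])
    return actions, rewards

def ppo_style_update(old_mean, std, actions, rewards, epochs=4, lr=0.1, eps=0.2):
    # PPO-style clipped surrogate for a 1-D Gaussian policy with fixed std.
    adv = rewards - rewards.mean()
    mean = old_mean
    for _ in range(epochs):
        log_ratio = (-(actions - mean) ** 2 + (actions - old_mean) ** 2) / (2 * std ** 2)
        ratio = np.exp(log_ratio)
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
        # Gradient flows only where the unclipped term attains the min.
        active = (ratio * adv) <= (clipped * adv)
        grad = np.where(active, ratio * adv * (actions - mean) / std ** 2, 0.0)
        mean = mean + lr * grad.mean()  # gradient ascent on the surrogate
    return mean

# A small population of policies ("paths"), each starting from a different mean.
population = list(rng.normal(0.0, 1.0, size=4))
std = 0.5

for it in range(100):
    scores = []
    for i, mean in enumerate(population):
        actions, rewards = rollout(mean, std)
        scores.append(rewards.mean())
        population[i] = ppo_style_update(mean, std, actions, rewards)
    best, worst = int(np.argmax(scores)), int(np.argmin(scores))
    if best != worst:
        # Share progress: restart the weakest member near the best one, with
        # noise to keep the population diverse (assumed mechanism, not the paper's).
        population[worst] = population[best] + rng.normal(0.0, 0.5)

print("final policy means:", np.round(population, 2))
```

The fixed standard deviation and pure-NumPy gradient keep the sketch self-contained; a faithful implementation would use neural-network policies, full MuJoCo environments, and the update rule specified in the paper.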