ACM Transactions on Autonomous and Adaptive Systems

Reinforcement Learning of Informed Initial Policies for Decentralized Planning



Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems in which agents operate with noisy sensors and actuators and only local information. Prevalent solution techniques are centralized and model-based, limitations that we address with distributed reinforcement learning (RL). We particularly favor alternate learning, in which agents take turns learning best responses to each other; it appears to outperform concurrent RL. Alternate learning, however, requires an initial policy. We propose two principled approaches to generating informed initial policies: a naive approach, and a more sophisticated approach built on it. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, staking a claim to being an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best-response learning seeded with such policies quickly learns high-quality policies as well.
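The abstract describes alternate best-response learning: agents take turns treating the partner's policy as fixed and learning a best response to it, starting from an informed rather than random initial policy. The sketch below is a minimal, hypothetical illustration of that loop on a toy one-shot cooperative game (the well-known climbing game) rather than an actual Dec-POMDP; the payoff matrix, the Q-learning best-response routine, and the "informed" seeding heuristic are assumptions made for illustration, not the authors' algorithm.

```python
import random

# Shared payoff matrix for a toy cooperative game (the "climbing game").
# Rows index agent 0's actions, columns index agent 1's actions.
PAYOFF = [
    [11.0, -30.0, 0.0],
    [-30.0, 7.0, 6.0],
    [0.0, 0.0, 5.0],
]

def sample_reward(a0, a1, noise=1.0):
    """Joint reward with Gaussian noise, standing in for noisy sensing/actuation."""
    return PAYOFF[a0][a1] + random.gauss(0.0, noise)

def best_response(partner_policy, learner, episodes=2000, alpha=0.1, eps=0.1):
    """Q-learn a best response to a fixed partner policy.

    With the partner frozen, the learner faces a stateless noisy bandit:
    q[a] estimates the expected joint reward of its own action a.
    """
    q = [0.0, 0.0, 0.0]
    for _ in range(episodes):
        # epsilon-greedy action selection over the learner's own actions
        a = random.randrange(3) if random.random() < eps else max(range(3), key=q.__getitem__)
        b = random.choices(range(3), weights=partner_policy)[0]
        r = sample_reward(a, b) if learner == 0 else sample_reward(b, a)
        q[a] += alpha * (r - q[a])  # incremental update toward the noisy reward
    greedy = max(range(3), key=q.__getitem__)
    return [1.0 if i == greedy else 0.0 for i in range(3)]

def alternate_learning(init_policies, rounds=6):
    """Agents alternately best-respond, starting from the given initial policies."""
    policies = [list(p) for p in init_policies]
    for i in range(rounds):
        learner = i % 2
        policies[learner] = best_response(policies[1 - learner], learner)
    return policies

if __name__ == "__main__":
    uniform = [[1 / 3] * 3, [1 / 3] * 3]           # uninformed seed
    informed = [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]  # biased toward the jointly optimal action
    print("uniform seed  ->", alternate_learning(uniform))
    print("informed seed ->", alternate_learning(informed))
```

In this toy game, best responses computed from the uniform seed typically settle on the safe but suboptimal joint action (payoff 5), while the seed biased toward the coordinated action lets alternate learning lock in the optimal joint payoff (11), mirroring the abstract's point that the quality of the initial policy shapes what alternate learning converges to.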
