ACM Transactions on Autonomous and Adaptive Systems

Reinforcement Learning of Informed Initial Policies for Decentralized Planning



Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems in which agents operate with noisy sensors and actuators and only local information. Prevalent solution techniques are centralized and model-based, limitations that we address with distributed reinforcement learning (RL). We particularly favor alternate learning, in which agents take turns learning best responses to each other; it appears to outperform concurrent RL. Alternate learning, however, requires an initial policy. We propose two principled approaches to generating informed initial policies: a naive approach, and a more sophisticated approach built on it. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, staking a claim to being an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best-response learning seeded with such policies quickly learns high-quality policies as well.
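The abstract describes alternate best-response learning: agents take turns treating the partner's policy as fixed and learning a best response to it, starting from an informed rather than random initial policy. The sketch below is a minimal, hypothetical illustration of that loop on a toy one-shot cooperative game (the well-known climbing game) rather than an actual Dec-POMDP; the payoff matrix, the Q-learning best-response routine, and the "informed" seeding heuristic are assumptions made for illustration, not the authors' algorithm.

```python
import random

# Shared payoff matrix for a toy cooperative game (the "climbing game").
# Rows index agent 0's actions, columns index agent 1's actions.
PAYOFF = [
    [11.0, -30.0, 0.0],
    [-30.0, 7.0, 6.0],
    [0.0, 0.0, 5.0],
]

def sample_reward(a0, a1, noise=1.0):
    """Joint reward with Gaussian noise, standing in for noisy sensing/actuation."""
    return PAYOFF[a0][a1] + random.gauss(0.0, noise)

def best_response(partner_policy, learner, episodes=2000, alpha=0.1, eps=0.1):
    """Q-learn a best response to a fixed partner policy.

    With the partner frozen, the learner faces a stateless noisy bandit:
    q[a] estimates the expected joint reward of its own action a.
    """
    q = [0.0, 0.0, 0.0]
    for _ in range(episodes):
        # epsilon-greedy action selection over the learner's own actions
        a = random.randrange(3) if random.random() < eps else max(range(3), key=q.__getitem__)
        b = random.choices(range(3), weights=partner_policy)[0]
        r = sample_reward(a, b) if learner == 0 else sample_reward(b, a)
        q[a] += alpha * (r - q[a])  # incremental update toward the noisy reward
    greedy = max(range(3), key=q.__getitem__)
    return [1.0 if i == greedy else 0.0 for i in range(3)]

def alternate_learning(init_policies, rounds=6):
    """Agents alternately best-respond, starting from the given initial policies."""
    policies = [list(p) for p in init_policies]
    for i in range(rounds):
        learner = i % 2
        policies[learner] = best_response(policies[1 - learner], learner)
    return policies

if __name__ == "__main__":
    uniform = [[1 / 3] * 3, [1 / 3] * 3]           # uninformed seed
    informed = [[0.8, 0.1, 0.1], [0.8, 0.1, 0.1]]  # biased toward the jointly optimal action
    print("uniform seed  ->", alternate_learning(uniform))
    print("informed seed ->", alternate_learning(informed))
```

In this toy game, best responses computed from the uniform seed typically settle on the safe but suboptimal joint action (payoff 5), while the seed biased toward the coordinated action lets alternate learning lock in the optimal joint payoff (11), mirroring the abstract's point that the quality of the initial policy shapes what alternate learning converges to.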
