
Modeling and Planning with Macro-Actions in Decentralized POMDPs


Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized multi-agent decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent’s actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions that may require different amounts of time to execute. We model macro-actions as options in a Dec-POMDP, focusing on actions that depend only on information directly available to the agent during execution. Therefore, we model systems where coordination decisions only occur at the level of deciding which macro-actions to execute. The core technical difficulty in this setting is that the options chosen by each agent no longer terminate at the same time. We extend three leading Dec-POMDP algorithms for policy generation to the macro-action case, and demonstrate their effectiveness in both standard benchmarks and a multi-robot coordination problem. The results show that our new algorithms retain agent coordination while allowing high-quality solutions to be generated for significantly longer horizons and larger state-spaces than previous Dec-POMDP methods. Furthermore, in the multi-robot domain, we show that, in contrast to most existing methods that are specialized to a particular problem class, our approach can synthesize control policies that exploit opportunities for coordination while balancing uncertainty, sensor information, and information about other agents.
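The abstract models each macro-action as an option whose low-level policy and termination condition depend only on information locally available to the executing agent, so the agents' options generally finish at different times and coordination decisions arise only when a new macro-action must be chosen. The sketch below illustrates that execution model in Python under stated assumptions: every class and function name here (MacroAction, Agent, choose_macro_action, step) is illustrative and hypothetical, not the authors' implementation or API.

```python
# Minimal sketch (not the paper's implementation) of macro-actions as
# options executing asynchronously in a decentralized setting.
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MacroAction:
    """An option: a local low-level policy plus a termination condition.

    Both depend only on the agent's own observation history, so no
    inter-agent communication is needed while the option runs.
    """
    name: str
    policy: Callable[[list], str]        # local history -> primitive action
    terminates: Callable[[list], bool]   # local history -> done?


@dataclass
class Agent:
    macro_actions: List[MacroAction]
    history: list
    current: Optional[MacroAction] = None


def choose_macro_action(agent: Agent) -> MacroAction:
    # Placeholder high-level choice: in the paper this decision comes from
    # the extended Dec-POMDP planners; here we simply pick at random.
    return random.choice(agent.macro_actions)


def step(agents: List[Agent]) -> List[str]:
    """One environment tick.

    Each agent continues its current macro-action; only agents whose
    option has terminated select a new one. Because terminations are
    decided locally, macro-actions generally end at different times --
    the asynchrony the paper's algorithms must handle.
    """
    primitive_actions = []
    for agent in agents:
        if agent.current is None or agent.current.terminates(agent.history):
            agent.current = choose_macro_action(agent)
        primitive_actions.append(agent.current.policy(agent.history))
    return primitive_actions
```

The key design point reflected here is that the high-level (macro-action) choice is the only place coordination enters; everything inside an option is driven purely by local information, which is what lets the planners reason over much longer horizons than primitive-action Dec-POMDP methods.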
