Informed Initial Policies for Learning in Dec-POMDPs

Abstract

We have presented a simple, principled technique to compute a valid Dec-POMDP policy for use as an initial policy in conjunction with reinforcement learning. Furthermore, we have demonstrated how to learn such policies in a model-free manner, and we have shown for two benchmark problems that using these initial policies can improve the outcome of alternating Q-learning. This result is encouraging and suggests that this initial policy may be useful in other algorithms (both existing and future) that might require an initial policy.
