
A Probabilistic Greedy Search Value Iteration Algorithm for POMDP



Abstract

Point-based value iteration methods are a class of effective algorithms for solving POMDP models. Although MDP-based algorithms such as FSVI can greatly reduce complexity and improve efficiency by using the optimal policy of the underlying MDP, their excessive randomness makes them unsuitable for realistic POMDP problems. This paper presents a probabilistic greedy search value iteration algorithm (PGSVI). PGSVI selects an action according to the weighted reward, explores the state for the next horizon in a probabilistically greedy manner based on the belief state and the transition function, and then samples an observation from those observations whose probability exceeds a threshold. PGSVI compensates for the shortcomings of FSVI and preserves efficiency by selecting more rational actions, states, and observations during exploration. Experimental results on four benchmarks show that PGSVI is very competitive with FSVI on POMDP problems with large-scale observation spaces.
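
The abstract gives only a high-level description of the exploration step, so the following is a minimal Python sketch of how one such step might look, not the paper's actual procedure. Several details are assumptions: the function name `pgsvi_explore_step`, the array layouts of `T`, `O`, and `R`, reading "weighted reward" as the belief-weighted immediate reward, and reading "probabilistic greedy" as picking the most likely next state with probability `1 - epsilon` and sampling otherwise. The `threshold` and `epsilon` defaults are placeholders.

```python
import numpy as np

def pgsvi_explore_step(belief, T, O, R, threshold=0.1, epsilon=0.1, rng=None):
    """One hypothetical exploration step, loosely following the abstract.

    Assumed layouts (not from the paper):
      belief: (S,)       current belief over states
      T:      (A, S, S)  T[a, s, s'] = P(s' | s, a)
      O:      (A, S, Z)  O[a, s', z] = P(z | s', a)
      R:      (S, A)     immediate reward
    """
    rng = np.random.default_rng() if rng is None else rng

    # 1. Action: maximise the belief-weighted immediate reward (assumption).
    weighted_reward = belief @ R
    action = int(np.argmax(weighted_reward))

    # 2. Next state: distribution induced by the belief and transition function.
    next_state_probs = belief @ T[action]
    if rng.random() < epsilon:
        # Occasionally sample instead of taking the most likely state (assumption).
        next_state = int(rng.choice(len(next_state_probs), p=next_state_probs))
    else:
        next_state = int(np.argmax(next_state_probs))

    # 3. Observation: sample only among observations above the threshold.
    obs_probs = O[action, next_state]
    mask = obs_probs > threshold
    if not mask.any():
        # Fallback if nothing passes the threshold: keep the most likely one.
        mask = obs_probs == obs_probs.max()
    filtered = np.where(mask, obs_probs, 0.0)
    filtered = filtered / filtered.sum()
    observation = int(rng.choice(len(filtered), p=filtered))

    # 4. Standard Bayesian belief update for the chosen action and observation.
    new_belief = O[action, :, observation] * next_state_probs
    total = new_belief.sum()
    new_belief = new_belief / total if total > 0 else next_state_probs

    return action, next_state, observation, new_belief
```

In a point-based solver, a step like this would be repeated along a trajectory to collect belief points for value backups. Raising `threshold` narrows exploration to likely observations, which is the behaviour the abstract contrasts with the randomness of FSVI.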
