
Representations of Decision-Theoretic Planning Tasks


Abstract

Goal-directed Markov Decision Process models (GDMDPs) are good models for many decision-theoretic planning tasks. They have been used in conjunction with two different reward structures, namely the goal-reward representation and the action-penalty representation. We apply GDMDPs to planning tasks in the presence of traps such as steep slopes for outdoor robots or staircases for indoor robots, and study the differences between the two reward structures. In these situations, achieving the goal is often the primary objective, while minimizing the travel time is only of secondary importance. We show that the action-penalty representation without discounting guarantees that the optimal plan achieves the goal for sure (if this is possible), but neither the action-penalty representation with discounting nor the goal-reward representation with discounting has this property. We then show exactly when this trapping phenomenon occurs, using a novel interpretation of discounting, namely that it models agents that use convex exponential utility functions and are thus optimistic in the face of uncertainty. Finally, we show how the trapping phenomenon can be eliminated with our Selective State-Deletion Method.
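As a point of reference, the two reward structures can be sketched in standard MDP notation; the definitions below reflect common usage of these terms and are not taken verbatim from the paper.

Goal-reward representation (with discount factor $0 < \gamma < 1$): the agent receives $r(s, a, s') = 1$ when the successor state $s'$ is a goal state and $0$ otherwise, and values satisfy
$$V(s) = \max_a \sum_{s'} P(s' \mid s, a)\,\bigl[r(s, a, s') + \gamma V(s')\bigr].$$

Action-penalty representation (undiscounted): every action executed in a non-goal state incurs reward $-1$, goal states are absorbing with value $0$, and
$$V(s) = \max_a \Bigl[-1 + \sum_{s'} P(s' \mid s, a)\,V(s')\Bigr].$$

One way to see the connection to exponential utilities, sketched here under the assumption that every action costs exactly one unit: under the action-penalty representation with discounting, an agent that needs $n$ actions to reach the goal accumulates $-(1 - \gamma^n)/(1 - \gamma)$, so maximizing expected discounted reward is equivalent to maximizing $E[\gamma^n]$, i.e. to applying the convex (risk-seeking, hence optimistic) exponential utility $u(-n) = \gamma^n$ to the undiscounted total cost.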
