
A Semi-Markov Decision Model With Inverse Reinforcement Learning for Recognizing the Destination of a Maneuvering Agent in Real Time Strategy Games


Abstract

Recognizing the destination of a maneuvering agent is important for creating intelligent AI players in Real Time Strategy (RTS) games. Among the different problem formulations, goal recognition can be cast as a model-based planning problem and solved with off-the-shelf planners. However, a common problem with these frameworks is that they usually do not model action duration, whereas in real-world scenarios the agent may take several steps to transition between grid cells. To solve this problem, a semi-Markov decision model (SMDM), which explicitly models the duration of an action, is proposed in this paper. In addition, most current works do not establish a behavioral model of the agent being identified, and almost none model individual behavioral preferences, which limits the accuracy of the recognition results. In this paper, the Inverse Reinforcement Learning (IRL) method is adopted to learn opponent behavior for the destination recognition problem. To adapt to the dynamic environment, the Maximum Entropy Inverse Reinforcement Learning (MaxEnt IRL) method is modified by defining a Fitness index to measure the effect of the reward weights and using the Nelder-Mead polyhedron search to find the optimal weights. In experiments, we build the game scenario in the Unreal Engine 4 environment and collect movement trajectories from human players in several different tasks to evaluate the performance of our methods. The results show that the recognizer using IRL can recognize the destination effectively even if the intention changes midway, and it performs better than other models on several of the most frequently used metrics.
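The abstract describes the weight search but not the exact form of the Fitness index or the SMDM. The following is a minimal sketch, under assumed definitions, of how linear reward weights could be fit with a Nelder-Mead polyhedron search in SciPy; the trajectory format, `feature_fn`, and the softmax action likelihood standing in for the paper's Fitness index are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize


def trajectory_fitness(weights, trajectories, feature_fn):
    """Hypothetical fitness index for a linear reward w . phi(s, a).

    Each trajectory is assumed to be a list of (state, action, candidate_actions)
    tuples, where candidate_actions is the list of actions available in state.
    A MaxEnt-style softmax likelihood of the observed actions is used here as a
    stand-in for the paper's (unspecified) Fitness index.
    """
    total_log_lik = 0.0
    for traj in trajectories:
        for state, action, candidate_actions in traj:
            scores = np.array([weights @ feature_fn(state, a) for a in candidate_actions])
            scores -= scores.max()                       # numerical stability
            log_probs = scores - np.log(np.exp(scores).sum())
            total_log_lik += log_probs[candidate_actions.index(action)]
    return total_log_lik


def learn_reward_weights(trajectories, feature_fn, n_features):
    """Search for reward weights with the Nelder-Mead simplex method,
    as the abstract describes, instead of gradient-based MaxEnt IRL."""
    objective = lambda w: -trajectory_fitness(w, trajectories, feature_fn)
    result = minimize(objective, x0=np.zeros(n_features), method="Nelder-Mead")
    return result.x
```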