...
Journal: Transportation research

Markov-game modeling of cyclist-pedestrian interactions in shared spaces: A multi-agent adversarial inverse reinforcement learning approach


Abstract

Understanding and modeling road user dynamics and their microscopic interaction behaviour at shared space facilities are crucial for several applications, including safety and performance evaluations. Despite the multi-agent nature of road user interactions, the majority of previous studies modeled these interactions within a single-agent framework, i.e., treating the other interacting agents as part of a passive environment. However, this assumption is unrealistic and can limit a model's accuracy and transferability in non-stationary road user environments. This study proposes a novel Multi-Agent Adversarial Inverse Reinforcement Learning (MA-AIRL) approach to model and simulate road user interactions at shared space facilities. Unlike traditional game-theoretic frameworks that model multi-agent systems with a single-time-step payoff, the proposed approach is based on Markov Games (MG), which model road users' sequential decisions concurrently. Moreover, the proposed model can handle bounded-rationality agents, e.g., agents with limited information access, through the Logistic Stochastic Best Response Equilibrium (LSBRE) solution concept. The proposed algorithm recovers road users' multi-agent reward functions using adversarial deep neural network discriminators and estimates their optimal policies using Multi-agent Actor-Critic with Kronecker factors (MACK) deep reinforcement learning. Data from three shared space locations in Vancouver, BC and New York City, New York are used in this study. The model's performance is compared to a baseline single-agent Gaussian Process Inverse Reinforcement Learning (GPIRL) model. The results show that the multi-agent modeling framework led to significantly more accurate predictions of road users' behaviour and their evasive action mechanisms. Moreover, unlike the multi-agent approach, the reward functions recovered under the single-agent approach failed to capture the equilibrium solution concept.
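
As a rough guide to the adversarial reward-recovery step the abstract describes, an AIRL-style discriminator for agent i typically takes the logistic form below, where f_{θ_i} is the learned reward (or advantage) estimator and π_i is the agent's current policy. This is a generic sketch of the discriminator family, not necessarily the paper's exact parameterization.

```latex
% Generic AIRL-style discriminator for agent i (illustrative form only):
D_{\theta_i}(s, a_i) \;=\;
  \frac{\exp\{ f_{\theta_i}(s, a_i) \}}
       {\exp\{ f_{\theta_i}(s, a_i) \} + \pi_i(a_i \mid s)}
```

Training D_{θ_i} to separate observed (expert) trajectories from policy-generated ones drives f_{θ_i} toward a reward consistent with the demonstrations, while the generator policy is updated against that recovered reward.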
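
The training loop itself alternates between a per-agent discriminator update and a policy (generator) update. Below is a minimal, hedged sketch of such a loop in PyTorch for two agents (e.g., a cyclist and a pedestrian); the toy dimensions, random placeholder batches, and the plain score-function policy update are illustrative assumptions that stand in for the paper's MACK actor-critic step and LSBRE solution concept.

```python
# Minimal sketch of a multi-agent adversarial IRL training loop in the spirit of
# MA-AIRL (PyTorch). Toy dimensions, placeholder data, and the score-function
# policy update are illustrative assumptions only; the paper itself uses MACK
# actor-critic updates and the LSBRE solution concept.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, STATE_DIM, ACT_DIM = 2, 4, 2  # e.g. cyclist and pedestrian, toy sizes


class RewardNet(nn.Module):
    """f_theta_i(s, a): learned per-agent reward estimator (discriminator core)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)


class Policy(nn.Module):
    """pi_i(a | s): Gaussian policy for one agent (the generator)."""
    def __init__(self):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.Tanh(), nn.Linear(64, ACT_DIM))
        self.log_std = nn.Parameter(torch.zeros(ACT_DIM))

    def dist(self, s):
        return torch.distributions.Normal(self.mu(s), self.log_std.exp())


rewards = [RewardNet() for _ in range(N_AGENTS)]
policies = [Policy() for _ in range(N_AGENTS)]
d_opts = [torch.optim.Adam(r.parameters(), lr=1e-3) for r in rewards]
p_opts = [torch.optim.Adam(p.parameters(), lr=1e-3) for p in policies]


def airl_logits(i, s, a):
    # log D - log(1 - D) for the AIRL-style discriminator D = exp(f) / (exp(f) + pi);
    # the policy term is detached so the discriminator step only trains f.
    with torch.no_grad():
        logp = policies[i].dist(s).log_prob(a).sum(-1)
    return rewards[i](s, a) - logp


for step in range(200):
    # Placeholder batches: a real run would use observed (expert) shared-space
    # trajectories and rollouts of the current joint policy instead.
    s_exp, a_exp = torch.randn(64, STATE_DIM), torch.randn(64, ACT_DIM)
    s_gen = torch.randn(64, STATE_DIM)

    for i in range(N_AGENTS):
        a_gen = policies[i].dist(s_gen).sample()

        # Discriminator step: expert samples labelled 1, generated samples 0.
        d_loss = (F.binary_cross_entropy_with_logits(
                      airl_logits(i, s_exp, a_exp), torch.ones(64)) +
                  F.binary_cross_entropy_with_logits(
                      airl_logits(i, s_gen, a_gen), torch.zeros(64)))
        d_opts[i].zero_grad()
        d_loss.backward()
        d_opts[i].step()

        # Generator step: a plain score-function update that pushes the policy
        # toward higher recovered reward (stand-in for the MACK update).
        with torch.no_grad():
            r_hat = airl_logits(i, s_gen, a_gen)
        logp = policies[i].dist(s_gen).log_prob(a_gen).sum(-1)
        p_loss = -(r_hat * logp).mean()
        p_opts[i].zero_grad()
        p_loss.backward()
        p_opts[i].step()
```

In an actual application the placeholder batches would be replaced by trajectories extracted from the video data at the shared-space sites and by rollouts of the current joint policy in a simulated shared-space environment.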
