Repeated Stackelberg security games: Learning with incomplete state information


Abstract

Existing applications of Stackelberg Security Games (SSGs) have made use of Reinforcement Learning (RL) approaches to learn and adapt defender-attacker behavior. The learning process is represented by randomized strategies for the defenders applied against adversarial strategies of the attackers, where the players acquire feedback on their strategies by observing the target that was defended or attacked. However, most existing RL models for SSGs rely on strong assumptions, including that the defenders and attackers have perfect information about the behavioral model, which produces inconsistencies. We address these problems by proposing a practical framework for representing real-world security problems, empowering SSGs with an RL approach that considers incomplete state information. The players' behavior and rationality are restricted to a class of partially observed Markov games (POMG). We develop an algorithm that considers randomized strategies for both defenders and attackers and obtains feedback on their partially observed states. We propose adaptive rules for computing the estimated transition matrices and utilities that take into account the number of unobserved experiences in the game. Furthermore, we study the convergence of the estimated transition matrices and utilities in SSGs. For the realization of the SSG, we propose a new partially observed random walk technique for randomizing the scheduling of patrol planning. Results are applied to security games between defenders and attackers, where the noncooperative behaviors are well characterized by the features of the learning process in Stackelberg games.
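
The abstract mentions estimating transition matrices from experience when some states go unobserved. As a rough illustration only, the sketch below uses a plain frequency-count estimator that skips unobserved steps. This is an assumption made for illustration and is not the paper's adaptive rule. The function name `estimate_transitions`, the use of `None` to mark an unobserved state, and the uniform fallback for unvisited rows are all hypothetical choices.

```python
import numpy as np

def estimate_transitions(trajectory, n_states, n_actions):
    """Frequency estimate of P[a, s, s'] from (state, action, next_state) triples.

    A state equal to None marks an unobserved step (illustrative convention);
    such triples are skipped rather than counted.
    """
    counts = np.zeros((n_actions, n_states, n_states))
    for s, a, s_next in trajectory:
        if s is None or s_next is None:
            continue  # incomplete state information: no count is added
        counts[a, s, s_next] += 1

    totals = counts.sum(axis=2, keepdims=True)
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid division by zero
    # State-action pairs never observed fall back to a uniform row (assumption).
    return np.where(totals > 0, counts / safe_totals, 1.0 / n_states)

# Toy data: two states, one action, two partially observed steps.
trajectory = [(0, 0, 1), (0, 0, 1), (0, 0, 0), (None, 0, 1), (1, 0, None)]
P = estimate_transitions(trajectory, n_states=2, n_actions=1)
```

In this toy run, the row for state 0 is estimated from three observed transitions, while state 1 has no fully observed transition and keeps the uniform fallback. Every row of the estimate sums to one.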

Bibliographic record

  • Source
    Reliability Engineering & System Safety | 2020, Issue 3 | 106695.1-106695.12 | 12 pages
  • Authors

  • Author affiliations

    Univ Anahuac Ctr Alta Direcc Ingn & Tecnol Av Univ Anahuac 46 Lomas Anahuac 50130 Edo Mexico Mexico;

    Inst Politecn Nacl Escuela Super Fis & Matemat Bldg 9 Av Inst Politecn Nacl Mexico City 07738 DF Mexico|Natl Polytech Inst Sch Phys & Math Mexico City 07738 DF Mexico;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA;
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Reinforcement learning; Incomplete information; Security games;

