OPTIMAL POLICY DETERMINATION USING REPEATED STACKELBERG GAMES WITH UNKNOWN PLAYER PREFERENCES

Abstract

A system, method and computer program product for planning actions in a repeated Stackelberg game played for a fixed number of rounds, in which the follower's payoffs or preferences are initially unknown to the leader but a prior probability distribution over follower types is available. In a repeated Bayesian Stackelberg game, the objective is to maximize the leader's cumulative expected payoff over all rounds of the game. Optimal plans in such games trade off actions that reveal information about the unknown follower preferences against actions that aim for a high immediate payoff. The method computes such optimal plans using Monte Carlo Tree Search, in which each simulation trial draws a follower instance from the prior probability distribution. Some embodiments additionally prune dominated leader strategies.
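The abstract gives no search details. The following minimal Python sketch shows the general idea under several assumptions of our own, and it is not the claimed method. The leader chooses from a hypothetical finite set of candidate mixed strategies. The follower is myopic and best-responds each round, with ties broken toward the lowest action index. Tree actions are selected with UCB1 using a heuristic exploration constant `c`. Each trial draws one follower type from the current belief (the prior in round one) and keeps it fixed for the whole simulated game. The statistics stored under a history node therefore reflect the posterior over types given that history. `Game`, `plan`, `simulate`, `update_belief`, and `c` are illustrative names. Pruning of dominated leader strategies is not shown.

```python
import math
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    leader_payoff: list      # leader_payoff[i][j]: leader pure action i, follower action j
    follower_payoffs: list   # follower_payoffs[t][i][j]: the same, for follower type t
    leader_strategies: list  # hypothetical finite set of candidate leader mixed strategies


def best_response(follower_payoff, strategy):
    """Myopic follower: maximize expected payoff against the committed mixed strategy.
    max() returns the first maximizer, so ties go to the lowest action index (an assumption)."""
    n = len(follower_payoff[0])
    values = [sum(p * follower_payoff[i][j] for i, p in enumerate(strategy)) for j in range(n)]
    return max(range(n), key=lambda j: values[j])


def leader_value(leader_payoff, strategy, response):
    """Leader's expected payoff for one round, given the follower's response."""
    return sum(p * leader_payoff[i][response] for i, p in enumerate(strategy))


def update_belief(game, belief, a, response):
    """Bayes update: zero out types whose best response to strategy a differs from the observation."""
    strategy = game.leader_strategies[a]
    w = [b if best_response(game.follower_payoffs[t], strategy) == response else 0.0
         for t, b in enumerate(belief)]
    total = sum(w)
    return [x / total for x in w]


class Node:
    """Search-tree node for one history of (strategy index, observed response) pairs."""

    def __init__(self):
        self.visits = 0
        self.counts = defaultdict(int)   # simulations per strategy index
        self.means = defaultdict(float)  # mean cumulative leader return per strategy index
        self.children = {}               # (strategy index, response) -> Node


def simulate(node, game, t, rounds_left, rng, c):
    """One trial from `node` against a fixed follower type t; returns the leader's return."""
    if rounds_left == 0:
        return 0.0
    k = len(game.leader_strategies)
    untried = [a for a in range(k) if node.counts[a] == 0]
    if untried:
        a = rng.choice(untried)
    else:  # UCB1: mean return plus an exploration bonus
        log_n = math.log(node.visits)
        a = max(range(k), key=lambda s: node.means[s] + c * math.sqrt(log_n / node.counts[s]))
    strategy = game.leader_strategies[a]
    response = best_response(game.follower_payoffs[t], strategy)
    reward = leader_value(game.leader_payoff, strategy, response)
    child = node.children.setdefault((a, response), Node())
    ret = reward + simulate(child, game, t, rounds_left - 1, rng, c)
    node.visits += 1
    node.counts[a] += 1
    node.means[a] += (ret - node.means[a]) / node.counts[a]  # incremental mean
    return ret


def plan(game, belief, rounds_left, n_trials=2000, c=1.0, seed=0):
    """Return the index of the strategy to commit to now (most-visited root action)."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(n_trials):
        t = rng.choices(range(len(belief)), weights=belief)[0]  # draw a follower instance
        simulate(root, game, t, rounds_left, rng, c)
    return max(range(len(game.leader_strategies)), key=lambda a: root.counts[a])


if __name__ == "__main__":
    # Toy instance with hypothetical numbers: 2 leader actions, 2 follower actions, 2 types.
    game = Game(
        leader_payoff=[[1.0, 0.0], [0.0, 1.0]],
        follower_payoffs=[[[1.0, 0.0], [0.0, 1.0]],   # type 0 prefers to match the leader
                          [[0.0, 1.0], [1.0, 0.0]]],  # type 1 prefers to mismatch
        leader_strategies=[(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)],
    )
    belief, true_type = [0.5, 0.5], 1
    for rounds_left in range(3, 0, -1):
        a = plan(game, belief, rounds_left)
        r = best_response(game.follower_payoffs[true_type], game.leader_strategies[a])
        print(f"commit to {game.leader_strategies[a]}, follower plays {r}")
        belief = update_belief(game, belief, a, r)
```

In the toy instance, committing to a pure strategy reveals the follower's type because the two types respond differently. The uniform mix reveals nothing, since both types are indifferent and respond with action 0. Against a type-1 follower, however, the mix is the leader's best strategy. The search therefore faces the tradeoff the abstract describes, between revealing information and earning immediate payoff. For brevity, the sketch replans from scratch each round with the updated belief rather than reusing the matching subtree.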