SIAM Journal on Control and Optimization

STOCHASTIC NASH EQUILIBRIUM SEEKING FOR GAMES WITH GENERAL NONLINEAR PAYOFFS

Abstract

We introduce a multi-input stochastic extremum seeking algorithm to solve the problem of seeking Nash equilibria for a noncooperative game whose N players seek to maximize their individual payoff functions. The payoff functions are general (not necessarily quadratic), and their forms are not known to the players. Our algorithm is a nonmodel-based approach for asymptotic attainment of the Nash equilibria. Different from classical game theory algorithms, where each player employs the knowledge of the functional form of his payoff and the knowledge of the other players' actions, a player employing our algorithm measures only his own payoff values, without knowing the functional form of his or other players' payoff functions. We prove local exponential (in probability) convergence of our algorithms. For nonquadratic payoffs, the convergence is not necessarily perfect but may be biased in proportion to the third derivatives of the payoff functions and the intensity of the stochastic perturbations used in the algorithm. We quantify the size of these residual biases. Compared to the deterministic extremum seeking with sinusoidal perturbation signals, where convergence occurs only if the players use distinct frequencies, in our algorithm each player simply employs an independent ergodic stochastic probing signal in his seeking strategy, which is realistic in noncooperative games. As a special case of an N-player noncooperative game, the problem of standard multivariable optimization (when the players' payoffs coincide) for quadratic maps is also solved using our stochastic extremum seeking algorithm.
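As a rough illustration of the idea in the abstract, the sketch below simulates two players who each perturb their action with an independent stochastic probing signal and update their estimate using only their own measured payoff value. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the quadratic payoffs (chosen so the Nash equilibrium is u1 = u2 = 2/3), the probing signal a*sin(eta) driven by an Ornstein-Uhlenbeck process, the Euler discretization, and all names and parameter values (`payoff`, `seek_nash`, `a`, `k`, `eps`, `q`).

```python
import math
import random


def payoff(i, u):
    """Hypothetical quadratic payoff of player i (assumed for illustration).

    Setting dJ_i/du_i = -2*u_i + 0.5*u_j + 1 = 0 for both players gives the
    Nash equilibrium u1 = u2 = 2/3.
    """
    own, other = u[i], u[1 - i]
    return -own ** 2 + 0.5 * own * other + own


def seek_nash(T=100.0, dt=0.001, a=0.5, k=2.0, eps=0.01, q=1.0, seed=0):
    """Euler simulation of a two-player stochastic extremum seeking loop.

    Returns each player's estimate averaged over the second half of the run.
    """
    rng = random.Random(seed)
    u_hat = [0.0, 0.0]  # each player's current estimate of its best action
    eta = [0.0, 0.0]    # independent Ornstein-Uhlenbeck states, one per player
    steps = int(T / dt)
    tail_sum = [0.0, 0.0]
    tail_count = 0

    for n in range(steps):
        # Bounded stochastic probing signals, independent across players.
        probe = [a * math.sin(e) for e in eta]
        u = [u_hat[i] + probe[i] for i in range(2)]

        for i in range(2):
            # Player i sees only the value of its own payoff, not its form.
            J = payoff(i, u)
            # Correlating the probe with the payoff gives, on average,
            # a step along dJ_i/du_i.
            u_hat[i] += dt * k * probe[i] * J
            # Euler-Maruyama step for eps*d(eta) = -eta*dt + sqrt(eps)*q*dW.
            eta[i] += -(dt / eps) * eta[i] + q * math.sqrt(dt / eps) * rng.gauss(0.0, 1.0)

        if n >= steps // 2:
            tail_sum[0] += u_hat[0]
            tail_sum[1] += u_hat[1]
            tail_count += 1

    return [s / tail_count for s in tail_sum]


if __name__ == "__main__":
    print(seek_nash())
```

Because the assumed payoffs are quadratic, the averaged dynamics have no residual bias, so the time-averaged estimates should hover near 2/3; with nonquadratic payoffs the abstract indicates a bias that depends on the third derivatives of the payoffs and on the perturbation intensity.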
