Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor

Shiyao DING; Toshimitsu USHIO

Abstract

We propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We then consider two two-player matrix games as illustrative examples. Simulations show that the behavior of the games under the PGLA algorithm can converge to Nash equilibria in both pure and mixed policies.
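
The abstract does not state the PGLA update rule, so the following is only a minimal sketch of the generic lagging anchor idea on which the name is based: each player takes a gradient step on its own expected payoff while being pulled toward a slowly moving average (the "anchor") of its past policies. The game (matching pennies), the function name `lagging_anchor_matching_pennies`, and all parameter values are assumptions chosen for illustration; this is not the authors' algorithm.

```python
def clip01(x):
    """Keep a probability inside [0, 1]."""
    return min(1.0, max(0.0, x))


def lagging_anchor_matching_pennies(p=0.8, q=0.3, lr=0.01,
                                    anchor_rate=1.0, steps=5000):
    """Gradient play with a lagging anchor on matching pennies (illustrative).

    p, q: probability that player 1 / player 2 plays "heads".
    Player 1 receives +1 on a match and -1 otherwise, so its expected
    payoff is (2p - 1)(2q - 1); player 2 receives the negative of that.
    The unique Nash equilibrium is the mixed policy p = q = 0.5.
    """
    p_bar, q_bar = p, q  # anchors start at the initial policies
    for _ in range(steps):
        # Gradient of each player's own expected payoff.
        grad_p = 2.0 * (2.0 * q - 1.0)
        grad_q = -2.0 * (2.0 * p - 1.0)
        # Policy step: payoff gradient plus a pull toward the anchor.
        new_p = clip01(p + lr * (grad_p + anchor_rate * (p_bar - p)))
        new_q = clip01(q + lr * (grad_q + anchor_rate * (q_bar - q)))
        # Anchor step: the anchor slowly follows the current policy.
        p_bar += lr * anchor_rate * (p - p_bar)
        q_bar += lr * anchor_rate * (q - q_bar)
        p, q = new_p, new_q
    return p, q


if __name__ == "__main__":
    print(lagging_anchor_matching_pennies())
```

In this sketch the anchor term damps the rotation that plain simultaneous gradient steps produce in matching pennies, so the policies spiral in toward the mixed equilibrium instead of cycling around it.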

Bibliographic Details

Source: 電子情報通信学会技術研究報告. システム数理と応用. Mathematical Systems Science and its Applications, 2017, No. 506, 4 pages
Authors: Shiyao DING; Toshimitsu USHIO
Affiliation (both authors): Graduate School of Engineering Science, Osaka University
Original format: PDF
Language: English
Chinese Library Classification: Computer software
Keywords: Reinforcement Learning; Policy Gradient; Multi-Agent Systems; Matrix Game

Similar Literature

1. Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor [J]. Shiyao DING, Toshimitsu USHIO. 電子情報通信学会技術研究報告. システム数理と応用. Mathematical Systems Science and its Applications. 2017, No. 506.
2. Policy Gradient Lagging Anchor for Concurrent Games [J]. Shiyao DING, Toshimitsu USHIO. 電子情報通信学会技術研究報告. 信号処理. Signal Processing. 2019, No. 155.
3. Policy Gradient Lagging Anchor for Concurrent Games [J]. Shiyao DING, Toshimitsu USHIO. 電子情報通信学会技術研究報告. システム数理と応用. Mathematical Systems Science and its Applications. 2019, No. 156.
4. Decentralized learning in two-player zero-sum games: A LR-I lagging anchor algorithm [C]. Lu Xiaosong, Schwartz Howard M. 2011 American Control Conference. 2011.
5. Deception in two-player zero-sum stochastic games: Theory and application to warfare games [D]. Singh, Rajdeep. 2006.
6. Spike-based Decision Learning of Nash Equilibria in Two-Player Games [O]. Johannes Friedrich, Walter Senn. 2012.
7. Online Gaming: Real Time Solution of Nonlinear Two-Player Zero-Sum Games Using Synchronous Policy Iteration [O]. Kyriakos G., Frank L. 2011.
