Progress in multi-agent reinforcement learning for stochastic games has slowed in recent years. The main difficulty is making the learning satisfy rationality and convergence at the same time. This paper first analyzes the typical learning algorithms and then proposes a new method, Pareto-Q, which is built on the concept of Pareto optimality and is rational. Social conventions are also introduced to guarantee that the learning converges. Finally, experiments are presented to demonstrate the algorithm's good learning results.