This paper proposes a Q-learning algorithm based on statistical learning for multi-agent systems. The agent learns the action policies of other agents by observing and counting joint actions, and a concise but useful hypothesis is adopted to represent the optimal policies of the other agents. The full joint probability distribution over policies ensures that the learning agent chooses the optimal action. The algorithm improves learning speed because it reduces the conventional Q-learning space from exponential to linear. The convergence of the algorithm is proved, and its successful application in RoboCup demonstrates its good learning performance.
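The observing-and-counting step can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not state the hypothesis or the update rule, so the class `OpponentModel`, its method names, the uniform fallback for unseen states, and the assumption that the other agents act independently are all hypothetical choices made here for illustration. The sketch only shows how empirical action frequencies can serve as policy estimates and how a joint-action probability can be formed from them.

```python
from collections import defaultdict
from math import prod


class OpponentModel:
    """Empirical per-agent action frequencies (hypothetical helper)."""

    def __init__(self, n_others, n_actions):
        self.n_others = n_others
        self.n_actions = n_actions
        # counts[state][j][a] = number of times other agent j took action a in state
        self.counts = defaultdict(
            lambda: [[0] * n_actions for _ in range(n_others)]
        )

    def observe(self, state, joint_action):
        """Record one observed joint action of the other agents."""
        for j, a in enumerate(joint_action):
            self.counts[state][j][a] += 1

    def policy(self, state, j):
        """Estimated action distribution of agent j (uniform if nothing observed)."""
        row = self.counts[state][j]
        total = sum(row)
        if total == 0:
            return [1.0 / self.n_actions] * self.n_actions
        return [c / total for c in row]

    def joint_probability(self, state, joint_action):
        """Probability of a joint action, ASSUMING the agents act independently."""
        return prod(self.policy(state, j)[a] for j, a in enumerate(joint_action))


if __name__ == "__main__":
    model = OpponentModel(n_others=2, n_actions=3)
    model.observe("s", (0, 1))
    model.observe("s", (0, 2))
    print(model.policy("s", 1))
    print(model.joint_probability("s", (0, 1)))
```

In this sketch the count tables hold one row per other agent, so their size grows linearly with the number of agents, whereas a table indexed by every joint action would grow exponentially. Whether this corresponds to the reduction described above is an assumption, not something the abstract confirms.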