Venue: International Conference on Machine Learning

Low-Variance and Zero-Variance Baselines for Extensive-Form Games


Abstract

Extensive-form games (EFGs) are a common model of multi-agent interactions with imperfect information. State-of-the-art algorithms for solving these games typically perform full walks of the game tree, which can prove prohibitively slow in large games. Alternatively, sampling-based methods such as Monte Carlo Counterfactual Regret Minimization walk one or more trajectories through the tree, touching only a fraction of the nodes on each iteration, at the expense of requiring more iterations to converge due to the variance of sampled values. In this paper, we extend recent work that uses baseline estimates to reduce this variance. We introduce a framework of baseline-corrected values in EFGs that generalizes the previous work. Within our framework, we propose new baseline functions that result in significantly reduced variance compared to existing techniques. We show that one particular choice of such a function, the predictive baseline, is provably optimal under certain sampling schemes. This allows for efficient computation of zero-variance value estimates even along sampled trajectories.
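The general idea of a baseline-corrected value can be illustrated at a single decision node. The following is a minimal Python sketch, not the authors' implementation: it assumes one node with a current strategy σ, a sampling policy q, known child values u, and a per-action baseline b, and uses the standard control-variate form û(a) = b(a) + 1[a sampled]·(u(a) − b(a)) / q(a). This estimate is unbiased for any baseline, and when b equals the true child values every sample is exact, which shows how a good baseline can drive variance to zero. The predictive baseline proposed in the paper is a specific choice of b that this sketch does not reproduce; all names and numbers below (`SIGMA`, `Q`, `U`, `baseline_corrected_value`, `estimate_stats`) are hypothetical.

```python
import random
import statistics
from typing import Dict

# Hypothetical single decision node (illustrative numbers, not from the paper).
SIGMA: Dict[str, float] = {"fold": 0.2, "call": 0.5, "raise": 0.3}  # current strategy sigma(a)
Q: Dict[str, float] = {"fold": 1 / 3, "call": 1 / 3, "raise": 1 / 3}  # sampling policy q(a) > 0
U: Dict[str, float] = {"fold": -1.0, "call": 0.5, "raise": 2.0}  # child values u(a)


def baseline_corrected_value(
    strategy: Dict[str, float],
    sampling: Dict[str, float],
    values: Dict[str, float],
    baseline: Dict[str, float],
    rng: random.Random,
) -> float:
    """One-sample estimate of sum_a sigma(a) * u(a) using a control-variate baseline.

    Assumption: only the sampled action's value u(a) is used; in MCCFR it would
    itself be a sampled estimate from the subtree rather than an exact number.
    """
    actions = list(sampling)
    sampled = rng.choices(actions, weights=[sampling[a] for a in actions])[0]
    estimate = 0.0
    for a in actions:
        # Unsampled actions fall back to the baseline b(a).
        corrected = baseline[a]
        if a == sampled:
            # Importance-weighted correction keeps E[corrected] == u(a) for any b.
            corrected += (values[a] - baseline[a]) / sampling[a]
        estimate += strategy[a] * corrected
    return estimate


def estimate_stats(baseline: Dict[str, float], n: int = 10_000, seed: int = 0):
    """Empirical mean and variance of the estimator over n independent samples."""
    rng = random.Random(seed)
    samples = [baseline_corrected_value(SIGMA, Q, U, baseline, rng) for _ in range(n)]
    return statistics.fmean(samples), statistics.pvariance(samples)


if __name__ == "__main__":
    zero = {a: 0.0 for a in U}  # no baseline: plain importance sampling
    exact = dict(U)             # oracle baseline equal to the true child values
    print("zero baseline :", estimate_stats(zero))
    print("exact baseline:", estimate_stats(exact))
```

With the zero baseline the estimator is unbiased but noisy (for these numbers its analytic variance is 0.965); with the exact baseline every sample equals Σ σ(a)·u(a) = 0.65. In practice the true child values are unknown, which is why the choice of baseline function matters.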