Temporal-Difference Search in Computer Go



Abstract

Temporal-difference (TD) learning is one of the most successful and broadly applied solutions to the reinforcement learning problem; it has been used to achieve master-level play in chess, checkers and backgammon. Monte-Carlo tree search is a recent algorithm for simulation-based search, which has been used to achieve master-level play in Go. We have introduced a new approach to high-performance planning (Silver, Sutton, and Müller 2012). Our method, TD search, combines TD learning with simulation-based search. Like Monte-Carlo tree search, it updates value estimates by learning online from simulated experience. Like TD learning, it uses value function approximation and bootstrapping to generalise efficiently between related states. We applied TD search to the game of 9 × 9 Go, using a million binary features matching simple patterns of stones. Without any explicit search tree, our approach outperformed a vanilla Monte-Carlo tree search with the same number of simulations. When combined with a simple alpha-beta search, our program also outperformed all traditional (pre-Monte-Carlo) search and machine learning programs on the 9 × 9 Computer Go Server.
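The core learning rule the abstract describes, TD updates over a linear value function of binary features, can be sketched as follows. This is an illustrative toy, not the paper's Go implementation: the environment (a small random walk), the feature count, and the step size are all assumptions chosen so the example runs in a few lines; the paper instead uses roughly a million binary pattern features over 9 × 9 Go positions.

```python
import random

# Sketch of TD(0) with linear value-function approximation over
# binary features, learning online from simulated experience.
# N_FEATURES, ALPHA, and the random-walk environment are toy
# assumptions, not the paper's actual setup.

N_FEATURES = 8   # one feature per state here; the paper uses ~1M pattern features
ALPHA = 0.1      # step size
GAMMA = 1.0      # undiscounted episodic task

weights = [0.0] * N_FEATURES

def features(state):
    """One-hot binary features: a stand-in for Go stone patterns."""
    phi = [0.0] * N_FEATURES
    phi[state] = 1.0
    return phi

def value(state):
    """Linear value estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features(state)))

def td0_update(state, reward, next_state, terminal):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    target = reward + (0.0 if terminal else GAMMA * value(next_state))
    delta = target - value(state)  # TD error
    phi = features(state)
    for i in range(N_FEATURES):
        weights[i] += ALPHA * delta * phi[i]

# Learn from simulated episodes of a random walk over states 0..7,
# starting at 3; reaching state 7 yields reward 1, state 0 yields 0.
random.seed(0)
for episode in range(2000):
    s = 3
    while True:
        s2 = s + random.choice([-1, 1])
        terminal = s2 in (0, N_FEATURES - 1)
        r = 1.0 if s2 == N_FEATURES - 1 else 0.0
        td0_update(s, r, s2, terminal)
        if terminal:
            break
        s = s2
```

After training, `value(s)` approximates the probability of reaching the right end from state `s` (the true values are `s / 7` for this walk). The key property the abstract relies on is that the update touches only the weights of active features, so learning in one state generalises to every related state sharing those features.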
