International Journal of Adaptive Control and Signal Processing

Optimized look-ahead tree policies: a bridge between look-ahead tree policies and direct policy search


Abstract

Direct policy search (DPS) and look-ahead tree (LT) policies are two popular techniques for solving difficult sequential decision-making problems. Both are simple to implement, widely applicable without strong assumptions on the structure of the problem, and capable of producing high-performance control policies. However, both are computationally very expensive, each in its own way. DPS can require huge offline resources (effort required to obtain the policy): one must first select a space of parameterized policies appropriate for the targeted problem and then determine the best values of the parameters via global optimization. LT policies require no offline resources; however, they typically require huge online resources (effort required to compute the best decision at each step) in order to grow trees of sufficient depth. In this paper, we propose optimized LTs (OLTs), a model-based policy learning scheme that lies at the intersection of DPS and LT. In OLT, the control policy is represented indirectly through an algorithm that, at each decision step, develops a small LT (using a model of the dynamics, as in LT) until a prespecified online budget is exhausted. Unlike LT, the development of the tree is not driven by a generic heuristic; rather, the heuristic is optimized for the target problem and implemented as a parameterized node-scoring function learned offline via DPS. We experimentally compare OLT with pure DPS and pure LT variants on optimal control benchmark domains. The results show that the LT-based representation is a versatile way of compactly representing policies in a DPS scheme (so OLT is easier to tune and has lower offline complexity than pure DPS), while DPS significantly reduces the size of the LTs required to take high-quality decisions (so OLT has lower online complexity than pure LT).
Moreover, OLT produces better-performing policies overall than pure DPS and pure LT, and yields policies that are robust to perturbations of the initial conditions.
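The online part of the scheme described above can be sketched as follows. This is a hypothetical minimal implementation, not the paper's exact algorithm: the signatures of `model`, `reward`, and the learned node-scoring function `score(theta, state, depth)` are assumptions for illustration, and the paper's actual parameterization and tree-development rule differ. The sketch grows a look-ahead tree by always expanding the open node with the highest learned score until a fixed node budget is spent, then returns the first action on the path to the best return found.

```python
import heapq

def olt_decision(state, actions, model, reward, score, theta, budget):
    """One OLT decision step (sketch): grow a small look-ahead tree,
    at each iteration expanding the open node the learned heuristic
    scores highest, until the online node budget is exhausted; then
    return the first action leading to the best cumulated reward seen.

    model(s, a)  -> next state (deterministic dynamics, for simplicity)
    reward(s, a) -> immediate reward
    score(theta, s, d) -> learned node-scoring heuristic (hypothetical)
    """
    frontier = []   # max-heap via negated scores
    counter = 0     # unique tie-breaker so heapq never compares nodes
    for a in actions:
        s2 = model(state, a)
        node = (s2, reward(state, a), a, 1)  # (state, return, first action, depth)
        heapq.heappush(frontier, (-score(theta, s2, 1), counter, node))
        counter += 1
    best_ret, best_action = float("-inf"), actions[0]
    expansions = len(actions)
    while frontier and expansions < budget:
        _, _, (s, ret, a0, d) = heapq.heappop(frontier)
        if ret > best_ret:
            best_ret, best_action = ret, a0
        for a in actions:  # expand the chosen node with every action
            s2 = model(s, a)
            node = (s2, ret + reward(s, a), a0, d + 1)
            heapq.heappush(frontier, (-score(theta, s2, d + 1), counter, node))
            counter += 1
            expansions += 1
    # unexpanded open nodes may still hold the best return found
    for _, _, (_, ret, a0, _) in frontier:
        if ret > best_ret:
            best_ret, best_action = ret, a0
    return best_action
```

The offline DPS part would then treat `theta` as the decision variable of a global optimizer (e.g. a derivative-free method), evaluating each candidate by running this decision loop over simulated episodes; that outer loop is omitted here.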

Bibliographic information

  • Source
  • Author affiliations

    Montefiore Institute, Department of Electrical Engineering and Computer Science, University of Liege, Liege, Belgium;

    Montefiore Institute, Department of Electrical Engineering and Computer Science, University of Liege, Liege, Belgium;

    Montefiore Institute, Department of Electrical Engineering and Computer Science, University of Liege, Liege, Belgium;

    Montefiore Institute, Department of Electrical Engineering and Computer Science, University of Liege, Liege, Belgium,Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI); MEDLINE
  • Format: PDF
  • Language: English
  • CLC classification
  • Keywords

    reinforcement learning; optimal control; direct policy search; look-ahead tree search;


