International Journal of Adaptive Control and Signal Processing

Optimized look-ahead tree policies: a bridge between look-ahead tree policies and direct policy search


Abstract

Direct policy search (DPS) and look-ahead tree (LT) policies are two popular techniques for solving difficult sequential decision-making problems. Both are simple to implement, widely applicable without strong assumptions on the structure of the problem, and capable of producing high-performance control policies. However, both are computationally very expensive, each in its own way. DPS can require huge offline resources (effort required to obtain the policy): one must first select a space of parameterized policies appropriate for the targeted problem and then determine the best values of the parameters via global optimization. LT policies require no offline resources; however, they typically require huge online resources (effort required to compute the best decision at each step) in order to grow trees of sufficient depth. In this paper, we propose optimized LTs (OLTs), a model-based policy learning scheme that lies at the intersection of DPS and LT. In OLT, the control policy is represented indirectly through an algorithm that, at each decision step, develops a small LT (using a model of the dynamics, as in LT) until a prespecified online budget is exhausted. Unlike LT, the development of the tree is not driven by a generic heuristic; rather, the heuristic is optimized for the target problem and implemented as a parameterized node-scoring function learned offline via DPS. We experimentally compare OLT with pure DPS and pure LT variants on optimal control benchmark domains. The results show that the LT-based representation is a versatile way of compactly representing policies in a DPS scheme (so OLT is easier to tune and has lower offline complexity than pure DPS), while DPS significantly reduces the size of the LTs required to take high-quality decisions (so OLT has lower online complexity than pure LT).
Moreover, OLT produces better-performing policies overall than pure DPS and pure LT, and yields policies that are robust to perturbations of the initial conditions.
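The online part of the scheme described above can be sketched as follows. This is a hypothetical minimal implementation, not the paper's exact algorithm: the signatures of `model`, `reward`, and the learned node-scoring function `score(theta, state, depth)` are assumptions for illustration, and the paper's actual parameterization and tree-development rule differ. The sketch grows a look-ahead tree by always expanding the open node with the highest learned score until a fixed node budget is spent, then returns the first action on the path to the best return found.

```python
import heapq

def olt_decision(state, actions, model, reward, score, theta, budget):
    """One OLT decision step (sketch): grow a small look-ahead tree,
    at each iteration expanding the open node the learned heuristic
    scores highest, until the online node budget is exhausted; then
    return the first action leading to the best cumulated reward seen.

    model(s, a)  -> next state (deterministic dynamics, for simplicity)
    reward(s, a) -> immediate reward
    score(theta, s, d) -> learned node-scoring heuristic (hypothetical)
    """
    frontier = []   # max-heap via negated scores
    counter = 0     # unique tie-breaker so heapq never compares nodes
    for a in actions:
        s2 = model(state, a)
        node = (s2, reward(state, a), a, 1)  # (state, return, first action, depth)
        heapq.heappush(frontier, (-score(theta, s2, 1), counter, node))
        counter += 1
    best_ret, best_action = float("-inf"), actions[0]
    expansions = len(actions)
    while frontier and expansions < budget:
        _, _, (s, ret, a0, d) = heapq.heappop(frontier)
        if ret > best_ret:
            best_ret, best_action = ret, a0
        for a in actions:  # expand the chosen node with every action
            s2 = model(s, a)
            node = (s2, ret + reward(s, a), a0, d + 1)
            heapq.heappush(frontier, (-score(theta, s2, d + 1), counter, node))
            counter += 1
            expansions += 1
    # unexpanded open nodes may still hold the best return found
    for _, _, (_, ret, a0, _) in frontier:
        if ret > best_ret:
            best_ret, best_action = ret, a0
    return best_action
```

The offline DPS part would then treat `theta` as the decision variable of a global optimizer (e.g. a derivative-free method), evaluating each candidate by running this decision loop over simulated episodes; that outer loop is omitted here.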

Bibliographic information

  • Source
  • Author affiliations

    Montefiore Institute, Department of Electrical Engineering and Computer Science, University of Liege, Liege, Belgium;

    Montefiore Institute, Department of Electrical Engineering and Computer Science, University of Liege, Liege, Belgium;

    Montefiore Institute, Department of Electrical Engineering and Computer Science, University of Liege, Liege, Belgium;

    Montefiore Institute, Department of Electrical Engineering and Computer Science, University of Liege, Liege, Belgium,Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI); MEDLINE
  • Format: PDF
  • Language: English
  • CLC classification
  • Keywords

    reinforcement learning; optimal control; direct policy search; look-ahead tree search;


