Home > Foreign-Language Journals > Industrial Informatics, IEEE Transactions on > Pheromone-Based Planning Strategies in Dyna-Q Learning

Pheromone-Based Planning Strategies in Dyna-Q Learning


Abstract

Dyna-Q is a model-based reinforcement learning algorithm: the learning agent not only interacts with the environment to learn an optimal policy, but also builds a model of the environment at the same time. The model is introduced to compensate for the shortage of online samples. To make the model more efficient, this paper proposes a model shaping method that uses neighbor information to compensate for bleak states, that is, states that are scarcely visited. Once an accurate model has been acquired, many virtual experiences are sampled from the shaped model and indirect learning is performed on them. How to use the model to speed up learning remains an important issue, however. Dyna-Q based on prioritized sweeping can be regarded as a breadth-first search method. To increase its learning speed, this paper introduces a depth-first search method that applies techniques from ant colony algorithms to an exploration factor used for selecting candidates in indirect learning. The strategy then evolves into a hybrid planning approach that proportionally interleaves depth-first planning and breadth-first planning. To verify the validity and applicability of the proposed method, simulations are conducted on a mountain car problem and a maze problem. The simulation results show that the proposed method achieves the objectives of sample efficiency and learning acceleration for the Dyna-Q learning algorithm.
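For readers unfamiliar with the baseline, the loop the abstract builds on (direct learning from real steps, model learning, then indirect learning from virtual experiences) can be sketched as plain tabular Dyna-Q. This is only an illustrative sketch under stated assumptions, not the paper's method: planning samples stored transitions uniformly at random, with no model shaping, no prioritized sweeping, and no pheromone-based candidate selection. The corridor environment, the names `dyna_q` and `corridor_step`, and all hyperparameter values are hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def corridor_step(s, a, n=8):
    """Hypothetical toy environment: a corridor of n cells, goal at the right end."""
    s2 = min(max(s + a, 0), n - 1)            # walls clamp the move
    reward = 1.0 if s2 == n - 1 else 0.0      # reward only on reaching the goal
    return reward, s2


def dyna_q(step, start, goal, actions, episodes=30, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0, max_steps=10_000):
    """Generic tabular Dyna-Q with uniform random planning (assumed baseline)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # Q[(state, action)], zero-initialized
    model = {}               # learned deterministic model: (s, a) -> (r, s')

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2):
        # One-step Q-learning backup, used for real and virtual experience alike.
        target = r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    steps_per_episode = []
    for _ in range(episodes):
        s, n = start, 0
        while s != goal and n < max_steps:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)               # direct learning
            model[(s, a)] = (r, s2)           # model learning
            for _ in range(planning_steps):   # indirect learning (planning)
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s, n = s2, n + 1
        steps_per_episode.append(n)
    return q, steps_per_episode


q, steps = dyna_q(corridor_step, start=0, goal=7, actions=[-1, 1])
print("steps per episode:", steps)
```

The line that picks `(ps, pa)` is the only place where a planning strategy enters. A prioritized-sweeping variant would replace the uniform draw with a priority queue, and a pheromone-guided variant such as the one the abstract describes would bias which stored transitions are replayed; neither is reproduced here.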
