Journal: IEEE Transactions on Neural Networks and Learning Systems

Parallel Online Temporal Difference Learning for Motor Control



Abstract

Temporal difference (TD) learning, a key concept in reinforcement learning, is a popular method for solving simulated control problems. However, in real systems, this method is often avoided in favor of policy search methods because of its long learning time. But policy search suffers from its own drawbacks, such as the necessity of informed policy parameterization and initialization. In this paper, we show that TD learning can work effectively in real robotic systems as well, using parallel model learning and planning. Using locally weighted linear regression and trajectory sampled planning with 14 concurrent threads, we can achieve a speedup of almost two orders of magnitude over regular TD control on simulated control benchmarks. For a real-world pendulum swing-up task and a two-link manipulator movement task, we report a speedup of 20× to 60×, with a real-time learning speed of less than half a minute. The results are competitive with state-of-the-art policy search.
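The abstract names locally weighted linear regression as the model learner. As a rough illustration only, the idea can be sketched in one dimension as below. This is not the authors' implementation: the function name `lwr_predict`, the Gaussian kernel, and the `bandwidth` parameter are assumptions made for the example.

```python
import math

def lwr_predict(xs, ys, query, bandwidth=0.5):
    """Predict y at `query` from a Gaussian-weighted local line fit (1-D input).

    Hypothetical helper for illustration; kernel and bandwidth are assumptions.
    """
    # Gaussian kernel weights: samples near the query dominate the fit.
    ws = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    sw = sum(ws)
    # Weighted means of the inputs and targets.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted variance and covariance give the slope of the local line.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (query - mx)

# Samples from an exactly linear relation y = 2x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x + 1.0 for x in xs]
prediction = lwr_predict(xs, ys, query=0.75)
```

In a model-learning setting, `xs` would hold observed state-action features and `ys` the observed next-state or reward values, so that planning can query the fitted model instead of the real system.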
