Training action selection neural networks using off-policy actor critic reinforcement learning

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
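The steps named in the abstract (keep a replay memory of trajectories, sample one, and update the policy parameters off-policy) can be illustrated with a small tabular sketch. The abstract does not say which off-policy actor critic technique is used, so everything below is an assumption made for illustration: the `ReplayMemory` class, the `train_on_trajectory` function, the softmax policy table `theta`, the critic table `q`, the truncated importance weight, and the one-state toy environment are all hypothetical and are not taken from the patent.

```python
import math
import random
from collections import deque


def softmax(prefs):
    """Turn a list of action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class ReplayMemory:
    """Hypothetical replay memory that stores whole trajectories.

    Each step is (state, action, reward, behavior_prob), where behavior_prob
    is the probability the acting policy gave to the action it took.
    """

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest trajectories drop out

    def add(self, trajectory):
        self.buffer.append(trajectory)

    def sample(self, rng):
        return rng.choice(list(self.buffer))


def train_on_trajectory(theta, q, trajectory, lr=0.1, gamma=0.9, clip=1.0):
    """One off-policy actor critic pass over a sampled trajectory (in place).

    Assumed variant: expected-SARSA critic plus a policy-gradient actor with
    a truncated importance weight. This is a simplification chosen for the
    sketch, not the technique claimed in the patent.
    """
    for t, (s, a, r, mu) in enumerate(trajectory):
        pi = softmax(theta[s])
        # Bootstrap from the expected critic value at the next state, if any.
        if t + 1 < len(trajectory):
            s_next = trajectory[t + 1][0]
            pi_next = softmax(theta[s_next])
            v_next = sum(p * qv for p, qv in zip(pi_next, q[s_next]))
        else:
            v_next = 0.0
        # Critic update toward the one-step target.
        q[s][a] += lr * (r + gamma * v_next - q[s][a])
        # Actor update, corrected for the gap between current and behavior policy.
        rho = min(clip, pi[a] / mu)
        v = sum(p * qv for p, qv in zip(pi, q[s]))
        advantage = q[s][a] - v
        for b in range(len(theta[s])):
            grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += lr * rho * advantage * grad_log_pi


# Toy setup (assumed): one state, two actions, action 1 pays reward 1.0.
rng = random.Random(0)
memory = ReplayMemory(capacity=50)
for i in range(50):
    action = i % 2
    # Single-step trajectory generated by a uniform behavior policy (prob 0.5).
    memory.add([(0, action, float(action), 0.5)])

theta = [[0.0, 0.0]]  # policy parameters, one row per state
q = [[0.0, 0.0]]      # critic estimates, one row per state
for _ in range(200):
    train_on_trajectory(theta, q, memory.sample(rng))

probs = softmax(theta[0])
print("P(action 1) after training:", probs[1])
```

In this toy run the policy shifts toward the rewarded action even though every stored trajectory came from a uniform behavior policy, which is the point of training off-policy from replayed data. Truncating the importance weight without a correction term introduces bias; the sketch accepts that for brevity.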

Bibliographic data

Publication number: US10706352B2

Publication date: 2020-07-07

Original format: PDF
Applicant/assignee: DEEPMIND TECHNOLOGIES LIMITED

Application number: US201916402687
Inventors: ZIYU WANG; NICOLAS MANFRED OTTO HEESS; VICTOR CONSTANT BAPST

Filing date: 2019-05-03
Classification: G06N3/04; G06N3/08; G06N3
Country: US
Date added to database: 2022-08-21 11:29:25
