Training action selection neural networks using off-policy actor critic reinforcement learning
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
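The steps named in the abstract (maintain a replay memory of trajectories, sample one, adjust the policy parameters with an off-policy actor critic update) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the names `ReplayMemory` and `train_on_trajectory` are hypothetical, a tabular softmax policy stands in for the action selection neural network, and the update is a simple one-step TD actor critic step with a truncated importance weight, which is only one of many possible off-policy actor critic techniques. The toy environment (one state, action 1 pays reward 1) is also invented.

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Stores whole trajectories (hypothetical helper, not from the patent).

    Each trajectory is a list of (state, action, reward, behavior_prob) tuples,
    where behavior_prob is the probability the acting policy gave the action.
    """

    def __init__(self, capacity):
        # A bounded deque evicts the oldest trajectory once capacity is reached.
        self.trajectories = deque(maxlen=capacity)

    def add(self, trajectory):
        self.trajectories.append(trajectory)

    def sample(self, rng):
        return rng.choice(list(self.trajectories))


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def train_on_trajectory(theta, values, trajectory, lr=0.1, gamma=0.99, clip=1.0):
    """One off-policy actor critic pass over a trajectory (updates in place).

    theta:  [num_states, num_actions] policy logits (the "policy parameters").
    values: [num_states] critic estimates.
    Assumed technique: one-step TD error with a truncated importance weight.
    """
    for t, (s, a, r, mu_a) in enumerate(trajectory):
        # Bootstrap from the next stored state; use zero after the final step.
        next_v = values[trajectory[t + 1][0]] if t + 1 < len(trajectory) else 0.0
        td_error = r + gamma * next_v - values[s]
        pi = softmax(theta[s])
        # Truncated importance weight corrects for the behavior policy that
        # generated the stored action.
        rho = min(clip, pi[a] / mu_a)
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0  # gradient of log pi(a|s) w.r.t. the logits of s
        theta[s] += lr * rho * td_error * grad_log_pi  # actor step
        values[s] += lr * rho * td_error               # critic step


if __name__ == "__main__":
    rng = random.Random(0)
    memory = ReplayMemory(capacity=100)
    # Invented toy environment: a single state; action 1 pays 1, action 0 pays 0.
    # A uniform behavior policy (probability 0.5 per action) generates the data.
    for _ in range(100):
        trajectory = []
        for _ in range(3):
            action = rng.randrange(2)
            trajectory.append((0, action, float(action), 0.5))
        memory.add(trajectory)

    theta = np.zeros((1, 2))
    values = np.zeros(1)
    for _ in range(500):
        train_on_trajectory(theta, values, memory.sample(rng))
    print("action probabilities:", softmax(theta[0]))
```

Storing the behavior policy's probability alongside each action is what makes the importance weight computable at training time, when the current policy may differ from the one that produced the trajectory. Truncating the weight at `clip` is a common way to limit the variance of the update; the value 1.0 is an arbitrary choice for this sketch.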