Reinforcement learning using superiority estimation

Machine translation: Reinforcement learning using advantage estimation

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for computing Q values for actions to be performed by an agent interacting with an environment from a continuous action space of actions. In one aspect, a system includes a value subnetwork configured to receive an observation characterizing a current state of the environment and process the observation to generate a value estimate; a policy subnetwork configured to receive the observation and process the observation to generate an ideal point in the continuous action space; and a subsystem configured to receive a particular point in the continuous action space representing a particular action; generate an advantage estimate for the particular action; and generate a Q value for the particular action that is an estimate of an expected return resulting from the agent performing the particular action when the environment is in the current state.
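The data flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The abstract does not say how the advantage estimate is computed or how it is combined with the value estimate, so the sketch assumes a quadratic penalty on the distance between the particular action and the ideal point, and assumes Q = value estimate + advantage estimate. The names `quadratic_advantage`, `q_value`, `scale`, and the two toy subnetworks are hypothetical stand-ins for trained networks.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def quadratic_advantage(action: Vector, ideal_point: Vector, scale: Vector) -> float:
    """Assumed advantage form: -0.5 * sum_i scale_i * (a_i - mu_i)^2.

    It is zero at the ideal point and negative everywhere else
    (for positive scale entries), so the ideal point maximizes Q.
    """
    return -0.5 * sum(
        s * (a - m) ** 2 for a, m, s in zip(action, ideal_point, scale)
    )


def q_value(
    observation: Vector,
    action: Vector,
    value_subnetwork: Callable[[Vector], float],
    policy_subnetwork: Callable[[Vector], List[float]],
    scale: Vector,
) -> float:
    """Combine the two subnetwork outputs into a Q value for one action."""
    value_estimate = value_subnetwork(observation)   # estimate for the current state
    ideal_point = policy_subnetwork(observation)     # point in the continuous action space
    advantage = quadratic_advantage(action, ideal_point, scale)
    return value_estimate + advantage                # assumed combination: Q = V + A


# Toy stand-ins for the trained subnetworks (hypothetical, for illustration only).
def toy_value_subnetwork(observation: Vector) -> float:
    return sum(observation)


def toy_policy_subnetwork(observation: Vector) -> List[float]:
    return [0.5 * x for x in observation]


observation = [1.0, 2.0]
scale = [1.0, 2.0]  # assumed positive per-dimension curvature

ideal_action = toy_policy_subnetwork(observation)
q_at_ideal = q_value(observation, ideal_action,
                     toy_value_subnetwork, toy_policy_subnetwork, scale)
q_off_ideal = q_value(observation, [1.5, 1.0],
                      toy_value_subnetwork, toy_policy_subnetwork, scale)

print(q_at_ideal, q_off_ideal)
```

Under these assumptions the Q value is largest at the ideal point, so the agent can pick a greedy action in a continuous action space without a separate search over actions.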

Bibliographic data

Publication number: JP6669897B2

Patent type:
Publication date: 2020-03-18

Original format: PDF
Applicant/patentee: グーグルエルエルシー (Google LLC)

Application number: JP20180560745
Inventors: シシアン・グ (Shixiang Gu); ティモシー・ポール・リリクラップ (Timothy Paul Lillicrap); イリヤ・ストスケヴァー (Ilya Sutskever); セルゲイ・ヴラディミール・リーヴァイン (Sergey Vladimir Levine)

Filing date: 2017-02-09
Classification: G06N3/08; G06N20; G06N3/04
Country: JP
Database entry time: 2022-08-21 11:33:39
