REINFORCEMENT LEARNING IN COMBINATORIAL ACTION SPACES

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning in combinatorial action spaces. One of the methods includes receiving an observation characterizing a current state of an environment; for each of a plurality of candidate actions: processing a network input using a Q neural network to generate a Q value that represents a return received if the candidate action is selected while the candidate action is presented in response to the received observation, processing the network input using a myopic neural network to generate a myopic output that represents a likelihood that the candidate action will be selected if the candidate action is presented in response to the received observation, and combining the myopic output and the Q value for the candidate action to generate a selection score for the candidate action; and selecting the candidate actions having the highest selection scores.
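The per-candidate scoring loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract says only that the myopic output and the Q value are "combined", so the product used here is an assumption, and `select_candidates`, `toy_q_network`, `toy_myopic_network`, and the parameter `k` are hypothetical names standing in for trained networks and the number of actions to select.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer type: maps (observation, candidate features) to a float.
Scorer = Callable[[Sequence[float], Sequence[float]], float]


def select_candidates(
    observation: Sequence[float],
    candidates: Sequence[Sequence[float]],
    q_network: Scorer,
    myopic_network: Scorer,
    k: int,
) -> List[int]:
    """Return the indices of the k candidates with the highest selection scores."""
    scored = []
    for index, candidate in enumerate(candidates):
        # Q value: estimated return if this candidate is presented and selected.
        q_value = q_network(observation, candidate)
        # Myopic output: estimated likelihood that the candidate is selected
        # when presented in response to the observation.
        likelihood = myopic_network(observation, candidate)
        # Assumption: combine by multiplication. The abstract does not fix
        # the combination rule.
        scored.append((likelihood * q_value, index))
    # Highest score first; ties are broken by the lower candidate index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


def toy_q_network(observation: Sequence[float], candidate: Sequence[float]) -> float:
    # Placeholder for a trained Q neural network: a plain dot product.
    return sum(o * c for o, c in zip(observation, candidate))


def toy_myopic_network(observation: Sequence[float], candidate: Sequence[float]) -> float:
    # Placeholder for a trained myopic network: logistic of the dot product,
    # so the output lies in (0, 1) like a probability.
    logit = sum(o * c for o, c in zip(observation, candidate))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    observation = [0.2, -0.1, 0.5]
    candidates = [[1.0, 0.0, 0.3], [0.1, 0.9, 0.2], [0.4, 0.4, 0.8]]
    print(select_candidates(observation, candidates, toy_q_network, toy_myopic_network, k=2))
```

Scoring each candidate independently and keeping the top `k` avoids enumerating every possible combination of actions, which is presumably the point of the per-candidate formulation in a combinatorial action space.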

Bibliographic details

Publication number: US2021081753A1

Patent type:
Publication date: 2021-03-18

Original format: PDF
Applicant/assignee: GOOGLE LLC

Application number: US201916975060
Inventors: TZE WAY EUGENE IE; VIHAN JAIN; JING WANG; RITESH AGARWAL; CRAIG EDGAR BOUTILIER

Filing date: 2019-05-20
Classification: G06N3; G06N3/04; G06N3/08; G06N3/063; G06K9/62
Country: US
Date indexed: 2022-08-24 17:46:32