
REINFORCEMENT LEARNING IN COMBINATORIAL ACTION SPACES


Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning in combinatorial action spaces. One of the methods includes receiving an observation characterizing a current state of an environment; for each of a plurality of candidate actions: processing a network input using a Q neural network to generate a Q value that represents a return received if the candidate action is selected while the candidate action is presented in response to the received observation, processing the network input using a myopic neural network to generate a myopic output that represents a likelihood that the candidate action will be selected if the candidate action is presented in response to the received observation, and combining the myopic output and the Q value for the candidate action to generate a selection score for the candidate action; and selecting the candidate actions having the highest selection scores.
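The per-candidate scoring and selection described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. The abstract does not say how the network input is built or how the two outputs are combined, so the sketch assumes the input is the observation concatenated with the candidate's features and that the selection score is the product of the myopic output and the Q value. The names `select_actions`, `toy_q_network`, and `toy_myopic_network` are hypothetical, and the two toy functions merely stand in for trained neural networks.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def select_actions(
    observation: Vector,
    candidates: Sequence[Vector],
    q_network: Callable[[Vector], float],
    myopic_network: Callable[[Vector], float],
    num_to_select: int,
) -> List[int]:
    """Return indices of the candidates with the highest selection scores."""
    scored = []
    for index, candidate in enumerate(candidates):
        # Assumption: the network input is observation + candidate features.
        network_input = list(observation) + list(candidate)
        # Q value: estimated return if this candidate is presented and selected.
        q_value = q_network(network_input)
        # Myopic output: likelihood the candidate is selected when presented.
        myopic_output = myopic_network(network_input)
        # Assumption: combine the two outputs by multiplication.
        scored.append((myopic_output * q_value, index))
    # Sort by score, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:num_to_select]]


def toy_q_network(network_input: Vector) -> float:
    """Hypothetical stand-in for a trained Q neural network."""
    return sum(network_input)


def toy_myopic_network(network_input: Vector) -> float:
    """Hypothetical stand-in for a trained myopic network (sigmoid of last feature)."""
    return 1.0 / (1.0 + math.exp(-network_input[-1]))


if __name__ == "__main__":
    observation = [1.0]
    candidates = [[0.5, 0.1], [2.0, 0.9], [1.0, 0.5]]
    chosen = select_actions(
        observation, candidates, toy_q_network, toy_myopic_network, num_to_select=2
    )
    print(chosen)
```

Scoring each candidate independently and keeping the top-scoring ones avoids enumerating every possible combination of actions, which is what makes this approach practical in a combinatorial action space.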

