
CONTROLLING ROBOTS USING ENTROPY CONSTRAINTS



Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes obtaining trajectory data comprising one or more tuples; updating, using the trajectory data, current values of the policy parameters using a maximum entropy reinforcement learning technique that maximizes both (i) a reward term and (ii) an entropy term, wherein a relative weight between the entropy term and the reward term in the maximization is determined by a temperature parameter; and updating, using the probability distributions defined by the policy outputs generated in accordance with the current values of the policy parameters for the tuples in the trajectory data, the temperature parameter to regulate an expected entropy of the probability distributions to at least equal a minimum expected entropy value.
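The abstract couples two updates: the policy is trained to maximize reward plus a temperature-weighted entropy term, and the temperature is then adjusted so that the policy's expected entropy stays at or above a minimum value. The abstract does not state the exact temperature rule, so the sketch below assumes a dual-gradient-descent update of the kind used in Soft Actor-Critic, with a discrete action space whose policy outputs are probability vectors. All function names (`categorical_entropy`, `expected_entropy`, `max_entropy_objective`, `temperature_step`), the loss `J(alpha) = alpha * (H - min_entropy)`, and the learning rate are hypothetical illustrations, and the neural-network policy update itself is not shown, only the objective it would maximize.

```python
import math
from typing import List, Sequence

# Hypothetical helpers; assumes a discrete action space where the policy
# output for each trajectory tuple is a vector of action probabilities.


def categorical_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def expected_entropy(batch: List[Sequence[float]]) -> float:
    """Mean entropy of the policy distributions over a batch of tuples."""
    return sum(categorical_entropy(p) for p in batch) / len(batch)


def max_entropy_objective(rewards: Sequence[float],
                          batch: List[Sequence[float]],
                          alpha: float) -> float:
    """Mean of reward + alpha * entropy: the quantity the policy update
    maximizes, with the temperature alpha weighting the entropy term."""
    total = sum(r + alpha * categorical_entropy(p)
                for r, p in zip(rewards, batch))
    return total / len(rewards)


def temperature_step(log_alpha: float,
                     batch: List[Sequence[float]],
                     min_entropy: float,
                     lr: float = 0.1) -> float:
    """One gradient-descent step on the assumed temperature loss
    J(alpha) = alpha * (H - min_entropy), where H is the batch's expected
    entropy, treated as a constant. alpha = exp(log_alpha) stays positive.
    If H < min_entropy the step raises alpha (more weight on entropy);
    if H > min_entropy it lowers alpha."""
    alpha = math.exp(log_alpha)
    h = expected_entropy(batch)
    grad = alpha * (h - min_entropy)  # dJ/d(log_alpha)
    return log_alpha - lr * grad


if __name__ == "__main__":
    peaked = [[0.97, 0.01, 0.01, 0.01]]  # low-entropy policy output
    log_alpha = 0.0
    for _ in range(5):
        log_alpha = temperature_step(log_alpha, peaked, min_entropy=1.0)
    print("alpha after 5 steps:", math.exp(log_alpha))
```

In the demo, the peaked distribution has an entropy of about 0.17 nats, below the 1.0-nat floor, so each step raises the temperature; a uniform distribution over four actions (ln 4, about 1.39 nats) would instead lower it. Optimizing `log_alpha` rather than `alpha` directly is one common way to keep the temperature positive without an explicit constraint.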


Bibliographic data

Publication number: WO2020113228A1

Publication date: 2020-06-04

Original format: PDF
Applicant/Assignee: GOOGLE LLC

Application number: WO2019US64047
Inventor(s): HAARNOJA TUOMAS

Filing date: 2019-12-02
Classification: G06N3/08
Country: WO
Database entry time: 2022-08-21 11:10:49
