CONTROLLING ROBOTS USING ENTROPY CONSTRAINTS
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a policy neural network having policy parameters. One of the methods includes obtaining trajectory data comprising one or more tuples; updating, using the trajectory data, current values of the policy parameters using a maximum entropy reinforcement learning technique that maximizes both (i) a reward term and (ii) an entropy term, wherein a relative weight between the entropy term and the reward term in the maximization is determined by a temperature parameter; and updating, using the probability distributions defined by the policy outputs generated in accordance with the current values of the policy parameters for the tuples in the trajectory data, the temperature parameter to regulate an expected entropy of the probability distributions to at least equal a minimum expected entropy value.
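The temperature update described above can be illustrated with a minimal sketch. The abstract does not specify the update rule, so everything here is an assumption made for illustration: the policy outputs are taken to be categorical distributions over discrete actions, the temperature is stored as its logarithm so it stays positive, and the update is a single gradient-style step of the kind commonly used in dual formulations of maximum entropy reinforcement learning. The function names (`expected_entropy`, `entropy_regularized_objective`, `update_temperature`) and the `learning_rate` parameter are hypothetical and are not taken from the patent.

```python
import math


def expected_entropy(policy_probs):
    """Mean Shannon entropy (in nats) of a batch of categorical distributions.

    policy_probs: list of probability vectors, one per tuple in the
    trajectory data (assumed discrete actions for this sketch).
    """
    entropies = [
        -sum(p * math.log(p) for p in probs if p > 0.0)
        for probs in policy_probs
    ]
    return sum(entropies) / len(entropies)


def entropy_regularized_objective(reward, entropy, temperature):
    """Reward term plus entropy term, weighted by the temperature."""
    return reward + temperature * entropy


def update_temperature(log_temperature, policy_probs, min_entropy,
                       learning_rate=0.01):
    """One step on the log-temperature (hypothetical update rule).

    If the expected entropy is below min_entropy, the gap is negative and
    the temperature rises, putting more weight on the entropy term.
    If the expected entropy exceeds min_entropy, the temperature falls.
    """
    entropy_gap = expected_entropy(policy_probs) - min_entropy
    return log_temperature - learning_rate * entropy_gap
```

For example, calling `update_temperature(0.0, [[0.9, 0.1], [0.8, 0.2]], min_entropy=0.6)` returns a positive log-temperature, because the mean entropy of those two distributions is below 0.6 nats; the temperature itself is recovered with `math.exp`. A real system would apply this step alongside the policy parameter update and would typically use an automatic differentiation library rather than a hand-written step.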