
Reinforcement learning methods, reinforcement learning devices and reinforcement learning programs for efficient learning


Abstract

PROBLEM TO BE SOLVED: To provide a reinforcement learning method that learns efficiently by performing representation learning specialized for states in which an attention situation, such as an environment reset, occurs. SOLUTION: In a reinforcement learning method that optimizes an agent's behavioral policy from the results of a learning device that learns from states observed in environmental data, it is determined, while learning from the environmental data of one episode, whether a state in which a preset attention situation has occurred has been observed. When such a state is observed, the feature extractor (first learner) performs representation learning using two pieces of environmental data: the environmental data of the state in which the attention situation occurred and the environmental data at the preceding time. The state classifier (second learner) then learns from the difference between the resulting feature data, and the parameters of the first learner and the second learner are updated based on the output estimation data and the actual data. [Selection diagram] Fig. 3
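
The two-learner update described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the feature extractor is a plain linear map, the state classifier is a logistic unit, the "actual data" is a binary label saying whether an attention situation (here an environment reset) occurred between the two observations, ordinary transitions are included as negative examples, and the observations are synthetic. The function names (`extract`, `classify`, `update`, `predict`, `train`) are hypothetical.

```python
import math
import random


def extract(W, obs):
    """First learner (assumed linear): features = W @ obs."""
    return [sum(w * x for w, x in zip(row, obs)) for row in W]


def classify(v, b, diff):
    """Second learner (assumed logistic): probability from the feature difference."""
    z = sum(vi * di for vi, di in zip(v, diff)) + b
    return 1.0 / (1.0 + math.exp(-z))


def predict(W, v, b, prev_obs, obs):
    """Estimated probability that an attention situation separates the two observations."""
    diff = [c - p for c, p in zip(extract(W, obs), extract(W, prev_obs))]
    return classify(v, b, diff)


def update(W, v, b, prev_obs, obs, label, lr=0.1):
    """One SGD step on binary cross-entropy between estimate and label.

    W and v are updated in place; the loss and the new bias are returned.
    """
    f_prev, f_cur = extract(W, prev_obs), extract(W, obs)
    diff = [c - p for c, p in zip(f_cur, f_prev)]
    p = classify(v, b, diff)
    eps = 1e-12
    loss = -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
    g = p - label  # gradient of the loss with respect to the logit
    # Update the first learner using the classifier weights from before this step.
    for i in range(len(W)):
        for j in range(len(W[i])):
            W[i][j] -= lr * g * v[i] * (obs[j] - prev_obs[j])
    # Update the second learner using the feature difference computed above.
    for i in range(len(v)):
        v[i] -= lr * g * diff[i]
    return loss, b - lr * g


def make_pairs(rng, n=40):
    """Synthetic (previous observation, observation, label) triples; purely illustrative."""
    pairs = []
    for k in range(n):
        prev = [rng.uniform(1.0, 2.0), rng.uniform(1.0, 2.0)]
        if k % 2 == 0:
            cur, label = [prev[0] + 0.1, prev[1] + 0.1], 0  # ordinary step
        else:
            cur, label = [rng.uniform(0.0, 0.1), rng.uniform(0.0, 0.1)], 1  # reset
        pairs.append((prev, cur, label))
    return pairs


def train(epochs=50, seed=0):
    """Train both learners jointly and return them with the per-epoch mean loss."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    b = 0.0
    pairs = make_pairs(rng)
    history = []
    for _ in range(epochs):
        total = 0.0
        for prev, cur, label in pairs:
            loss, b = update(W, v, b, prev, cur, label)
            total += loss
        history.append(total / len(pairs))
    return W, v, b, history


if __name__ == "__main__":
    W, v, b, history = train()
    print("mean loss, first epoch:", history[0])
    print("mean loss, last epoch:", history[-1])
    print("reset pair:", predict(W, v, b, [1.5, 1.5], [0.05, 0.05]))
    print("ordinary pair:", predict(W, v, b, [1.5, 1.5], [1.6, 1.6]))
```

Passing the classifier the difference of the two feature vectors, rather than the raw observations, mirrors the abstract's statement that the second learner learns from the difference between the feature data. The gradient flows through that difference into the extractor, so both learners are updated from the same estimate-versus-label error.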
