A learning method and learning device that uses human driving data as training data to perform customized route planning by supporting reinforcement learning.

Abstract

A learning method is provided for acquiring at least one personalized reward function, used for performing a Reinforcement Learning (RL) algorithm, corresponding to a personalized optimal policy for a subject driver. The method includes the steps of: (a) a learning device performing a process of instructing an adjustment reward network to generate first adjustment rewards by referring to information on actual actions and actual circumstance vectors in driving trajectories, a process of instructing a common reward module to generate first common rewards by referring to the actual actions and the actual circumstance vectors, and a process of instructing an estimation network to generate actual prospective values by referring to the actual circumstance vectors; and (b) the learning device instructing a first loss layer to generate an adjustment reward loss and to perform backpropagation to learn parameters of the adjustment reward network.
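
Steps (a) and (b) can be sketched roughly in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not give the network architectures or the loss formula, so the linear models, the rule-based common reward, the hinge-style loss, the discount factor `gamma`, and every function name below are hypothetical. A finite-difference gradient stands in for backpropagation to keep the sketch dependency-free.

```python
from typing import List, Sequence, Tuple

# One time step of a driving trajectory: (circumstance vector, scalar action).
Step = Tuple[Sequence[float], float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def adjustment_reward(weights: Sequence[float], circumstance: Sequence[float],
                      action: float) -> float:
    """Hypothetical linear stand-in for the adjustment reward network."""
    return dot(weights, list(circumstance) + [action])


def common_reward(circumstance: Sequence[float], action: float) -> float:
    """Hypothetical rule-based common reward: penalize large actions."""
    return -abs(action)


def prospective_value(value_weights: Sequence[float],
                      circumstance: Sequence[float]) -> float:
    """Hypothetical linear stand-in for the estimation network."""
    return dot(value_weights, circumstance)


def adjustment_reward_loss(weights: Sequence[float], value_weights: Sequence[float],
                           trajectory: List[Step], gamma: float = 0.9) -> float:
    """Assumed hinge loss: penalize each step whose prospective value exceeds
    the discounted return of (common + adjustment) rewards actually observed."""
    rewards = [common_reward(s, a) + adjustment_reward(weights, s, a)
               for s, a in trajectory]
    loss, discounted_return = 0.0, 0.0
    # Walk backwards so the discounted return accumulates in one pass.
    for (s, _), r in zip(reversed(trajectory), reversed(rewards)):
        discounted_return = r + gamma * discounted_return
        loss += max(0.0, prospective_value(value_weights, s) - discounted_return)
    return loss


def train_step(weights: Sequence[float], value_weights: Sequence[float],
               trajectory: List[Step], lr: float = 0.1,
               eps: float = 1e-6) -> List[float]:
    """One gradient-descent update of the adjustment reward parameters only.
    Central finite differences replace backpropagation in this sketch."""
    grad = []
    for i in range(len(weights)):
        up, down = list(weights), list(weights)
        up[i] += eps
        down[i] -= eps
        grad.append((adjustment_reward_loss(up, value_weights, trajectory)
                     - adjustment_reward_loss(down, value_weights, trajectory))
                    / (2 * eps))
    return [w - lr * g for w, g in zip(weights, grad)]


# Toy trajectory: two steps with 2-dimensional circumstance vectors.
trajectory = [((1.0, 0.0), 0.5), ((0.0, 1.0), -0.5)]
value_weights = [1.0, 1.0]
weights = [0.0, 0.0, 0.0]

loss_before = adjustment_reward_loss(weights, value_weights, trajectory)
weights = train_step(weights, value_weights, trajectory)
loss_after = adjustment_reward_loss(weights, value_weights, trajectory)
print(loss_before, loss_after)
```

Only the adjustment reward parameters are updated here, matching step (b); the estimation network weights are held fixed in this sketch.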
