A learning method and learning device that support reinforcement learning by using human driving data as training data, in order to perform customized route planning

Abstract

Provided is a learning method for acquiring at least one personalized reward function that is used to perform a Reinforcement Learning (RL) algorithm and that corresponds to a personalized optimal policy for a subject driver. The method includes steps of: (a) a learning device performing a process of instructing an adjustment reward network to generate first adjustment rewards by referring to information on actual actions and actual circumstance vectors in driving trajectories, a process of instructing a common reward module to generate first common rewards by referring to the actual actions and the actual circumstance vectors, and a process of instructing an estimation network to generate actual prospective values by referring to the actual circumstance vectors; and (b) the learning device instructing a first loss layer to generate an adjustment reward loss and to perform backpropagation to learn parameters of the adjustment reward network.
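Steps (a) and (b) can be illustrated with a minimal sketch. The abstract does not give the loss formula, the network architectures, or the form of the common reward, so everything concrete here is an assumption made for illustration: the adjustment reward network is replaced by a linear model, `common_reward` and `estimate_value` are hypothetical fixed functions, the discount factor `GAMMA` is an arbitrary choice, and the loss is assumed to be the squared gap between the discounted sum of personalized rewards (common plus adjustment) and the prospective value of each circumstance vector. Only the adjustment reward parameters are updated, as in step (b).

```python
import random

GAMMA = 0.9  # discount factor (assumed value, not from the abstract)

def common_reward(state, action):
    # Hypothetical common reward module: penalize large control inputs.
    return -abs(action)

def estimate_value(state):
    # Hypothetical estimation network, held fixed here: a linear value function.
    return 0.5 * state[0] - 0.2 * state[1]

def features(state, action):
    # Input to the adjustment reward model: circumstance vector, action, bias.
    return list(state) + [action, 1.0]

def adjustment_reward(weights, state, action):
    # Linear stand-in for the adjustment reward network.
    return sum(w * x for w, x in zip(weights, features(state, action)))

def loss_and_grad(weights, trajectory):
    """trajectory: list of (circumstance_vector, action) pairs from one drive."""
    n = len(trajectory)
    ret = 0.0                          # discounted personalized return from step t
    ret_grad = [0.0] * len(weights)    # gradient of that return w.r.t. weights
    loss = 0.0
    grad = [0.0] * len(weights)
    for state, action in reversed(trajectory):
        personalized = common_reward(state, action) + adjustment_reward(weights, state, action)
        ret = personalized + GAMMA * ret
        ret_grad = [x + GAMMA * g for x, g in zip(features(state, action), ret_grad)]
        err = ret - estimate_value(state)   # gap to the prospective value
        loss += err * err / n
        grad = [g + 2.0 * err * rg / n for g, rg in zip(grad, ret_grad)]
    return loss, grad

def train(trajectory, steps=500, lr=0.002):
    # Gradient descent on the adjustment reward parameters only.
    weights = [0.0] * (len(trajectory[0][0]) + 2)
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(weights, trajectory)
        history.append(loss)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights, history

# Synthetic stand-in for recorded human driving data (2-D circumstance vectors).
rng = random.Random(0)
demo_trajectory = [([rng.uniform(-1, 1), rng.uniform(-1, 1)], rng.uniform(-1, 1))
                   for _ in range(20)]
learned_weights, loss_history = train(demo_trajectory)
```

In this sketch the gradient is derived by hand because the model is linear; with a real neural network the same update would normally be delegated to an automatic differentiation framework.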

Bibliographic data

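Publication number: JP6931937B2
Patent type:
Publication date: 2021-09-08
Original format: PDF
Applicant/Patentee: 株式会社ストラドビジョン
Application number: JP20200011163
Inventors: 金桂賢; 金鎔重; 金鶴京; 南雲鉉; 夫碩▲ふん▼; 成明哲; 申東洙; 呂東勳; 柳宇宙; 李明春; 李炯樹; 張泰雄; 鄭景中; 諸泓模; 趙浩辰
Filing date: 2020-01-27
Classification: G08G1/16; G08G1; G01C21/34; G06N20
Country: JP
Database entry time: 2022-08-24 20:53:56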
