A learning method and learning device that uses human driving data as training data to perform customized route planning by supporting reinforcement learning.
Abstract
A learning method is provided for acquiring at least one personalized reward function, used for performing a Reinforcement Learning (RL) algorithm, that corresponds to a personalized optimal policy for a subject driver. The method includes the steps of: (a) a learning device performing a process of instructing an adjustment reward network to generate first adjustment rewards by referring to information on actual actions and actual circumstance vectors in driving trajectories, a process of instructing a common reward module to generate first common rewards by referring to the actual actions and the actual circumstance vectors, and a process of instructing an estimation network to generate actual prospective values by referring to the actual circumstance vectors; and (b) the learning device instructing a first loss layer to generate an adjustment reward loss and to perform backpropagation with it to learn parameters of the adjustment reward network.
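Steps (a) and (b) can be illustrated with a minimal sketch in plain Python. Everything concrete here is an assumption made for illustration, since the abstract does not specify any of it: the adjustment reward network is reduced to a linear model over the circumstance vector and action, `common_reward` is a hypothetical fixed rule, the estimation network is a fixed linear function with hypothetical weights `value_weights`, and the loss is a squared error between the discounted sum of personalized rewards (common plus adjustment) and the prospective value. The hand-derived gradient stands in for backpropagation.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def common_reward(state: Sequence[float], action: float) -> float:
    # Hypothetical driver-independent rule: penalize large control actions.
    return -abs(action)


def prospective_value(value_weights: Sequence[float], state: Sequence[float]) -> float:
    # Stand-in for the estimation network (assumed linear and already trained).
    return dot(value_weights, state)


def discounted_sums(values: Sequence[float], gamma: float) -> List[float]:
    # out[t] = values[t] + gamma * values[t+1] + gamma^2 * values[t+2] + ...
    out = [0.0] * len(values)
    running = 0.0
    for t in reversed(range(len(values))):
        running = values[t] + gamma * running
        out[t] = running
    return out


def loss_and_gradient(
    adjust_weights: Sequence[float],
    value_weights: Sequence[float],
    states: Sequence[Sequence[float]],
    actions: Sequence[float],
    gamma: float = 0.9,
) -> Tuple[float, List[float]]:
    # Input to the (linear) adjustment reward model: circumstance vector + action.
    features = [list(s) + [a] for s, a in zip(states, actions)]
    # Personalized reward = common reward + adjustment reward (assumed combination).
    personalized = [
        common_reward(s, a) + dot(adjust_weights, f)
        for s, a, f in zip(states, actions, features)
    ]
    returns = discounted_sums(personalized, gamma)
    residuals = [g - prospective_value(value_weights, s) for g, s in zip(returns, states)]
    n = len(residuals)
    loss = sum(r * r for r in residuals) / n  # assumed squared-error loss
    # Each return is linear in the weights, so its derivative with respect to
    # weight i is the discounted sum of feature i along the trajectory.
    gradient = []
    for i in range(len(adjust_weights)):
        feature_sums = discounted_sums([f[i] for f in features], gamma)
        gradient.append(sum(2.0 * r * fs for r, fs in zip(residuals, feature_sums)) / n)
    return loss, gradient


def train(
    states: Sequence[Sequence[float]],
    actions: Sequence[float],
    value_weights: Sequence[float],
    steps: int = 100,
    learning_rate: float = 0.05,
    gamma: float = 0.9,
) -> List[float]:
    # Plain gradient descent on the adjustment reward weights only.
    weights = [0.0] * (len(states[0]) + 1)
    for _ in range(steps):
        _, grad = loss_and_gradient(weights, value_weights, states, actions, gamma)
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # One short, made-up driving trajectory: 2-D circumstance vectors and scalar actions.
    demo_states = [[0.5, 0.1], [0.4, 0.2], [0.3, 0.2], [0.2, 0.1]]
    demo_actions = [0.1, -0.2, 0.0, 0.1]
    learned = train(demo_states, demo_actions, value_weights=[1.0, 0.5])
    print("adjustment reward weights:", learned)
```

Only the adjustment reward weights are updated in this sketch; the common reward rule and the value estimate stay fixed, mirroring the abstract's statement that backpropagation learns the parameters of the adjustment reward network.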