Inverse reinforcement learning using Dynamic Policy Programming

Abstract

This paper proposes a novel model-free inverse reinforcement learning method based on density ratio estimation under the framework of Dynamic Policy Programming. We show that the logarithm of the ratio between the optimal policy and the baseline policy is represented by the state-dependent cost and the value function. We propose using density ratio estimation methods to estimate the density ratio of the policies, and regularized least squares to estimate the state-dependent cost and the value function that satisfy this relation. Our method avoids computing integrals such as the partition function. Simple numerical simulations of grid-world navigation, car driving, and pendulum swing-up show its superiority over conventional methods.
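As a concrete illustration of the pipeline the abstract describes, below is a minimal sketch in Python (NumPy and scikit-learn). It assumes the KL-regularized form of the relation, ln(pi(y|x)/b(y|x)) = r(x) + gamma*V(y) - V(x); the exact expression in the paper may differ. It also substitutes a specific density ratio estimator (probabilistic classification via logistic regression) and a toy chain environment; the environment, the policies, and helper names such as transitions and onehot_pair are illustrative assumptions, not taken from the paper.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    N = 10        # chain states 0..N-1; the goal sits at state N-1 (assumption)
    GAMMA = 0.95  # discount factor (assumption)

    def sample_next(x, policy):
        # baseline b: uniform left/right; "optimal" pi: biased toward the goal
        p = [0.5, 0.5] if policy == "b" else [0.2, 0.8]
        a = rng.choice([-1, 1], p=p)
        return int(np.clip(x + a, 0, N - 1))

    def transitions(policy, n):
        # uniform start states so the state marginals of the two datasets
        # match and the joint density ratio reduces to the policy ratio
        xs = rng.integers(0, N, size=n)
        ys = np.array([sample_next(int(x), policy) for x in xs])
        return xs, ys

    def onehot_pair(xs, ys):
        # tabular feature map for a transition (x, y): [e_x ; e_y]
        Z = np.zeros((len(xs), 2 * N))
        Z[np.arange(len(xs)), xs] = 1.0
        Z[np.arange(len(xs)), N + ys] = 1.0
        return Z

    n = 5000
    xs_pi, ys_pi = transitions("pi", n)
    xs_b, ys_b = transitions("b", n)

    # Step 1: density ratio estimation by probabilistic classification.
    # With equally sized samples, the log-odds of a classifier separating
    # pi-transitions (label 1) from b-transitions (label 0) estimate
    # ln p_pi(x,y)/p_b(x,y), which equals ln pi(y|x)/b(y|x) here.
    Z = np.vstack([onehot_pair(xs_pi, ys_pi), onehot_pair(xs_b, ys_b)])
    labels = np.concatenate([np.ones(n), np.zeros(n)])
    clf = LogisticRegression(C=10.0, max_iter=1000).fit(Z, labels)
    g = clf.decision_function(onehot_pair(xs_pi, ys_pi))  # estimated ln pi/b

    # Step 2: regularized least squares.
    # Fit g(x, y) ~= r(x) + GAMMA*V(y) - V(x); parameters theta = [r ; V].
    m = len(xs_pi)
    A = np.zeros((m, 2 * N))
    A[np.arange(m), xs_pi] = 1.0          # r(x)
    A[np.arange(m), N + ys_pi] += GAMMA   # + GAMMA * V(y)
    A[np.arange(m), N + xs_pi] -= 1.0     # - V(x)
    lam = 1e-3
    theta = np.linalg.solve(A.T @ A + lam * np.eye(2 * N), A.T @ g)
    r_hat, V_hat = theta[:N], theta[N:]
    print("estimated reward (up to shaping):", np.round(r_hat, 2))
    print("estimated value  (up to a constant):", np.round(V_hat, 2))

Note that only differences of V and sums with r enter the assumed relation, so r and V are identified only up to shaping and constant terms; the L2 penalty pins down one representative solution, mirroring the role of the regularized least squares step the abstract mentions. No partition function or other integral is evaluated anywhere in the pipeline.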
