To explore continuous action space in actor/critic architecture -a preliminary study


Abstract

Recently, a popular topic in Reinforcement Learning (RL) research has been how to extend RL to continuous state and action spaces, so that it can be applied to more real-world problems. The ASE/ACE architecture, one of the best-known implementations of RL, has shown potential as one solution. However, research effort is still needed to organize both the state space and the action space, so that an unbounded search is reduced to a bounded one. Few RL systems explore the action space by combining effective action sequences to capture the regularity of the environment and thereby make those sequences reusable. In this paper, we propose a memory-based sequence structure and, correspondingly, an adaptive action sequence Critic for the Actor/Critic architecture to organize the action space. Experiments on a benchmark double integrator problem and a more complicated 2-dimensional double integrator problem are carried out to show the effectiveness of the new architecture.
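The abstract names the double integrator benchmark but does not give its equations or parameters. As a rough illustration only, the sketch below simulates a standard one-dimensional double integrator (x'' = u) with explicit Euler steps. The time step, action bound, quadratic cost, initial state, and the hand-written feedback policy are all assumptions chosen for illustration, not the authors' experimental setup, and the code does not implement the proposed action sequence Critic.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DoubleIntegrator:
    """Point mass with dynamics x'' = u, integrated with explicit Euler steps.

    All default values are illustrative assumptions, not taken from the paper.
    """
    x: float = 1.0      # position (assumed initial value)
    v: float = 0.0      # velocity (assumed initial value)
    dt: float = 0.05    # integration time step (assumed)
    u_max: float = 1.0  # bound on the continuous action (assumed)

    def step(self, u: float) -> Tuple[float, float, float]:
        """Apply action u for one time step; return (x, v, cost)."""
        u = max(-self.u_max, min(self.u_max, u))  # clip to the action bound
        # Forward Euler: both updates use the state from before the step.
        x_new = self.x + self.dt * self.v
        v_new = self.v + self.dt * u
        self.x, self.v = x_new, v_new
        cost = self.x ** 2 + u ** 2  # assumed quadratic cost
        return self.x, self.v, cost


def rollout(env: DoubleIntegrator,
            policy: Callable[[float, float], float],
            steps: int) -> float:
    """Run the policy for a number of steps and return the accumulated cost."""
    total = 0.0
    for _ in range(steps):
        u = policy(env.x, env.v)
        _, _, cost = env.step(u)
        total += cost * env.dt
    return total


if __name__ == "__main__":
    env = DoubleIntegrator()
    # Hypothetical hand-tuned feedback law standing in for a learned Actor.
    total_cost = rollout(env, lambda x, v: -x - 1.5 * v, steps=400)
    print(f"final x={env.x:.6f}, v={env.v:.6f}, cost={total_cost:.4f}")
```

In an Actor/Critic setting, the lambda passed to `rollout` would be replaced by the learned Actor, and the per-step cost would feed the Critic's evaluation.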


Bibliographic record

Source: ASME Artificial Neural Networks in Engineering Conference, 1999, 6 pages
Authors: Wenwei Yu; Daisuke Iijima; Hiroshi Yokoi; Yukinori Kakazu
Original format: PDF
Chinese Library Classification: TP18-53

