Credit Assignment Method for Learning Effective Stochastic Policies in Uncertain Domains


Abstract

In this paper, we introduce FirstVisit Profit-sharing (FVPS) as a credit assignment procedure; credit assignment is an important issue in classifier systems and reinforcement learning frameworks. FVPS reinforces effective rules so that an agent acquires stochastic policies that make it behave robustly in uncertain domains, without predefined knowledge or subgoals. We use an internal episodic memory not only to identify perceptually aliased states but also to discard looping behavior and to acquire effective stochastic policies for escaping perceptually deceptive states. We demonstrate the effectiveness of our method on some typical classes of Partially Observable Markov Decision Processes, comparing it with Sarsa(λ) using a replacing eligibility trace. We claim that this approach results in an effective stochastic or deterministic policy that is appropriate for the environment.
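
The abstract does not give the update rule, so the following is only a minimal sketch of one plausible reading of first-visit profit sharing, not the authors' implementation. It assumes that a rule is an (observation, action) pair, that an episode is the list of rules fired before a reward arrives, that credit decays geometrically with distance from the reward, and that only the first occurrence of each rule in an episode is reinforced. The names `first_visit_credit`, `update_weights`, `choose_action`, and the `decay` parameter are hypothetical.

```python
import random
from collections import defaultdict


def first_visit_credit(episode, reward, decay=0.5):
    """Return {rule: credit}, crediting each rule once per episode.

    Assumption: credit shrinks geometrically with the number of steps
    between the rule's first firing and the reward.
    """
    credits = {}
    steps = len(episode)
    for t, rule in enumerate(episode):
        if rule in credits:
            continue  # repeated visit (e.g. a loop): no extra credit
        credits[rule] = reward * decay ** (steps - 1 - t)
    return credits


def update_weights(weights, episode, reward, decay=0.5):
    """Add the first-visit credits of one episode to the rule weights."""
    for rule, credit in first_visit_credit(episode, reward, decay).items():
        weights[rule] += credit


def choose_action(weights, obs, actions, rng=random):
    """Stochastic policy: pick an action with probability proportional to its weight."""
    action_weights = [weights[(obs, a)] for a in actions]
    return rng.choices(actions, weights=action_weights, k=1)[0]


# Episodic memory of one trial: the agent loops through s0 once before the reward.
episode = [("s0", "right"), ("s1", "left"), ("s0", "right"), ("s1", "right")]
weights = defaultdict(lambda: 1.0)  # assumed positive initial weight for every rule
update_weights(weights, episode, reward=8.0)
```

Because action selection is proportional to weight rather than greedy, the resulting policy stays stochastic in observations where several actions have earned credit, which is the behavior the abstract describes for aliased states.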


Bibliographic details

Source
Genetic and Evolutionary Computation Conference: A Joint Meeting of the Sixth Annual Genetic Programming Conference (GP-2001) and the Tenth International Conference on Genetic Algorithms (ICGA-2001), Jul 7-11, 2001, San Francisco, California | 2001 | pp. 815-822 | 8 pages
Conference location: San Francisco, CA (US)
Authors
Sachiyo Arai; Katia Sycara
Affiliation

The Robotics Institute, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213 USA;

Conference organizer
Original format: PDF
Language: English
Chinese Library Classification: Physiology
Keywords

Similar literature

1. Effective Methods for Reinforcement Learning in Large Multi-Agent Domains [J]. Martin Riedmiller, Daniel Withopf. Information Technology. 2005, No. 5
2. Training a robust reinforcement learning controller for the uncertain system based on policy gradient method [J]. Li Zhan, Xue Shengri, Lin Weiyang. Neurocomputing. 2018, No. NOVa17
3. PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning [J]. Li Shilei, Li Meng, Su Jiongming. ACM Transactions on Intelligent Systems and Technology. 2021, No. 3
4. Credit Assignment Method for Learning Effective Stochastic Policies in Uncertain Domains [C]. Sachiyo Arai, Katia Sycara. Genetic and Evolutionary Computation Conference. 2001
5. Stochastic Explanations: Learning From Mistakes In Stochastic Domains [D]. Finestrali, Giulio. 2013
6. Desirability availability credit assignment category learning and attention: Cognitive-emotional and working memory dynamics of orbitofrontal ventrolateral and dorsolateral prefrontal cortices [O]. Stephen Grossberg. 2018
7. A Reinforcement Learning Method with the Inference of the Other Agent's Policy for 2-Player Stochastic Games [O]. 長行康男, 伊藤実. 2003
8. Solving the Credit Assignment Problem: The Interaction of Explicit and Implicit Learning with Internal and External State Information [R]. Fu, W., Anderson, J. R. 2006
