Credit Assignment Method for Learning Effective Stochastic Policies in Uncertain Domains

Abstract

In this paper, we introduce FirstVisit Profit-sharing (FVPS), a credit assignment procedure. Credit assignment is an important issue in both classifier systems and reinforcement learning frameworks. FVPS reinforces effective rules so that an agent acquires stochastic policies that let it behave robustly in uncertain domains, without predefined knowledge or subgoals. We use an internal episodic memory not only to identify perceptually aliased states but also to discard looping behavior and to acquire effective stochastic policies for escaping perceptually deceptive states. We demonstrate the effectiveness of our method on several typical classes of Partially Observable Markov Decision Processes (POMDPs), comparing it with Sarsa(λ) using a replacing eligibility trace. We claim that this approach yields an effective stochastic or deterministic policy, whichever is appropriate for the environment.
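The abstract describes FVPS only at a high level. The following is a minimal Python sketch of generic profit-sharing credit assignment with a first-visit rule, written under stated assumptions that do not come from the paper: rules are (observation, action) pairs, credit decays geometrically with distance from the rewarded step, and each distinct rule is credited once per episode, using the distance of its first occurrence. The names `first_visit_profit_sharing` and `roulette_select`, and the `decay` parameter, are hypothetical. The sketch omits the episodic-memory mechanisms (aliasing detection and loop removal) the abstract mentions, so it should not be read as the authors' exact procedure.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# A rule pairs an observation with an action (assumed representation).
Rule = Tuple[Hashable, Hashable]


def first_visit_profit_sharing(
    weights: Dict[Rule, float],
    episode: List[Rule],
    reward: float,
    decay: float = 0.5,
) -> None:
    """Reinforce each distinct rule in `episode` exactly once.

    Assumption (hypothetical, not from the paper): the credit is
    reward * decay**k, where k is the number of steps between the rule's
    FIRST occurrence and the final, rewarded step of the episode.
    """
    n = len(episode)
    seen = set()
    for t, rule in enumerate(episode):
        if rule in seen:
            continue  # later repeats of the same rule earn no extra credit
        seen.add(rule)
        k = n - 1 - t  # distance from the rewarded (last) step
        weights[rule] += reward * decay ** k


def roulette_select(
    weights: Dict[Rule, float],
    observation: Hashable,
    actions: List[Hashable],
    rng: random.Random,
) -> Hashable:
    """Pick an action with probability proportional to its rule weight."""
    ws = [weights[(observation, a)] for a in actions]
    return rng.choices(actions, weights=ws, k=1)[0]


if __name__ == "__main__":
    # Every rule starts with the same positive weight (assumed initial value).
    weights: Dict[Rule, float] = defaultdict(lambda: 1.0)
    episode = [("o1", "left"), ("o2", "up"), ("o1", "left"), ("o3", "right")]
    first_visit_profit_sharing(weights, episode, reward=1.0, decay=0.5)
    print(weights[("o1", "left")])   # 1.125 (first visit is 3 steps from the goal)
    print(weights[("o3", "right")])  # 2.0 (rewarded step)
    print(roulette_select(weights, "o1", ["left", "right"], random.Random(0)))
```

Roulette (weight-proportional) selection keeps the policy stochastic: for an aliased observation, several actions can retain substantial weight instead of collapsing to a single deterministic choice.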