2013 IEEE Conference on Computational Intelligence in Games

Evolutionary feature evaluation for online Reinforcement Learning



Abstract

Most successful examples of Reinforcement Learning (RL) report the use of carefully designed features, that is, a representation of the problem state that facilitates effective learning. The best features cannot always be known in advance, creating the need to evaluate more features than will ultimately be chosen. This paper presents Temporal Difference Feature Evaluation (TDFE), a novel approach to the problem of feature evaluation in an online RL agent. TDFE combines value function learning by temporal difference methods with an evolutionary algorithm that searches the space of feature subsets, and outputs a ranking over all individual features. TDFE dynamically adjusts its ranking, avoids the sample complexity multiplier of many population-based approaches, and works with arbitrary feature representations. Online learning experiments are performed in the game of Connect Four, establishing (i) that the choice of features is critical, (ii) that TDFE can evaluate and rank all the available features online, and (iii) that the ranking can be used effectively as the basis of dynamic online feature selection.
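
The abstract describes TDFE only at a high level: temporal difference value learning combined with an evolutionary search over feature subsets, producing a ranking of individual features. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' algorithm: it evolves binary subset masks, scores each mask by the TD error of a linear value function restricted to that subset, and ranks features by how often they appear in the fitter half of the population. All names and constants (NUM_FEATURES, td_fitness, feature_ranking, toy_transitions, the learning rate and discount) are illustrative assumptions, and unlike TDFE as described, this naive version re-learns per subset, i.e. it pays exactly the sample complexity multiplier the paper says TDFE avoids.

```python
import random
import numpy as np

NUM_FEATURES = 20          # assumed size of the candidate feature set
POP_SIZE = 12              # assumed evolutionary population size
GENERATIONS = 30
ALPHA, GAMMA = 0.05, 0.99  # assumed TD learning rate and discount factor

def td_fitness(transitions, mask, n_steps=200):
    """Score a feature subset by the (negative) mean squared TD error of a
    linear value function trained on the masked features only."""
    w = np.zeros(NUM_FEATURES)
    errors = []
    for _ in range(n_steps):
        phi, reward, phi_next, done = transitions()       # assumed environment hook
        phi, phi_next = phi * mask, phi_next * mask       # restrict to the subset
        target = reward + (0.0 if done else GAMMA * w.dot(phi_next))
        delta = target - w.dot(phi)                       # TD(0) error
        w += ALPHA * delta * phi
        errors.append(delta ** 2)
    return -float(np.mean(errors))

def feature_ranking(transitions):
    """Evolve binary subset masks and rank features by how often they
    appear in the fitter half of the population."""
    pop = [np.random.randint(0, 2, NUM_FEATURES) for _ in range(POP_SIZE)]
    credit = np.zeros(NUM_FEATURES)
    for _ in range(GENERATIONS):
        scored = sorted(pop, key=lambda m: td_fitness(transitions, m), reverse=True)
        elite = scored[: POP_SIZE // 2]
        for m in elite:
            credit += m                                   # credit features in good subsets
        flips = lambda: (np.random.rand(NUM_FEATURES) < 0.1).astype(int)
        pop = elite + [m ^ flips() for m in elite]        # mutate elites to refill population
    return np.argsort(-credit)                            # feature indices, best first

# Toy stand-in for Connect Four transitions: random features, rewards, terminations.
def toy_transitions():
    return (np.random.rand(NUM_FEATURES), np.random.randn(),
            np.random.rand(NUM_FEATURES), random.random() < 0.05)

if __name__ == "__main__":
    print(feature_ranking(toy_transitions)[:5])           # five top-ranked feature indices
```

Ranking by accumulated membership in elite subsets is only one plausible way to turn subset-level fitness into per-feature scores; the paper's actual credit assignment and its online, single-stream variant are not specified in this abstract.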
