Journal: IEEE Transactions on Automatic Control

Bias-Corrected Q-Learning With Multistate Extension

Abstract

Q-learning is a sample-based, model-free algorithm that solves Markov decision problems asymptotically, but in finite time it can perform poorly when random rewards and transitions result in large variance of the value estimates. We pinpoint the cause as the estimation bias due to the maximum operator in the Q-learning algorithm and present evidence of max-operator bias in its Q-value estimates. We then present an asymptotically optimal bias-correction strategy and extend the bias-corrected Q-learning algorithm to multistate Markov decision processes, with asymptotic convergence properties as strong as those of Q-learning. We report the empirical performance of the bias-corrected Q-learning algorithm with multistate extension on two model problems: a multiarmed bandit version of roulette and an electricity storage control simulation. The bias-corrected Q-learning algorithm with multistate extension is shown to control max-operator bias effectively, and its bias resistance can be tuned predictably by adjusting a correction parameter.
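
The max-operator bias named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's correction strategy or its experiments; it only shows the underlying statistical effect, under assumptions chosen for illustration: every action has a true mean reward of 0 with unit-variance Gaussian noise, and each action's value is estimated by a plain sample mean. The function names and parameters are hypothetical. Because the maximum of several noisy estimates is taken, the result tends to exceed the true maximum (here 0), and the gap grows with the number of actions.

```python
import random
import statistics


def max_of_sample_means(num_actions, samples_per_action, rng):
    """Return the largest sample-mean reward across actions.

    Assumption for illustration: every action's true mean reward is 0
    and rewards are Gaussian with standard deviation 1.
    """
    means = []
    for _ in range(num_actions):
        rewards = [rng.gauss(0.0, 1.0) for _ in range(samples_per_action)]
        means.append(statistics.fmean(rewards))
    return max(means)


def estimate_max_bias(num_actions, samples_per_action, trials=2000, seed=0):
    """Average the max of the sample means over many independent trials.

    The true maximum value is 0, so the returned average is itself an
    estimate of the bias introduced by the max operator.
    """
    rng = random.Random(seed)
    return statistics.fmean(
        max_of_sample_means(num_actions, samples_per_action, rng)
        for _ in range(trials)
    )


if __name__ == "__main__":
    for k in (1, 2, 10):
        bias = estimate_max_bias(num_actions=k, samples_per_action=5)
        print(f"actions={k:2d}  estimated bias of max: {bias:.3f}")
```

With a single action the estimate is unbiased, so the average stays near zero; with more actions the average becomes clearly positive even though no action is actually better than another. This is the kind of overestimation that a bias-correction term would need to offset.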