Bias-Corrected Q-Learning With Multistate Extension

Lee Donghun; Powell Warren B.

Abstract

Q-learning is a sample-based, model-free algorithm that solves Markov decision problems asymptotically, but in finite time it can perform poorly when random rewards and transitions result in large variance of value estimates. We identify the cause as the estimation bias due to the maximum operator in the Q-learning algorithm, and present evidence of max-operator bias in its Q-value estimates. We then present an asymptotically optimal bias-correction strategy and extend the bias-corrected Q-learning algorithm to multistate Markov decision processes, with asymptotic convergence properties as strong as those of Q-learning. We report the empirical performance of the bias-corrected Q-learning algorithm with multistate extension on two model problems: a multiarmed bandit version of roulette and an electricity storage control simulation. The bias-corrected Q-learning algorithm with multistate extension is shown to control max-operator bias effectively, and its bias resistance can be tuned predictably by adjusting a correction parameter.
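The max-operator bias mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' correction strategy or their experimental setup; it only shows that taking the maximum over noisy value estimates overestimates the true maximum. The function name `max_operator_bias`, the Gaussian reward noise, and all parameter values are assumptions made for this illustration.

```python
import random
import statistics


def max_operator_bias(num_actions=10, samples_per_action=5, trials=2000, seed=0):
    """Estimate E[max_a Q_hat(a)] - max_a Q(a) when every true value Q(a) is 0.

    Hypothetical setup: each action's reward is Gaussian noise with mean 0,
    so the true maximum value is exactly 0 and any positive result is bias.
    """
    rng = random.Random(seed)
    max_estimates = []
    for _ in range(trials):
        # Sample-mean estimate of each action's value from a few noisy rewards.
        q_hat = [
            statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(samples_per_action)])
            for _ in range(num_actions)
        ]
        # The max operator picks whichever estimate happened to be noisiest upward.
        max_estimates.append(max(q_hat))
    true_max = 0.0
    return statistics.fmean(max_estimates) - true_max


bias = max_operator_bias()
print(f"estimated max-operator bias: {bias:.3f}")
```

With a single action the same estimate is unbiased, so calling `max_operator_bias(num_actions=1)` should return a value close to zero; the overestimate comes from maximizing over several noisy estimates and grows with the noise variance.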

Bibliographic record

Source: IEEE Transactions on Automatic Control, 2019, Issue 10, pp. 4011-4023 (13 pages)
Authors: Lee Donghun; Powell Warren B.
Author affiliations:

Princeton Univ, Dept Comp Sci, Princeton, NJ 08540, USA

Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08540, USA

Original format: PDF
Language: English
Keywords: Bias correction; electricity storage; Q-learning; smart grid
