Conference: International Joint Conference on Artificial Intelligence

Online Bellman Residual and Temporal Difference Algorithms with Predictive Error Guarantees


Abstract

We establish connections between optimizing Bellman residual and temporal difference losses and worst-case long-term predictive error. In the online learning framework, learning takes place over a sequence of trials, with the goal of predicting a future discounted sum of rewards. Our first analysis shows that, under a stability assumption, any no-regret online learning algorithm that minimizes Bellman error ensures small prediction error. Our second analysis shows that applying the family of online mirror descent algorithms to temporal difference loss also ensures small prediction error. No statistical assumptions are made on the sequence of observations, which could be non-Markovian or even adversarial. Our approach thus establishes a broad new family of provably sound algorithms and generalizes previous worst-case results for minimizing predictive error. We investigate the potential advantages of some members of this family, both theoretically and empirically, on benchmark problems.
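To make the online setting concrete, here is a minimal Python sketch of one way a Bellman residual learner could look. It is an illustration under stated assumptions, not the authors' algorithm or code: it assumes a linear value estimate, a squared Bellman residual loss, and plain online gradient descent (which corresponds to mirror descent with a Euclidean regularizer). The names `online_bellman_residual`, `discounted_returns`, `gamma`, and `eta` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Discounted sum of the rewards from each trial to the end of the sequence.

    This is the quantity the learner is trying to predict.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def online_bellman_residual(
    features: Sequence[Sequence[float]],
    rewards: Sequence[float],
    gamma: float,
    eta: float,
) -> Tuple[List[float], List[float]]:
    """Online gradient descent on the squared Bellman residual (assumed loss).

    features has one more row than rewards: features[t + 1] is the
    observation that follows rewards[t]. No assumption is made about
    how the sequence is generated.
    """
    w = [0.0] * len(features[0])
    predictions = []
    for t, r in enumerate(rewards):
        x, x_next = features[t], features[t + 1]
        predictions.append(dot(w, x))  # prediction made before feedback arrives
        residual = dot(w, x) - r - gamma * dot(w, x_next)
        # Gradient of residual**2 with respect to w.
        grad = [2.0 * residual * (a - gamma * b) for a, b in zip(x, x_next)]
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w, predictions


# Toy sequence: constant feature and constant reward of 1.
# With gamma = 0.5 the infinite-horizon discounted return is 1 / (1 - 0.5) = 2.
T = 200
features = [[1.0]] * (T + 1)
rewards = [1.0] * T
w, preds = online_bellman_residual(features, rewards, gamma=0.5, eta=0.5)
targets = discounted_returns(rewards, gamma=0.5)
```

On this toy sequence the weight approaches 2, the infinite-horizon discounted return. Comparing `preds` against `targets` gives one simple way to measure predictive error; the step size `eta` here is an arbitrary choice for the example.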
