Online Bellman Residual and Temporal Difference Algorithms with Predictive Error Guarantees



Abstract

We establish connections from optimizing Bellman Residual and Temporal Difference Loss to worst-case long-term predictive error. In the online learning framework, learning takes place over a sequence of trials with the goal of predicting a future discounted sum of rewards. Our first analysis shows that, together with a stability assumption, any no-regret online learning algorithm that minimizes Bellman error ensures small prediction error. Our second analysis shows that applying the family of online mirror descent algorithms on temporal difference loss also ensures small prediction error. No statistical assumptions are made on the sequence of observations, which could be non-Markovian or even adversarial. Our approach thus establishes a broad new family of provably sound algorithms and provides a generalization of previous worst-case results for minimizing predictive error. We investigate the potential advantages of some of this family both theoretically and empirically on benchmark problems.
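The two update families named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear value predictor `w . x` and a Euclidean regularizer, in which case online mirror descent reduces to plain online gradient descent. The function name `online_value_updates`, the `mode` flag, and the step size `eta` are hypothetical names introduced for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def online_value_updates(transitions, dim, gamma=0.9, eta=0.1, mode="td"):
    """Run one pass of online updates over (x, r, x_next) triples.

    x and x_next are feature vectors, r is the observed reward.
    mode="residual": gradient step on the Bellman residual loss
        0.5 * (w.x - r - gamma * w.x_next) ** 2, differentiating
        through both occurrences of w.
    mode="td": temporal-difference step, which treats the bootstrapped
        target r + gamma * w.x_next as a constant (assumed form).
    No assumption is made about how the sequence is generated.
    """
    w = [0.0] * dim
    for x, r, x_next in transitions:
        residual = dot(w, x) - r - gamma * dot(w, x_next)
        if mode == "residual":
            direction = [xi - gamma * xn for xi, xn in zip(x, x_next)]
        else:
            direction = list(x)
        w = [wi - eta * residual * di for wi, di in zip(w, direction)]
    return w


if __name__ == "__main__":
    # Toy sequence: a single feature, reward 1 at every step, gamma = 0.5,
    # so the discounted sum of future rewards is 1 / (1 - 0.5) = 2.
    data = [([1.0], 1.0, [1.0])] * 500
    print(online_value_updates(data, dim=1, gamma=0.5, mode="td"))
    print(online_value_updates(data, dim=1, gamma=0.5, mode="residual"))
```

On this toy sequence both modes drive the single weight toward 2, the discounted sum of future rewards. The worst-case guarantees described in the abstract concern arbitrary sequences and are not demonstrated by this example.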


Bibliographic record

Source
International Joint Conference on Artificial Intelligence | 2016 | pp. 3560-4297 | 5 pages
Authors
Wen Sun; J. Andrew Bagnell
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature


1. An improved algorithm for segmenting online time series with error bound guarantee [J]. Zhao Huan-yu, Li Guang-xia, Zhang Hao-lan. International journal of machine learning and cybernetics. 2016, No. 3
2. Bellman residuals minimization using online support vector machines [J]. Esposito Gennaro, Martin Mario. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2017, No. 3
3. New error correction algorithms minimizing residual positional error for a computer integrated error calibration/correction system in computer numerically controlled machine tools [J]. H. J. Pahk, S. W. Lee. Proceedings of the Institution of Mechanical Engineers, Part C. Journal of mechanical engineering science. 1999, No. C7
4. Online Bellman Residual and Temporal Difference Algorithms with Predictive Error Guarantees [C]. Wen Sun, J. Andrew Bagnell. International Joint Conference on Artificial Intelligence. 2016
5. Energy Storage Applications of the Knowledge Gradient for Calibrating Continuous Parameters, Approximate Policy Iteration using Bellman Error Minimization with Instrumental Variables, and Covariance Matrix Estimation using an Errors-in-Variables Factor Model [D]. Scott, Warren Robert. 2012
6. Structural brain differences in school-age children with residual speech sound errors [O]. Jonathan L. Preston, Peter J. Molfese, W. Einar Mencl, -1
7. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view [O]. Scherrer, Bruno. 2010
