首页> 美国政府科技报告 >Least Squares Temporal Difference Actor-Critic Algorithm with Applications to Warehouse Management.
【24h】

Least Squares Temporal Difference Actor-Critic Algorithm with Applications to Warehouse Management.

机译:最小二乘时间差分行为 - 批评算法及其在仓库管理中的应用。

获取原文

摘要

This paper develops a new approximate dynamic programming algorithm for Markov decision problems and applies it to a vehicle dispatching problem arising in warehouse management. The algorithm is of the actor-critic type and uses a least squares temporal difference learning method. It operates on a sample-path of the system and optimizes the policy within a prespecified class parameterized by a parsimonious set of parameters. The method is applicable to a partially observable Markov decision process setting where the measurements of state variables are potentially corrupted and the cost is only observed through the imperfect state observations. We show that under reasonable assumptions, the algorithm converges to a locally optimal parameter set. We also show that the imperfect cost observations do not affect the policy and the algorithm minimizes the true expected cost. In the warehouse application, the problem is to dispatch sensor-equipped forklifts in order to minimize operating costs involving product movement delays and forklift maintenance. We consider instances where standard dynamic programming is computationally intractable. Simulation results confirm the theoretical claims of the paper and show that our algorithm converges more smoothly than earlier actor-critic algorithms while substantially outperforming heuristics used in practice.

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号