Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation.

Abstract

Modeling and simulation of military operations requires human behavior models that can learn from experience in complex environments where feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of artificial intelligence learning algorithms, to address this need. It describes a novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator. Empirical results indicate that this relatively straightforward approach improves learning speed both in benchmark environments and in challenging applied settings. Three applications of reinforcement learning are presented: verifying the reward structure of a training simulation, improving the performance of a discrete event simulation scheduling tool, and enabling adaptive decision-making in combat simulation. To place reinforcement learning within the context of broader models of human information processing, a practical cognitive architecture is developed and applied to the representation of a population within a conflict area. These varied applications and domains demonstrate that reinforcement learning has considerable potential within modeling and simulation.
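
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of using an exponentially weighted average of observed rewards as an action-value estimate. It is not the report's algorithm. The class name `EWAEstimator`, the step size `alpha`, the epsilon-greedy selection rule, and the Gaussian-noise bandit in `run_bandit` are all assumptions made for illustration, and the sketch covers noisy rewards only, not delayed ones.

```python
import random


class EWAEstimator:
    """Hypothetical helper: one exponentially weighted reward average per action."""

    def __init__(self, n_actions, alpha=0.1, epsilon=0.1, seed=0):
        self.alpha = alpha            # weight given to the newest reward (assumed value)
        self.epsilon = epsilon        # exploration probability (assumed value)
        self.values = [0.0] * n_actions
        self.rng = random.Random(seed)

    def select(self):
        # Epsilon-greedy choice: explore at random, otherwise pick the highest estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, action, reward):
        # Exponentially weighted average: recent rewards count more than old ones.
        self.values[action] += self.alpha * (reward - self.values[action])


def run_bandit(true_means, steps=5000, noise=1.0, seed=0):
    """Toy noisy-reward environment (an assumption, not from the report)."""
    rng = random.Random(seed)
    agent = EWAEstimator(len(true_means), seed=seed)
    for _ in range(steps):
        action = agent.select()
        reward = rng.gauss(true_means[action], noise)  # noisy feedback on the action
        agent.update(action, reward)
    return agent.values
```

With a constant step size, each estimate forgets old rewards geometrically, which lets it track changing reward levels at the cost of some residual noise in the estimate; a smaller `alpha` smooths more but adapts more slowly.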
