Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation.

Abstract

Modeling and simulation of military operations requires human behavior models capable of learning from experience in complex environments in which feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of artificial intelligence learning algorithms, to address this need. A novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator is described. Empirical results indicate that this relatively straightforward approach improves learning speed in both benchmark environments and challenging applied settings. Applications of reinforcement learning are presented in verifying the reward structure of a training simulation, improving the performance of a discrete event simulation scheduling tool, and enabling adaptive decision-making in combat simulation. To place reinforcement learning within the context of broader models of human information processing, a practical cognitive architecture is developed and applied to the representation of a population within a conflict area. These varied applications and domains demonstrate the considerable potential of reinforcement learning within modeling and simulation.

Bibliographic details

Author: Alt, J. K.
Year: 2012
Pages: 1-321
Total pages: 321
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords
Algorithms; Artificial intelligence; Cognition; Computerized simulation; Decision making; Learning; Military applications; Behavior; Combat simulation; Department of Defense; Drones; Feedback; Military operations; Military training; Monte Carlo method; Problem solving; Scheduling; Theory; Theses; Reinforcement learning; Autonomous agent decision making; Cognitive architectures; Exponentially weighted average reward; Action-value estimator; Cognitive modeling; Training simulations; Discrete event simulations; Adaptive decision making; Direct-q computation; Benchmark problems; Traveling salesman problem; Pacman problem; UAV scheduling; Group cognition; Adaptive behavior; Cultural geography model

Similar literature

1. SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards [J]. Krishnan Sanjay, Garg Animesh, Liaw Richard, The International Journal of Robotics Research. 2019, no. 2-3

2. Integration of imitation learning using GAIL and reinforcement learning using task-achievement rewards via probabilistic graphical model [J]. Kinose Akira, Taniguchi Tadahiro. Advanced Robotics: The International Journal of the Robotics Society of Japan. 2020, no. 15-16

3. An extended reinforcement learning model of basal ganglia to understand the contributions of serotonin and dopamine in risk-based decision making, reward prediction, and punishment learning [J]. Balasubramani, Pragathi P. Frontiers in Computational Neuroscience. 2014, no. 4

4. Reinforcement Learning based Decoding Using Internal Reward for Time Delayed Task in Brain Machine Interfaces [C]. Xiang Shen, Xiang Zhang, Yifan Huang, Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2020

5. Learning Policies for Model-Based Reinforcement Learning Using Distributed Reward Formulation [D]. Agarwal, Nikhil. 2021

6. Immediate reinforcement in delayed reward learning in pigeons [O]. Janet Winter, Charles C. Perkins. 1982

7. Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation [O]. Alt Jonathan K. 2012
