
Policy Analysis of Adaptive Traffic Signal Control Using Reinforcement Learning



Abstract

Previous studies have successfully developed adaptive traffic signal controllers using reinforcement learning; however, few have analyzed what reinforcement learning does differently from other traffic signal control methods. This study proposes and develops two reinforcement learning adaptive traffic signal controllers, analyzes their learned policies, and compares them to a Webster's controller. The asynchronous Q-learning and advantage actor-critic algorithms are used to develop reinforcement learning traffic signal controllers using neural network function approximation with two action spaces. Using an aggregate statistic state representation (i.e., vehicle queue and density), the proposed reinforcement learning traffic signal controllers develop the optimal policy in a dynamic, stochastic traffic microsimulation. Results show that the reinforcement learning controllers increase red and yellow times but ultimately achieve superior performance compared to the Webster's controller, reducing mean queues, stopped time, and travel time. The reinforcement learning controllers exhibit goal-oriented behavior, developing a policy that excludes many phases found in a traditional phase cycle (i.e., protected turning movements), instead choosing phases that maximize reward; the Webster's controller, by contrast, is constrained by cyclical logic that diminishes its performance. (c) 2019 American Society of Civil Engineers.
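As a concrete illustration of the first method named in the abstract, the sketch below shows the one-step Q-learning update that the asynchronous variant parallelizes across workers, applied to a toy two-phase signal observing an aggregate (queue, density) state. The discretization, reward convention, constants, and function names here are illustrative assumptions only; the paper itself uses neural network function approximation inside a traffic microsimulation rather than this tabular toy.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # learning rate, discount factor, exploration rate
ACTIONS = (0, 1)                         # 0 = north-south green, 1 = east-west green
Q = defaultdict(float)                   # tabular stand-in for the paper's neural network

def discretize(queue, density, bins=4):
    # Bin the aggregate (vehicle queue, density) observation into a coarse state.
    return (min(int(queue), bins - 1), min(int(density * bins), bins - 1))

def choose_phase(state):
    # Epsilon-greedy selection over the two signal phases.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

A natural reward in this setting is the negative total queue length after the chosen phase runs, so that clearing queues increases reward. For the baseline, the classical Webster method (a standard formulation, not quoted from the paper) sets the cycle length and green splits as

\[
C_o = \frac{1.5L + 5}{1 - Y}, \qquad Y = \sum_i y_i = \sum_i \frac{q_i}{s_i}, \qquad g_i = \frac{y_i}{Y}\,(C_o - L)
\]

where L is the total lost time per cycle in seconds, q_i and s_i are the critical flow and saturation flow of phase i, and g_i is the effective green time of phase i. Because every phase in the cycle must be served in order, such a controller cannot skip low-demand protected-turn phases the way the learned policies described above do.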

Bibliographic details

  • Source
    Journal of Computing in Civil Engineering | 2020, Issue 1 | pp. 04019046.1-04019046.10 | 10 pages
  • Authors

    Genders, Wade; Razavi, Saiedeh

  • Affiliation

    McMaster Univ, Dept Civil Engn, 1280 Main St West, Hamilton, ON L8S 4L8, Canada

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
