Published in: IEEE Transactions on Smart Grid

Definition and Evaluation of Model-Free Coordination of Electrical Vehicle Charging With Reinforcement Learning


Abstract

Demand response (DR) is becoming critical for managing the charging load of a growing electric vehicle (EV) deployment. Initial DR studies mainly adopt model predictive control, but models are largely uncertain in the EV scenario (e.g., customer behavior). Model-free approaches based on reinforcement learning (RL) are an attractive alternative. We propose a new Markov decision process (MDP) formulation in the RL framework to jointly coordinate a set of charging stations. State-of-the-art algorithms either focus on a single EV or control an aggregate of EVs in multiple steps (e.g., first making aggregate load decisions, then translating the aggregate decision to individual EVs). In contrast, our RL approach jointly controls the whole set of EVs at once. We contribute a new MDP formulation with a scalable state representation that is independent of the number of charging stations. Using a batch RL algorithm, fitted $Q$-iteration, we learn an optimal charging policy. With simulations using real-world data, we: 1) differentiate settings in training the RL policy (e.g., the time span covered by training data); 2) compare its performance to an all-knowing oracle benchmark (providing an upper performance bound); 3) analyze performance fluctuations throughout a full year; and 4) demonstrate generalization capacity to larger sets of charging stations.
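As a rough illustration of the batch RL idea named in the abstract, the sketch below runs fitted Q-iteration on a toy single-EV charging problem. It is not the paper's MDP formulation or its scalable state representation. The prices, the penalty, the state (time slot, remaining charging need), the cost-minimizing convention, and the lookup-table regressor (a per-cell mean, i.e. a least-squares fit under one-hot features) are all assumptions made for this example, and every name in it is hypothetical.

```python
from collections import defaultdict

# All values below are hypothetical, chosen only to make the toy problem concrete.
PRICES = [3.0, 1.0, 2.0, 4.0]  # energy price in each time slot
MAX_NEED = 2                   # largest number of charging slots an EV may need
PENALTY = 10.0                 # cost per unmet charging slot at departure
T = len(PRICES)
ACTIONS = (0, 1)               # 0 = idle, 1 = charge during this slot


def step(t, need, a):
    """Toy transition: returns (cost, next_t, next_need, done)."""
    next_need = max(need - a, 0)
    cost = PRICES[t] * a
    done = (t + 1 == T)
    if done:
        cost += PENALTY * next_need  # pay for energy not delivered in time
    return cost, t + 1, next_need, done


def fit_table(inputs, targets):
    """Regression step: per-(state, action) mean of the targets."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in zip(inputs, targets):
        sums[key] += y
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def fitted_q_iteration(batch, n_iters, gamma=1.0):
    """Repeatedly regress Q(s, a) onto cost + gamma * min_a' Q(s', a')."""
    inputs = [(t, need, a) for (t, need, a, _c, _t2, _n2, _d) in batch]
    q = {}  # an empty table acts as Q = 0 everywhere on the first pass
    for _ in range(n_iters):
        targets = []
        for (_t, _n, _a, cost, t2, need2, done) in batch:
            future = 0.0 if done else min(
                q.get((t2, need2, b), 0.0) for b in ACTIONS
            )
            targets.append(cost + gamma * future)
        q = fit_table(inputs, targets)
    return q


def greedy_action(q, t, need):
    """Pick the action with the lowest estimated cost-to-go."""
    return min(ACTIONS, key=lambda a: q[(t, need, a)])


def rollout(q, need):
    """Follow the greedy policy from t = 0 and return (actions, total cost)."""
    t, total, actions, done = 0, 0.0, [], False
    while not done:
        a = greedy_action(q, t, need)
        cost, t, need, done = step(t, need, a)
        total += cost
        actions.append(a)
    return actions, total


# Batch of transitions; here every (t, need, a) is enumerated once.
batch = [
    (t, need, a, *step(t, need, a))
    for t in range(T)
    for need in range(MAX_NEED + 1)
    for a in ACTIONS
]
q = fitted_q_iteration(batch, n_iters=T)
actions, total = rollout(q, need=2)
print(actions, total)
```

In this toy setting the greedy policy charges in the two cheapest slots. A practical setup would replace the lookup table with a function approximator and learn from logged transitions rather than an exhaustive enumeration.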
