Allowing a reinforcement learning (RL) agent to interact with a real-world building to determine an optimal policy may not be viable due to comfort constraints. Embodiments of the present disclosure provide multi-agent deep RL for dynamically controlling electrical equipment in buildings, wherein a simulation model is generated using the design specifications of (i) the controllable electrical equipment (or subsystem) and (ii) the building. Each RL agent is trained using the simulation model and deployed in its subsystem. The reward function for each subsystem includes a portion of the reward from the other subsystem(s). Based on its reward function, each RL agent learns an optimal control parameter during its execution in the subsystem. Further, a global optimal control parameter list is generated using the optimal control parameters. The control parameters in the global optimal control parameter list are fine-tuned to improve the subsystem's performance. Information on the fine-tuned parameters of the subsystem and on the reward function is used for training the RL agents.
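As an illustration only, the reward-sharing idea (each subsystem's reward includes a portion of the reward from other subsystems) can be sketched as a weighted sum. The abstract does not specify how the portion is computed; the linear form, the `share` fractions, and the subsystem names `chiller` and `ahu` below are all hypothetical assumptions, not the disclosed method.

```python
from typing import Dict


def shared_rewards(
    local: Dict[str, float], share: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Combine each agent's local reward with fractions of the other agents' rewards.

    Assumption (hypothetical): share[i][j] is the fraction of subsystem j's
    local reward credited to agent i; the combination is a simple linear sum.
    """
    combined: Dict[str, float] = {}
    for i, r_i in local.items():
        # Add the credited fraction of every *other* subsystem's reward.
        bonus = sum(
            frac * local[j] for j, frac in share.get(i, {}).items() if j != i
        )
        combined[i] = r_i + bonus
    return combined


# Hypothetical subsystems with negative rewards (e.g. penalties for energy use).
local = {"chiller": -2.0, "ahu": -1.0}
share = {"chiller": {"ahu": 0.5}, "ahu": {"chiller": 0.25}}
print(shared_rewards(local, share))  # {'chiller': -2.5, 'ahu': -1.5}
```

Crediting each agent with part of its neighbours' rewards is one common way to discourage a subsystem from improving its own objective at the expense of coupled subsystems; the actual weighting used in the embodiments may differ.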