Journal Article Details
Frontiers in Energy Research
Hybrid-Model-Based Deep Reinforcement Learning for Heating, Ventilation, and Air-Conditioning Control
Ting Shu1  Zibin Pan2  Huan Zhao2  Junhua Zhao3 
[1] Guangdong-Hongkong-Macao Greater Bay Area Weather Research Center for Monitoring Warning and Forecasting, Shenzhen, China;School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China;Shenzhen Research Institute of Big Data, Shenzhen, China;
Keywords: deep reinforcement learning; model-based reinforcement learning; hybrid model; heating, ventilation, and air-conditioning control; deep deterministic policy gradient
DOI: 10.3389/fenrg.2020.610518
Source: DOAJ
【 Abstract 】

Buildings account for a large proportion of total energy consumption in many countries, and almost half of the energy consumption is attributable to heating, ventilation, and air-conditioning (HVAC) systems. Model predictive control (MPC) of HVAC is a complex task because of the dynamic nature of the system and its environment, such as temperature and electricity price. Deep reinforcement learning (DRL) is a model-free method that uses a “trial and error” mechanism to learn the optimal policy. However, low learning efficiency and high learning cost are the main obstacles to applying DRL in practice. To overcome these obstacles, a hybrid-model-based DRL method is proposed for the HVAC control problem. First, a specific Markov decision process (MDP) is defined by considering the energy cost, temperature violation, and action violation. Then the hybrid-model-based DRL method is presented, which uses both a knowledge-driven model and a data-driven model throughout the learning process. Finally, a protection mechanism and reward-adjustment methods are used to further reduce the learning cost. The proposed method is tested in a simulation environment using Australian Energy Market Operator (AEMO) electricity price data and New South Wales temperature data. Simulation results show that 1) the DRL method reduces the energy cost while keeping the temperature satisfactory, compared with the short-term MPC method; and 2) the proposed method improves learning efficiency and reduces learning cost during the learning process, compared with the model-free method.
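
The abstract names three terms in the MDP (energy cost, temperature violation, and action violation) but does not give their form. The following is a minimal sketch of how such a reward might be combined, not the authors' formulation: the function name `hvac_reward`, the linear penalties, the weights `w_temp` and `w_action`, and all parameter names and units are assumptions made for illustration.

```python
def hvac_reward(price, power_kw, dt_hours,
                indoor_temp, temp_low, temp_high,
                raw_action, action_low, action_high,
                w_temp=1.0, w_action=1.0):
    """Illustrative reward: negative sum of cost and violation penalties.

    All names, units, and the linear penalty shape are assumptions;
    the abstract only states that these three terms are considered.
    """
    # Energy cost for one control step: price per kWh times energy used.
    energy_cost = price * power_kw * dt_hours

    # Temperature violation: distance outside the comfort band, zero inside it.
    temp_violation = (max(temp_low - indoor_temp, 0.0)
                      + max(indoor_temp - temp_high, 0.0))

    # Action violation: distance by which the raw action leaves its valid range.
    action_violation = (max(action_low - raw_action, 0.0)
                        + max(raw_action - action_high, 0.0))

    # The agent maximizes reward, so costs and penalties enter with a minus sign.
    return -(energy_cost + w_temp * temp_violation + w_action * action_violation)
```

Under this sketch, a step that stays inside both the comfort band and the action range is penalized only by its energy cost, while the two weights set how strongly violations trade off against that cost.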

【 License 】

Unknown   
