Abstract
This paper develops an intelligent grid-interactive building controller that optimizes building operation during both normal hours and demand response (DR) events. To avoid costly on-demand computation and to adapt to nonlinear building models, the controller uses reinforcement learning (RL) and makes real-time decisions based on a near-optimal control policy. Learning such a policy typically amounts to solving a hard non-convex optimization problem. We propose to address this problem with a novel global-local policy search method. In the first stage, an RL algorithm based on zero-order gradient estimation searches for the optimal policy globally; this approach is chosen for its scalability and its potential to escape poorly performing local optima. The obtained policy is then fine-tuned locally to bring the first-stage solution closer to that of the original, unsmoothed problem. Experiments on a simulated five-zone commercial building demonstrate the advantages of the proposed method over existing learning approaches. They also show that the learned control policy outperforms a pragmatic linear model predictive controller (MPC) and approaches the performance of an oracle MPC in the testing scenarios. Using a state-of-the-art advanced computing system, we demonstrate that the controller can be trained and deployed within hours.
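The abstract describes the two-stage method only at a high level; the full algorithm is given in the paper. As a rough, self-contained sketch (not the authors' implementation), the Python below illustrates the general idea: a two-point zero-order gradient estimate of a Gaussian-smoothed objective drives the global search stage, and a second stage with a smaller smoothing radius fine-tunes the result toward the original, unsmoothed problem. All function names, hyperparameters, and the toy objective `J` are hypothetical.

```python
import numpy as np

def zero_order_gradient(J, theta, sigma, n_samples=32, rng=None):
    """Two-point (antithetic) zero-order estimate of the gradient of the
    Gaussian-smoothed objective J_sigma(theta), using only evaluations of J."""
    rng = rng if rng is not None else np.random.default_rng(0)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        u = rng.standard_normal(theta.shape)
        grad += (J(theta + sigma * u) - J(theta - sigma * u)) / (2.0 * sigma) * u
    return grad / n_samples

def global_local_search(J, theta0, sigma=0.5, lr=0.05,
                        global_steps=200, local_steps=100, local_lr=0.005):
    """Stage 1: gradient ascent on the smoothed objective (global search).
    Stage 2: shrink the smoothing radius and fine-tune, moving the stage-1
    solution closer to an optimum of the original, unsmoothed objective."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(global_steps):
        theta += lr * zero_order_gradient(J, theta, sigma)
    for _ in range(local_steps):
        theta += local_lr * zero_order_gradient(J, theta, 0.1 * sigma)
    return theta

# Toy usage: a non-convex surrogate for episodic return as a function of
# policy parameters theta; the real objective would roll out the policy in
# a building simulator and return the (negative) operating cost.
J = lambda th: -np.sum(th ** 2) + 0.5 * np.sum(np.cos(5.0 * th))
theta_star = global_local_search(J, theta0=np.ones(4))
```

Because the estimator needs only evaluations of `J`, it applies even when the building model is a black-box simulator with no analytic gradients, which is the setting the abstract motivates.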
| Original language | American English |
| --- | --- |
| Pages (from-to) | 1976-1987 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Smart Grid |
| Volume | 13 |
| Issue number | 3 |
| DOIs | |
| State | Published - 2022 |
Bibliographical note
Publisher Copyright: © 2010-2012 IEEE.
NREL Publication Number
- NREL/JA-2C00-79559
Keywords
- demand response
- reinforcement learning
- smart building
- zero-order gradient estimation