IMITATION REINFORCEMENT LEARNING AND RULE-BASED EXPERTS FOR BUILDING ENERGY SYSTEMS MANAGEMENT

Authors

DOI:

https://doi.org/10.32689/maup.it.2025.3.5

Keywords:

machine learning, neural networks, reinforcement learning, imitation learning, behavioral cloning, DAgger, SAC, building energy management, CityLearn

Abstract

The relevance of the study is determined by the sample-inefficiency barrier that prevents reinforcement learning (RL) deployment in building energy management. Traditional RL algorithms require thousands of training episodes (equivalent to decades of simulated operation), making them impractical for safety-critical infrastructure where poor decisions risk equipment damage and grid instability.

The aim of the paper is to investigate how imitation learning can accelerate RL convergence through expert demonstrations from optimized rule-based controllers. The research evaluates three approaches: behavioral cloning (BC-SAC), dataset aggregation (DAgger-SAC), and imitation-bootstrapped reinforcement learning (IBRL-SAC), all tested within the standardized CityLearn environment for multi-objective building control.

The methodology employs Bayesian-optimized rule-based controllers as expert demonstrators, evaluated across multiple building configurations using real operational data from residential buildings with photovoltaic systems and battery storage. Each variant combines expert-guided initialization with standard SAC training, tested over 365-day simulations with performance measured by cost reduction, emission minimization, and grid stability metrics.

The results show that BC-SAC achieves nearly a 50% reduction in training requirements while maintaining superior performance, outperforming both standard SAC and optimized rule-based controllers. The imitation learning methods demonstrate competent performance from the initial episodes, eliminating the risky exploration phase that prevents real-world deployment.

The scientific novelty lies in providing the first comprehensive evaluation of imitation learning variants for CityLearn, establishing quantitative efficiency-performance trade-offs previously unexplored in standardized benchmarks. The research shows that optimized rule-based experts can effectively bootstrap RL policies, creating a practical pathway for deployment where extensive training is prohibitive.
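The behavioral-cloning step described above (learning a policy from state-action pairs demonstrated by a rule-based controller, before SAC fine-tuning) can be illustrated with a minimal, self-contained sketch. The expert rule and the linear policy below are hypothetical simplifications for illustration, not the paper's actual controller or actor network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rule-based expert for battery control: charge when solar
# generation exceeds load, discharge otherwise (action in [-1, 1]).
def rbc_expert(state):
    solar, load = state
    return np.clip(solar - load, -1.0, 1.0)

# Collect expert demonstrations: (state, action) pairs over simulated
# observations, here normalized [solar, load] vectors.
states = rng.uniform(0.0, 1.0, size=(2048, 2))
actions = np.array([rbc_expert(s) for s in states])

# Behavioral cloning: fit a policy a = w @ s + b to the expert labels by
# minimizing MSE with batch gradient descent. A linear model stands in for
# the SAC actor network; the fitted weights would then initialize the actor.
w, b = np.zeros(2), 0.0
lr = 0.5
for _ in range(500):
    pred = states @ w + b          # policy's predicted actions
    err = pred - actions           # deviation from the expert
    w -= lr * (states.T @ err) / len(states)
    b -= lr * err.mean()

mse = float(np.mean((states @ w + b - actions) ** 2))
print(f"BC imitation MSE: {mse:.6f}")
```

Because the cloned policy already imitates the expert before any environment interaction, the subsequent RL phase starts from competent behavior rather than random exploration, which is the mechanism behind the reduced training requirements reported in the abstract.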

References

Bain M., Sammut C. A framework for behavioural cloning. Machine Intelligence 15. 2000. P. 103–129.

Global Alliance for Buildings and Construction (GABC). 2021 Global Status Report for Buildings and Construction. UN Environment Programme. 2021. URL: https://globalabc.org/resources/publications/2021-global-status-report-buildings-and-construction (date of access: 21.09.2025).

Haarnoja T., Zhou A., Abbeel P., Levine S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Proceedings of the 35th International Conference on Machine Learning. 2018. Vol. 80. P. 1861–1870. URL: https://proceedings.mlr.press/v80/haarnoja18b.html (date of access: 21.09.2025).

Konda V. R., Tsitsiklis J. N. Actor-Critic Algorithms. Advances in Neural Information Processing Systems. 2000. Vol. 12. P. 1008–1014. URL: https://proceedings.neurips.cc/paper/1999/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf (date of access: 21.09.2025).

Mason K., Grijalva S. A review of reinforcement learning for autonomous building energy management. Computers & Electrical Engineering. 2019. Vol. 78. P. 300–312. DOI: https://doi.org/10.1016/j.compeleceng.2019.07.019 (date of access: 21.09.2025).

Mockus J., Tiesis V., Zilinskas A. The application of Bayesian methods for seeking the extremum. Towards Global Optimization. 1978. Vol. 2. P. 117–129.

Nweye K., Siva S., Nagy G. Z. The CityLearn Challenge 2022 Dataset. Texas Data Repository. 2023. DOI: https://doi.org/10.18738/T8/0YLJ6Q (date of access: 21.09.2025).

Oldewurtel F., Parisio A., Jones C. N., Gyalistras D., Gwerder M., Stauch V., Lehmann B., Morari M. Use of model predictive control and weather forecasts for energy efficient building climate control. Energy and Buildings. 2012. Vol. 45. P. 15–27. DOI: https://doi.org/10.1016/j.enbuild.2011.09.022 (date of access: 21.09.2025).

Perera K. S., Aung Z., Woon W. L. Machine learning techniques for supporting renewable energy generation and integration: A survey. Proceedings of the Data Analytics for Renewable Energy Integration. 2014. P. 81–96. DOI: https://doi.org/10.1007/978-3-319-13290-7_6 (date of access: 21.09.2025).

Raffin A., Hill A., Gleave A., Kanervisto A., Ernestus M., Dormann N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research. 2021. Vol. 22, No. 268. P. 1–8. URL: http://jmlr.org/papers/v22/20-1364.html (date of access: 21.09.2025).

Rawlings J. B., Mayne D. Q., Diehl M. Model Predictive Control: Theory, Computation, and Design. 2nd edition. Nob Hill Publishing. 2017. ISBN: 978-0975937730.

Ross S., Gordon G., Bagnell D. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. 2011. Vol. 15. P. 627–635. URL: https://proceedings.mlr.press/v15/ross11a.html (date of access: 21.09.2025).

Ruelens F., Claessens B. J., Vandael S., De Schutter B., Babuška R., Belmans R. Residential demand response of thermostatically controlled loads using batch reinforcement learning. IEEE Transactions on Smart Grid. 2017. Vol. 8, No. 5. P. 2149–2159. DOI: https://doi.org/10.1109/TSG.2016.2517211 (date of access: 21.09.2025).

Sutton R. S., Barto A. G. Reinforcement Learning: An Introduction. MIT Press. 2018. 2nd edition. ISBN: 978-0262039246.

Uchendu I., Xiao T., Lu Y., Zhu B., Yan M., Simon J., Bennice M., Fu C., Ma C., Jiao J., Lee S., Levine S. Jump-Start Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning. 2023. Vol. 202. P. 34556–34583. URL: https://proceedings.mlr.press/v202/uchendu23a.html (date of access: 21.09.2025).

Vazquez-Canteli J. R., Dey S., Henze G., Nagy Z. CityLearn: Standardizing Research in Multi-Agent Reinforcement Learning for Demand Response and Urban Energy Management. arXiv preprint. 2020. arXiv:2012.10504. URL: https://arxiv.org/abs/2012.10504 (date of access: 21.09.2025).

Vazquez-Canteli J. R., Nagy Z. Reinforcement learning for demand response: A review of algorithms and modeling techniques. Applied Energy. 2019. Vol. 235. P. 1072–1089. DOI: https://doi.org/10.1016/j.apenergy.2018.11.002 (date of access: 21.09.2025).

Wei T., Wang Y., Zhu Q. Deep reinforcement learning for building HVAC control. Proceedings of the 54th Annual Design Automation Conference. 2017. Article 22. P. 1–6. DOI: https://doi.org/10.1145/3061639.3062224 (date of access: 21.09.2025).

Yu L., Qin S., Zhang M., Shen C., Jiang T., Guan X. A review of deep reinforcement learning for smart building energy management. IEEE Internet of Things Journal. 2021. Vol. 8, No. 15. P. 12046–12063. DOI: https://doi.org/10.1109/JIOT.2021.3078462 (date of access: 21.09.2025).

Zhang Z., Chong A., Pan Y., Zhang C., Lam K. P. Whole building energy model for HVAC optimal control: A practical framework based on deep reinforcement learning. Energy and Buildings. 2019. Vol. 199. P. 472–490. DOI: https://doi.org/10.1016/j.enbuild.2019.07.029 (date of access: 21.09.2025).

Voitekh D. V., Tymoshenko A. H. Use of machine learning and network datasets for energy system modeling. Infocommunication and Computer Technologies. 2024. Vol. 1, No. 07. P. 35–45. DOI: https://doi.org/10.36994/2788-5518-2024-01-07-05 (date of access: 21.09.2025).

Published

2025-12-04

How to Cite

ВОЙТЕХ, Д., & ТИМОШЕНКО, А. (2025). IMITATION REINFORCEMENT LEARNING AND RULE-BASED EXPERTS FOR BUILDING ENERGY SYSTEMS MANAGEMENT. Information Technology and Society, (3(18)), 40–47. https://doi.org/10.32689/maup.it.2025.3.5