Abstract
In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary, and learning non-stationary policies is challenging, typically requiring sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies in multi-timescale MARL. Our approach uses available information about agent timescales to define and learn periodic multi-agent policies. Specifically, we show theoretically that the effects of non-stationarity introduced by multiple timescales can be captured by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and a building energy management environment.
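
For context on the actor-critic parameterization mentioned above, the following is a minimal sketch of a phase-functioned linear layer in PyTorch. It is not the paper's implementation; the class name, number of control points, and the Catmull-Rom spline choice are illustrative assumptions. The key idea is that the layer's weights are interpolated from a small set of control-point weight matrices as a smooth, periodic function of a phase variable, which is what supplies the inductive bias for periodicity.

```python
# Minimal sketch (not the authors' code): a hypothetical phase-functioned linear layer.
import torch
import torch.nn as nn


class PhaseFunctionedLinear(nn.Module):
    """Weights are a periodic spline of a phase variable in [0, 1)."""

    def __init__(self, in_features, out_features, num_control_points=4):
        super().__init__()
        self.K = num_control_points
        # One weight/bias set per control point, spaced evenly around the phase cycle.
        self.weights = nn.Parameter(0.1 * torch.randn(self.K, out_features, in_features))
        self.biases = nn.Parameter(torch.zeros(self.K, out_features))

    def forward(self, x, phase):
        # phase is a Python float in [0, 1); map it onto the K control-point segments.
        p = (phase % 1.0) * self.K
        t = p - int(p)
        i1 = int(p) % self.K
        i0, i2, i3 = (i1 - 1) % self.K, (i1 + 1) % self.K, (i1 + 2) % self.K

        def catmull_rom(c):
            # Cyclic Catmull-Rom interpolation of control points -> smooth, periodic weights.
            return (c[i1]
                    + t * 0.5 * (c[i2] - c[i0])
                    + t ** 2 * (c[i0] - 2.5 * c[i1] + 2.0 * c[i2] - 0.5 * c[i3])
                    + t ** 3 * (1.5 * (c[i1] - c[i2]) + 0.5 * (c[i3] - c[i0])))

        W, b = catmull_rom(self.weights), catmull_rom(self.biases)
        return x @ W.T + b


# Usage sketch: the phase comes from known timescale information,
# e.g. the position within a 24-step (hourly) control cycle.
layer = PhaseFunctionedLinear(in_features=8, out_features=16)
obs = torch.randn(1, 8)
step, period = 5, 24
out = layer(obs, phase=step / period)  # out.shape == (1, 16)
```

Because the interpolation is cyclic in the phase, the resulting actor and critic outputs repeat with the period defined by the agent timescales, which is how a stationary set of learned weights can represent a periodic (time-dependent) policy.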
| Original language | American English |
| --- | --- |
| Number of pages | 7 |
| DOIs | |
| State | Published - 2024 |
| Event | 2023 62nd IEEE Conference on Decision and Control (CDC) - Singapore; Duration: 13 Dec 2023 → 15 Dec 2023 |
Conference
| Conference | 2023 62nd IEEE Conference on Decision and Control (CDC) |
| --- | --- |
| City | Singapore |
| Period | 13/12/23 → 15/12/23 |
NREL Publication Number
- NREL/CP-2C00-83437
Keywords
- control
- multi-agent
- multi-timescale
- reinforcement learning