Abstract
We study multiagent reinforcement learning (MARL) with constraints. This setting is gaining importance as MARL algorithms find new applications in real-world systems ranging from power grids to drone swarms. Most constrained MARL (C-MARL) algorithms use a primal-dual approach to enforce constraints through a penalty function added to the reward. In this paper, we study the structural effects of the primal-dual approach on the constraints and value function. First, we show that using the constraint evaluation as the penalty leads to a weak notion of safety, but that simple modifications to the penalty function enforce meaningful probabilistic safety constraints. Second, we show that the penalty term changes the value function in a way that is easy to model, and we demonstrate the consequences of failing to model this change. We conclude with simulations in a simple constrained multiagent environment that support the theoretical results.
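As a sketch of the primal-dual formulation the abstract refers to, the standard constrained MARL problem and its Lagrangian relaxation can be written as follows (the notation $V_r$, $V_c$, $c_t$, $d$, and $\lambda$ is chosen here for illustration and is not necessarily the paper's):

```latex
% Constrained MARL: maximize expected discounted reward subject to a
% bound d on the expected discounted cost (a generic formulation; the
% paper's exact constraint may differ).
\max_{\pi}\; V_r(\pi) = \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^t r_t\Big]
\quad \text{s.t.} \quad
V_c(\pi) = \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^t c_t\Big] \le d.

% Primal-dual relaxation: the dual variable \lambda \ge 0 turns the
% constraint evaluation into a penalty subtracted from the reward.
\mathcal{L}(\pi, \lambda) = V_r(\pi) - \lambda\,\big(V_c(\pi) - d\big),
\qquad
\max_{\pi}\, \min_{\lambda \ge 0}\, \mathcal{L}(\pi, \lambda).
```

Gradient-based primal-dual methods alternate ascent in $\pi$ and descent in $\lambda$; because the $\lambda$-weighted cost is folded into the reward, the learned value function corresponds to $\mathcal{L}$ rather than $V_r$, which is the structural effect on the value function that the abstract highlights.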
Original language | American English
---|---
Number of pages | 19
State | Published - 2023
Event | 5th Annual Learning for Dynamics & Control Conference, University of Pennsylvania
Duration | 15 Jun 2023 → 16 Jun 2023
Conference
Conference | 5th Annual Learning for Dynamics & Control Conference
---|---
Venue | University of Pennsylvania
Period | 15/06/23 → 16/06/23
Bibliographical note
See NREL/CP-5D00-87858 for the paper as published in the proceedings.

NREL Publication Number
- NREL/CP-5D00-84649
Keywords
- data-driven control
- multi-agent
- reinforcement learning
- safe control