Abstract
The paper sets out to obtain precise convergence rates for quasi-stochastic approximation (QSA), with applications to optimization and reinforcement learning. The main contributions are obtained for general nonlinear algorithms, under the assumption that there is a well-defined linearization near the optimal parameter \theta^*, with Hurwitz linearization matrix A. Subject to stability of the algorithm (general conditions are surveyed in the paper):

(i) If the algorithm gain is chosen as a_t = g/(1+t)^{\rho} with g > 0 and \rho \in (0, 1), then a 'finite-t' approximation is obtained:
\begin{equation*} a_t^{-1}(\Theta_t - \theta^*) = \bar{Y} + \Xi_t^{\mathrm{I}} + o(1) \end{equation*}
where \Theta_t is the parameter estimate, \bar{Y} \in \mathbb{R}^d is a vector identified in the paper, and \{\Xi_t^{\mathrm{I}}\} is bounded with zero mean.

(ii) The approximation continues to hold with a_t = g/(1+t) under the stronger assumption that I + gA is Hurwitz.

(iii) The Ruppert-Polyak averaging technique is extended to this setting, in which the estimates \{\Theta_t\} are obtained using the gain in (i), and \Theta_t^{\mathrm{RP}} is defined to be their running average. The convergence rate is 1/t if and only if \bar{Y} = 0.

(iv) The theory is illustrated with applications to gradient-free optimization, and policy gradient algorithms for reinforcement learning.
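The gain schedule in (i) and the Ruppert-Polyak averaging in (iii) can be sketched numerically. The following is a minimal, illustrative scalar example, not the paper's algorithm: the target \theta^* = 2, the probing frequency 0.9, the gain parameters g = 1 and \rho = 0.8, and the burn-in length are all arbitrary choices. A deterministic sinusoid plays the role of the quasi-random probing signal that distinguishes QSA from classical stochastic approximation.

```python
import math

# Hypothetical scalar root-finding problem: mean field f_bar(theta) = -(theta - theta_star),
# so the linearization matrix A = -1 is Hurwitz. All constants below are illustrative.
theta_star = 2.0
g, rho = 1.0, 0.8            # gain a_n = g / (1 + n)^rho with rho in (0, 1), as in (i)
N = 20_000

theta = 0.0                  # parameter estimate Theta_n
burn_in = N // 2             # start averaging after transients die out
rp_sum, rp_count = 0.0, 0

for n in range(N):
    a_n = g / (1.0 + n) ** rho
    xi = math.sin(0.9 * n)   # deterministic "quasi-stochastic" probing signal
    theta += a_n * (-(theta - theta_star) + xi)
    if n >= burn_in:         # Ruppert-Polyak running average, as in (iii)
        rp_sum += theta
        rp_count += 1

theta_rp = rp_sum / rp_count
print(theta, theta_rp)       # both approach theta_star = 2.0
```

Because the probing signal is deterministic with zero time average, both the raw estimate and its running average settle near \theta^*; the averaged estimate smooths out the residual oscillation induced by the sinusoid.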
Original language | American English
---|---
Pages | 1965-1972
Number of pages | 8
DOIs |
State | Published - 2021
Event | 2021 American Control Conference, ACC 2021 - Virtual, New Orleans, United States. Duration: 25 May 2021 → 28 May 2021
Conference
Conference | 2021 American Control Conference, ACC 2021
---|---
Country/Territory | United States
City | Virtual, New Orleans
Period | 25/05/21 → 28/05/21
Bibliographical note
Publisher Copyright: © 2021 American Automatic Control Council.
NREL Publication Number
- NREL/CP-5D00-80782
Keywords
- approximation algorithms
- convergence
- optimal control
- optimization
- reinforcement learning