Accelerating Optimization and Reinforcement Learning with Quasi Stochastic Approximation

Shuhang Chen, Adithya Devraj, Andrey Bernstein, Sean Meyn

Research output: Contribution to conference › Paper › peer-review

4 Scopus Citations

Abstract

The paper sets out to obtain precise convergence rates for quasi-stochastic approximation (QSA), with applications to optimization and reinforcement learning. The main contributions are obtained for general nonlinear algorithms, under the assumption that there is a well-defined linearization near the optimal parameter $\theta^*$, with Hurwitz linearization matrix $A$. Subject to stability of the algorithm (general conditions are surveyed in the paper): (i) If the algorithm gain is chosen as $a_t = g/(1+t)^{\rho}$ with $g > 0$ and $\rho \in (0, 1)$, then a 'finite-$t$' approximation is obtained, \begin{equation*} a_t^{-1}(\Theta_t - \theta^*) = \bar{Y} + \Xi_t^{I} + o(1), \end{equation*} where $\Theta_t$ is the parameter estimate, $\bar{Y} \in \mathbb{R}^d$ is a vector identified in the paper, and $\{\Xi_t^{I}\}$ is bounded with zero mean. (ii) The approximation continues to hold with $a_t = g/(1+t)$ under the stronger assumption that $I + gA$ is Hurwitz. (iii) The Ruppert-Polyak averaging technique is extended to this setting, in which the estimates $\{\Theta_t\}$ are obtained using the gain in (i), and $\Theta_t^{\mathrm{RP}}$ is defined to be the running average. The convergence rate is $1/t$ if and only if $\bar{Y} = 0$. (iv) The theory is illustrated with applications to gradient-free optimization, and policy gradient algorithms for reinforcement learning.
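The following is a minimal illustrative sketch, not taken from the paper, of a QSA scheme for gradient-free optimization: it combines the vanishing gain $a_t = g/(1+t)^{\rho}$ from item (i) with the Ruppert-Polyak running average from item (iii). The toy objective, the sinusoidal probing signal, and all constants are assumptions chosen for illustration only.

```python
# Sketch of quasi-stochastic approximation (QSA) for gradient-free optimization.
# "Noise" is a deterministic sinusoidal probing signal rather than random sampling;
# the objective f and all constants are illustrative assumptions, not from the paper.
import numpy as np

def f(theta):
    # Toy smooth objective with minimizer at theta* = [1, -2].
    return 0.5 * (theta[0] - 1.0) ** 2 + 0.5 * (theta[1] + 2.0) ** 2

def qsa_gradient_free(T=200.0, dt=1e-3, g=1.0, rho=0.8, eps=0.1):
    """Euler discretization of a QSA ODE with gain a_t = g/(1+t)**rho."""
    theta = np.zeros(2)                      # parameter estimate Theta_t
    avg = np.zeros(2)                        # running average Theta_t^{RP}
    freqs = np.array([1.0, np.sqrt(2.0)])    # incommensurate probing frequencies
    n_steps = int(T / dt)
    for k in range(1, n_steps + 1):
        t = k * dt
        a_t = g / (1.0 + t) ** rho           # gain from item (i)
        xi = np.sin(2.0 * np.pi * freqs * t) # quasi-stochastic probing signal
        # Two-point gradient estimate along the probing direction.
        grad_est = xi * (f(theta + eps * xi) - f(theta - eps * xi)) / (2.0 * eps)
        theta = theta - a_t * grad_est
        avg += (theta - avg) / k             # Ruppert-Polyak running average, item (iii)
    return theta, avg

if __name__ == "__main__":
    theta_T, theta_RP = qsa_gradient_free()
    print("last iterate:      ", theta_T)
    print("averaged estimate: ", theta_RP)
```

The time average of the outer product of the sinusoidal probe is positive definite, so the mean flow of this recursion follows a scaled negative gradient of f; the averaged estimate illustrates the acceleration discussed in item (iii).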

Original language: American English
Pages: 1965-1972
Number of pages: 8
DOIs
State: Published - 2021
Event: 2021 American Control Conference, ACC 2021 - Virtual, New Orleans, United States
Duration: 25 May 2021 - 28 May 2021

Conference

Conference: 2021 American Control Conference, ACC 2021
Country/Territory: United States
City: Virtual, New Orleans
Period: 25/05/21 - 28/05/21

Bibliographical note

Publisher Copyright:
© 2021 American Automatic Control Council.

NREL Publication Number

  • NREL/CP-5D00-80782

Keywords

  • approximation algorithms
  • convergence
  • optimal control
  • optimization
  • reinforcement learning
