## Abstract

The paper sets out to obtain precise convergence rates for quasi-stochastic approximation (QSA), with applications to optimization and reinforcement learning. The main contributions are obtained for general nonlinear algorithms, under the assumption that there is a well-defined linearization near the optimal parameter $\theta^*$, with Hurwitz linearization matrix $A$. Subject to stability of the algorithm (general conditions are surveyed in the paper):

(i) If the algorithm gain is chosen as $a_t = g/(1+t)^\rho$ with $g > 0$ and $\rho \in (0, 1)$, then a "finite-$t$" approximation is obtained: \begin{equation*} a_t^{-1}\{\Theta_t - \theta^*\} = \bar{Y} + \Xi_t^{\mathrm{I}} + o(1) \end{equation*} where $\Theta_t$ is the parameter estimate, $\bar{Y} \in \mathbb{R}^d$ is a vector identified in the paper, and $\{\Xi_t^{\mathrm{I}}\}$ is bounded with zero mean.

(ii) The approximation continues to hold with $a_t = g/(1+t)$ under the stronger assumption that $I + gA$ is Hurwitz.

(iii) The Ruppert-Polyak averaging technique is extended to this setting, in which the estimates $\{\Theta_t\}$ are obtained using the gain in (i), and $\Theta_t^{\mathrm{RP}}$ is defined to be the running average. The convergence rate is $1/t$ if and only if $\bar{Y} = 0$.

(iv) The theory is illustrated with applications to gradient-free optimization, and policy gradient algorithms for reinforcement learning.
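As an illustrative sketch only (not code from the paper), the vanishing-gain recursion in (i) and the running average in (iii) can be mimicked in discrete time. Here the 1-D objective, the sinusoidal probing signal standing in for quasi-stochastic "noise", and all constants are hypothetical choices for demonstration:

```python
import math

# Hypothetical mean vector field f_bar(theta) = -(theta - 1), so the
# root theta* = 1 plays the role of the optimal parameter; its
# linearization matrix A = -1 is Hurwitz, as the paper assumes.
def f_bar(theta):
    return -(theta - 1.0)

g, rho = 1.0, 0.7        # gain a_t = g / (1 + t)**rho with rho in (0, 1), as in (i)
T = 200_000
theta = 0.0
running_sum = 0.0

for t in range(T):
    a_t = g / (1.0 + t) ** rho
    # QSA replaces random noise with a deterministic probing signal;
    # a sinusoid is a common choice (frequency here is arbitrary).
    xi_t = math.sin(2.0 * math.pi * 0.01 * t)
    theta += a_t * (f_bar(theta) + xi_t)
    running_sum += theta

# Ruppert-Polyak-style running average of the estimates, as in (iii).
theta_rp = running_sum / T
```

With these (made-up) parameters both the raw estimate `theta` and the averaged estimate `theta_rp` settle near the root $\theta^* = 1$; the paper's results quantify how fast, and show when averaging attains the optimal $1/t$ rate.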

| Original language | American English |
|---|---|
| Pages | 1965-1972 |
| Number of pages | 8 |
| DOIs | |
| State | Published - 2021 |
| Event | 2021 American Control Conference, ACC 2021 - Virtual, New Orleans, United States. Duration: 25 May 2021 → 28 May 2021 |

### Conference

| Conference | 2021 American Control Conference, ACC 2021 |
|---|---|
| Country/Territory | United States |
| City | Virtual, New Orleans |
| Period | 25/05/21 → 28/05/21 |

### Bibliographical note

Publisher Copyright: © 2021 American Automatic Control Council.

## NREL Publication Number

- NREL/CP-5D00-80782

## Keywords

- approximation algorithms
- convergence
- optimal control
- optimization
- reinforcement learning