Abstract
The Markov decision process is the mathematical formalization underlying the modern field of reinforcement learning when transition and reward functions are unknown. We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process with infinite horizon. This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using heuristic quantum algorithms such as adiabatic quantum annealing and the quantum approximate optimization algorithm on near-term quantum hardware. In arguing that the variational minimization of our Hamiltonian is approximately equivalent to the Bellman optimality condition for a prevalent class of environments we establish an interesting analogy with classical field theory. Along with proof-of-concept calculations to corroborate our formulation by simulated and quantum annealing against classical Q-Learning, we analyze the scaling of physical resources required to solve our Hamiltonian on quantum hardware.
Original language | American English |
---|---|
Article number | 12 |
Number of pages | 9 |
Journal | Quantum Machine Intelligence |
Volume | 2 |
Issue number | 2 |
DOIs | |
State | Published - 2020 |
Bibliographical note
Publisher Copyright:© 2020, Springer Nature Switzerland AG.
NREL Publication Number
- NREL/JA-2C00-83527
Keywords
- K-spin hamiltonian
- Markov decision process
- Quantum annealing
- Quantum approximate optimization algorithm
- Quantum optimization
- Simulated annealing