New Relative Value Iteration and Q-Learning Algorithms for Ergodic Risk Sensitive Control of Markov Chains
This talk is part of the Isaac Newton Institute Seminar Series (SCLW01 - Bridging Stochastic Control And Reinforcement Learning: Theories and Applications).

In this talk, we will present new Jacobi-like relative value iteration (RVI) algorithms for the ergodic risk-sensitive control problem of discrete-time Markov chains, together with the associated Q-learning algorithms. For a finite state space, we prove that the iterates of the new RVI algorithms converge geometrically; for a countable state space, we prove convergence for the appropriately truncated problem. We employ the entropy variational formula to handle the multiplicative nature of the risk-sensitive Bellman operator, at the cost of an additional optimization problem over a corresponding set of probability vectors.

We then discuss the entropy-based risk-sensitive Q-learning algorithms corresponding to the existing and new Jacobi-like RVI algorithms. These Q-learning algorithms have two coupled components: the usual Q-function iterates and new probability iterates arising from the entropy variational formula. We prove convergence of the coupled iterates by analyzing them as multi-scale stochastic approximations.
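As a point of reference, here is a minimal sketch of the classical synchronous multiplicative relative value iteration for a finite-state, finite-action problem. It is not the new Jacobi-like algorithm presented in the talk. It assumes a risk parameter of 1, one-step costs c(i,u), and strictly positive transition probabilities p(j | i, u), so that every stationary policy induces an irreducible, aperiodic chain. Writing V = log φ, the multiplicative Bellman equation e^λ φ(i) = min_u e^{c(i,u)} Σ_j p(j | i, u) φ(j) becomes λ + V(i) = min_u [c(i,u) + log Σ_j p(j | i, u) e^{V(j)}]. All function names are hypothetical. As a sanity check, λ is compared with the minimum, over deterministic stationary policies, of the log Perron root of the twisted kernel e^{c(i,π(i))} p(j | i, π(i)).

```python
import itertools

import numpy as np


def log_sum_exp(x, axis=-1):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def bellman(V, cost, P):
    """Risk-sensitive Bellman operator in log form (risk parameter assumed = 1).

    (T V)(i) = min_u [ c(i, u) + log sum_j p(j | i, u) exp(V(j)) ]
    cost: shape (S, A); P: shape (A, S, S) with P[u, i, j] = p(j | i, u).
    """
    with np.errstate(divide="ignore"):
        logP = np.log(P)
    # Broadcast V over the next-state axis j, reduce over j -> shape (A, S).
    q = cost.T + log_sum_exp(logP + V[None, None, :], axis=2)
    return q.min(axis=0)


def relative_value_iteration(cost, P, ref_state=0, tol=1e-12, max_iter=10_000):
    """Classical RVI: V_{n+1} = T V_n - (T V_n)(ref_state).

    Returns (lam, V): lam estimates the optimal risk-sensitive ergodic cost,
    V is the relative value function with V[ref_state] == 0.
    """
    V = np.zeros(cost.shape[0])
    lam = 0.0
    for _ in range(max_iter):
        TV = bellman(V, cost, P)
        lam = TV[ref_state]
        V_new = TV - lam
        if np.max(np.abs(V_new - V)) < tol:
            return lam, V_new
        V = V_new
    return lam, V


def brute_force_lambda(cost, P):
    """Min over deterministic stationary policies of log Perron root of
    the twisted kernel M(i, j) = exp(c(i, pi(i))) * p(j | i, pi(i))."""
    S, A = cost.shape
    best = np.inf
    for pi in itertools.product(range(A), repeat=S):
        M = np.array([np.exp(cost[i, pi[i]]) * P[pi[i], i] for i in range(S)])
        rho = np.max(np.abs(np.linalg.eigvals(M)))
        best = min(best, np.log(rho))
    return best


# Small random example with strictly positive transition probabilities.
rng = np.random.default_rng(0)
S, A = 3, 2
P = rng.random((A, S, S)) + 0.1
P /= P.sum(axis=2, keepdims=True)
cost = rng.random((S, A))

lam, V = relative_value_iteration(cost, P)
print(lam, brute_force_lambda(cost, P))  # the two values should agree
```

The comparison works because, at the fixed point, the minimizing policy has the positive vector e^V as its Perron eigenvector with eigenvalue e^λ, while for any other policy M_π e^V ≥ e^λ e^V componentwise, so its Perron root is at least e^λ.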
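The entropy variational formula (also called the Gibbs or Donsker-Varadhan variational principle) states that, for a probability vector p with positive entries and any real vector f, log Σ_j p_j e^{f_j} = max over probability vectors q of [Σ_j q_j f_j - KL(q‖p)], and the maximizer is the tilted distribution q*_j ∝ p_j e^{f_j}. With f = V and p = p(· | i, u), this replaces the log-sum-exp term of the risk-sensitive Bellman operator by a maximization whose objective is linear in V for each fixed q, which is the additional optimization over probability vectors mentioned in the abstract. The minimal sketch below only checks the identity numerically; the helper names are hypothetical and this is not the talk's algorithm.

```python
import numpy as np


def log_mgf(p, f):
    """Left-hand side: log sum_j p_j exp(f_j), computed stably."""
    m = f.max()
    return m + np.log(np.dot(p, np.exp(f - m)))


def kl(q, p):
    """Relative entropy KL(q || p) for strictly positive p, with 0 log 0 := 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))


def variational_objective(q, p, f):
    """Right-hand side objective <q, f> - KL(q || p); linear in f for fixed q."""
    return float(np.dot(q, f)) - kl(q, p)


def gibbs_maximizer(p, f):
    """Maximizer q*_j proportional to p_j * exp(f_j) (exponential tilting of p)."""
    w = p * np.exp(f - f.max())
    return w / w.sum()


rng = np.random.default_rng(1)
p = rng.random(5) + 0.05
p /= p.sum()
f = rng.normal(size=5)

q_star = gibbs_maximizer(p, f)
print(log_mgf(p, f), variational_objective(q_star, p, f))  # equal up to rounding

q = rng.random(5)
q /= q.sum()
print(variational_objective(q, p, f) <= log_mgf(p, f))  # any other q does no better
```

The gap between the two sides for an arbitrary q is exactly KL(q‖q*), which is why the tilted distribution is the unique maximizer.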