
New Relative Value Iteration and Q-Learning Algorithms for Ergodic Risk Sensitive Control of Markov Chains



SCLW01 - Bridging Stochastic Control And Reinforcement Learning: Theories and Applications

In this talk, we will present new Jacobi-like relative value iteration (RVI) algorithms for the ergodic risk-sensitive control problem of discrete-time Markov chains, together with the associated Q-learning algorithms. In the case of a finite state space, we prove that the iterates of the new RVI algorithms converge geometrically, and in the case of a countable state space, we prove convergence for an appropriately truncated problem. We employ the entropy variational formula to tackle the multiplicative nature of the risk-sensitive Bellman operator, at the cost of an additional optimization problem over a corresponding set of probability vectors. We then discuss the entropy-based risk-sensitive Q-learning algorithms corresponding to the existing and new Jacobi-like RVI algorithms. These Q-learning algorithms have two coupled components: the usual Q-function iterates and the new probability iterates arising from the entropy variational formula. We prove the convergence of the coupled iterates by analysing the associated multi-scale stochastic approximations.
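As background for the abstract, the following is a minimal numerical sketch in Python. It is not the speaker's algorithm and uses made-up toy model data; it only illustrates the standard setting: a relative value iteration for a small finite risk-sensitive MDP, normalised at a reference state, together with an evaluation of the multiplicative Bellman operator through the entropy (Donsker-Varadhan) variational formula, whose maximising probability vectors are the analogue of the probability iterates mentioned above.

import numpy as np

# Toy model data (made up for illustration): a finite MDP with
# transition kernels P[a, x, y] = p(y | x, a) and one-step costs C[x, a].
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
C = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def bellman(V):
    # Multiplicative risk-sensitive Bellman operator:
    # (T V)(x) = min_a exp(c(x, a)) * sum_y p(y | x, a) V(y).
    Q = np.exp(C) * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
    return Q.min(axis=1)

def bellman_entropy(V):
    # The same operator evaluated through the entropy variational formula:
    # log sum_y p(y) e^{h(y)} = max_nu { <nu, h> - KL(nu || p) },
    # whose maximiser is nu*(y) proportional to p(y) e^{h(y)}.  These
    # maximising probability vectors are the extra optimisation variable.
    h = np.log(V)
    logQ = np.empty((n_states, n_actions))
    for a in range(n_actions):
        w = P[a] * np.exp(h)[None, :]            # unnormalised nu*
        nu = w / w.sum(axis=1, keepdims=True)    # maximising probability vectors
        kl = np.sum(nu * (np.log(nu) - np.log(P[a])), axis=1)
        logQ[:, a] = C[:, a] + np.sum(nu * h[None, :], axis=1) - kl
    return np.exp(logQ.min(axis=1))

# Relative value iteration: apply the operator and renormalise at a
# reference state x0; the normalising constant tends to exp(lambda*),
# where lambda* is the optimal risk-sensitive average cost.
x0 = 0
V = np.ones(n_states)
for _ in range(500):
    TV = bellman(V)
    rho = TV[x0]
    V = TV / rho

print("estimated optimal risk-sensitive average cost lambda*:", np.log(rho))
print("direct and entropy-variational evaluations agree:",
      np.allclose(bellman(V), bellman_entropy(V)))

In this sketch the two evaluations of the operator agree numerically; the Jacobi-like update structure and the coupled Q-learning iterates discussed in the talk are not reproduced here.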

This talk is part of the Isaac Newton Institute Seminar Series.


 
