Quantized Q-Learning for Stochastic Control with Borel Spaces and General Information Structures
SCLW01 - Bridging Stochastic Control and Reinforcement Learning: Theories and Applications

Reinforcement learning algorithms often require finiteness of the state and action spaces of a Markov decision process (MDP). In this presentation, we show that under mild regularity conditions (in particular, involving only weak continuity or Wasserstein continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions (called quantized Q-learning) converges to a limit under mild ergodicity conditions. Furthermore, this limit satisfies an optimality equation, which leads to policies that either come with explicit performance bounds or are guaranteed to be asymptotically optimal.

Our approach builds on (i) near-optimality of finite-state model approximations for MDPs with weakly continuous kernels, and (ii) convergence of quantized Q-learning to a limit which corresponds to the fixed point of a constructed approximate finite MDP that depends on the exploration policy used during learning. This result also implies near-optimality of empirical model learning, where one fits a finite MDP model to data as an alternative to quantized Q-learning; for this approach we also obtain sample complexity bounds. Thus, we present a general, rigorous convergence and near-optimality result for the applicability of Q-learning and model learning to continuous MDPs.

Our analysis also applies to problems with non-compact state spaces via non-uniform quantization with convergence bounds, to non-Markovian stochastic control problems which can be lifted to measure-valued MDPs under appropriate topologies (as in POMDPs and decentralized stochastic control), and to controlled diffusions via time-discretization.

[Joint work with Ali Kara, Emre Demirci, Omar Mrani-Zentar, Naci Saldi, and Somnath Pradhan]

This talk is part of the Isaac Newton Institute Seminar Series.
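To make the scheme concrete, here is a minimal sketch of quantized Q-learning on a hypothetical one-dimensional toy MDP. The grid sizes, dynamics, cost function, and all names below are illustrative assumptions for demonstration, not taken from the talk; the talk's results concern general Borel MDPs under the regularity and ergodicity conditions stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: state space [0, 1], action space [-1, 1].
# Quantize both to finite grids, then run ordinary tabular Q-learning.
n_state_bins, n_action_bins = 20, 5
state_grid = np.linspace(0.0, 1.0, n_state_bins)    # state quantizer
action_grid = np.linspace(-1.0, 1.0, n_action_bins)  # action quantizer

def quantize(x, grid):
    """Map a continuous value to the index of the nearest grid point."""
    return int(np.argmin(np.abs(grid - x)))

def step(x, u):
    """Toy dynamics (assumed): drift toward the boundary, control u,
    additive Gaussian noise; stage cost is squared distance from 0.5."""
    x_next = np.clip(0.9 * x + 0.1 * u + 0.05 * rng.normal(), 0.0, 1.0)
    return x_next, (x - 0.5) ** 2

# Tabular Q-learning on the induced finite model (cost minimization).
gamma = 0.9
Q = np.zeros((n_state_bins, n_action_bins))
counts = np.zeros_like(Q)

x = rng.uniform()
for _ in range(100_000):
    s = quantize(x, state_grid)
    a = int(rng.integers(n_action_bins))    # uniform exploration policy
    x_next, cost = step(x, action_grid[a])
    s_next = quantize(x_next, state_grid)
    counts[s, a] += 1.0
    alpha = 1.0 / counts[s, a]              # diminishing step sizes
    Q[s, a] += alpha * (cost + gamma * Q[s_next].min() - Q[s, a])
    x = x_next

# Greedy (cost-minimizing) quantized policy from the learned Q-table.
policy = action_grid[Q.argmin(axis=1)]
```

The iterate converges (under the talk's conditions) to the fixed point of an approximate finite MDP induced by the quantizers and the exploration policy; the greedy policy extracted from that fixed point is the near-optimal policy to which the performance bounds apply.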