COOKIES: By using this website you agree that we can place Google Analytics Cookies on your device for performance monitoring. |

University of Cambridge > Talks.cam > Signal Processing and Communications Lab Seminars > Simple Reinforcement Learning Algorithms for Continuous State and Action Space Systems

## Simple Reinforcement Learning Algorithms for Continuous State and Action Space SystemsAdd to your list(s) Download to your calendar using vCal - Prof. Rahul Jain, University of Southern California
- Monday 17 June 2019, 12:00-13:00
- LR3A, Inglis Building, CUED.
If you have a question about this talk, please contact Prof. Ramji Venkataramanan. Reinforcement Learning (RL) problems for continuous state and action space systems are quite challenging. Recently, deep reinforcement learning methods have been shown to be quite effective for certain RL problems in settings of very large/continuous state and action spaces. But such methods require extensive hyper-parameter tuning, huge amount of data, and come with no performance guarantees. We note that such methods are mostly trained `offline’ on experience replay buffers. In this talk, I will describe a series of simple reinforcement learning schemes for various settings. Our premise is that we have access to a generative model that can give us simulated samples of the next state. We will start with finite state and action space MDPs. An `empirical value learning’ (EVL) algorithm can be derived quite simply by replacing the expectation in the Bellman operator with an empirical estimate. We note that the EVL algorithm has remarkably good numerical performance for practical purposes. We next extend this to continuous state spaces by considering randomized function approximation on a reproducible kernel Hilbert space (RKHS). This allows for arbitrarily good approximation with high probability for any problem due to its universal function approximation property. Next, we consider continuous action spaces. In each iteration of EVL , we sample actions from the continuous action space, and take a supremum over the sampled actions. Under mild assumptions on the MDP , we show that this performs quite well numerically, with provable performance guarantees. Finally, we consider the `Online-EVL’ algorithm that learns from a trajectory of state-action-reward sequence. Under mild mixing conditions on the trajectory, we can provide performance bounds and also show that it has competitive (and in fact marginally better) performance as compared to the Deep Q-Network algorithm on a benchmark RL problem. I will conclude by a brief overview of the framework of probabilistic contraction analysis of iterated random operators that underpins the theoretical analysis. This talk is based on work with a number of people including Vivek Borkar (IIT Bombay), Peter Glynn (Stanford), Abhishek Gupta (Ohio State), William Haskell (Purdue), Dileep Kalathil (Texas A&M), and Hiteshi Sharma (USC). This talk is part of the Signal Processing and Communications Lab Seminars series. ## This talk is included in these lists:- All Talks (aka the CURE list)
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge University Engineering Department Talks
- Cambridge talks
- Centre for Smart Infrastructure & Construction
- Chris Davis' list
- Computational Continuum Mechanics Group Seminars
- Featured lists
- Information Engineering Division seminar list
- Interested Talks
- LR3A, Inglis Building, CUED
- School of Technology
- Signal Processing and Communications Lab Seminars
- Trust & Technology Initiative - interesting events
- bld31
- ndk22's list
- ob366-ai4er
- rp587
Note that ex-directory lists are not shown. |
## Other listsStatistics of Prof Philip Dawid Place of an Intellectual## Other talksHeavens and Earth: An Empirical Approach to Knowledge Across Cultures– gloknos Annual Lecture Series Caring for ancient Egyptian coffins in Cairo: A new collaboration for the future Addressing health: sickness and retirement in the Victorian Post Office Insect conservation: conserving the little things that run the world. |