
Offline Reinforcement Learning


If you have a question about this talk, please contact James Allingham.

Zoom link available upon request (it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk). Sign up to our mailing list for easier reminders.

In the first part of the talk, we will introduce the common terms used in standard online RL. We will then define the offline RL setting and describe its applications and benchmarks. Next, we will focus on behavioural cloning (BC) as a simple and stable baseline for learning a policy from offline interaction data. As a particular instance of BC, we will describe the decision transformer, a recently proposed method that leverages the transformer architecture to tackle the offline RL setting.

In the second part of the talk, we will explore how off-policy RL algorithms originally designed for the online setting (such as SAC) can be adapted to better handle the distribution shift that is necessary for improving on the policy in the offline data without online feedback. We will find that this reduces to a problem of quantifying and managing uncertainty.

In the third and final part of the talk, we will first review classical offline reinforcement learning methods, including ways to evaluate and improve policies using offline data via importance sampling, and discuss the challenges and applicability of these methods. We will then review modern offline RL methods, including policy constraint methods and model-based offline RL methods. Policy constraint methods encourage the new policy to stay close to the policy observed in the offline dataset, while model-based offline RL methods quantify the model's uncertainty and use it to discourage the new policy from visiting uncertain regions.
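As a concrete illustration of the BC baseline and the policy-constraint idea mentioned above, here is a minimal sketch in PyTorch. The dataset, network sizes, and hyperparameters are illustrative assumptions and are not taken from the talk or the referenced papers.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical offline dataset of (state, action) pairs collected by some
# behaviour policy; the dimensions below are purely illustrative.
states = torch.randn(10_000, 17)
actions = torch.randn(10_000, 6)
loader = DataLoader(TensorDataset(states, actions), batch_size=256, shuffle=True)

# Deterministic policy network pi(s) -> a with actions squashed into [-1, 1].
policy = nn.Sequential(
    nn.Linear(17, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 6), nn.Tanh(),
)
optimiser = torch.optim.Adam(policy.parameters(), lr=3e-4)

for epoch in range(10):
    for s, a in loader:
        # Behavioural cloning: regress the policy onto the dataset actions.
        bc_loss = ((policy(s) - a) ** 2).mean()
        optimiser.zero_grad()
        bc_loss.backward()
        optimiser.step()

The same squared-error term is what the minimalist approach of Fujimoto & Gu (2021) adds to the TD3 actor objective as a policy constraint, so that the learned policy can improve on the data while staying close to the actions observed in the dataset.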

References:

Levine, S., Kumar, A., Tucker, G., & Fu, J. (2020). Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643.

Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., ... & Mordatch, I. (2021). Decision transformer: Reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34, 15084-15097.

Fujimoto, S., & Gu, S. S. (2021). A minimalist approach to offline reinforcement learning. Advances in Neural Information Processing Systems, 34, 20132-20145.

An, G., Moon, S., Kim, J. H., & Song, H. O. (2021). Uncertainty-based offline reinforcement learning with diversified Q-ensemble. Advances in Neural Information Processing Systems, 34, 7436-7447.

This talk is part of the Machine Learning Reading Group @ CUED series.
