Policy Evaluation with Temporal Differences

Christoph Dann (Technische Universität Darmstadt)
Friday 28 March 2014, 11:00-12:00
Engineering Department, CBL Room BE-438.

If you have a question about this talk, please contact Zoubin Ghahramani.

Value functions play an essential role in many reinforcement learning approaches. Research on policy evaluation, the problem of estimating the value function from samples, has been dominated since the late 1980s by temporal-difference (TD) methods due to their data-efficiency. However, core issues such as stability in off-policy estimation have only been tackled recently, which has led to a large number of new approaches.
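
As a concrete illustration of estimating a value function from samples, here is a minimal sketch of TD(0) with a linear value function V(s) ≈ w · φ(s). It is not taken from the talk. The transition format (state, reward, next state, with None marking the end of an episode), the helper names `td0_linear` and `one_hot`, and the step size are assumptions for illustration; one-hot features are the tabular special case of a linear parametrization.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed sample format: (state, reward, next_state); next_state is None at episode end.
Transition = Tuple[int, float, Optional[int]]
Features = Callable[[int], List[float]]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def td0_linear(episodes: List[List[Transition]], phi: Features, n_features: int,
               alpha: float = 0.1, gamma: float = 0.9) -> List[float]:
    """Estimate V(s) ~= w . phi(s) with the semi-gradient TD(0) update."""
    w = [0.0] * n_features
    for episode in episodes:
        for s, r, s_next in episode:
            x = phi(s)
            v_next = 0.0 if s_next is None else dot(w, phi(s_next))
            # TD error: bootstrapped target minus the current estimate
            delta = r + gamma * v_next - dot(w, x)
            # Move the weights along the feature vector of the visited state
            w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
    return w


def one_hot(n: int) -> Features:
    """Hypothetical tabular feature map: state s -> unit vector e_s."""
    return lambda s: [1.0 if i == s else 0.0 for i in range(n)]


if __name__ == "__main__":
    # Deterministic chain 0 -> 1 -> 2 -> end with reward 1 on the last step;
    # with gamma = 0.9 the true values are 0.81, 0.9 and 1.0.
    episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
    w = td0_linear([episode] * 500, one_hot(3), 3, alpha=0.1, gamma=0.9)
    print([round(v, 3) for v in w])
```

Each update bootstraps from the current estimate of the next state rather than waiting for the full return, which is the source of the data-efficiency mentioned above.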

I first give a short overview of TD methods from a unifying optimization perspective and present the results of my experimental comparison, highlighting the strengths and weaknesses of each approach. I then present a novel variant of the least-squares TD (LSTD) algorithm for off-policy estimation that outperforms all previous approaches.
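
For reference, the least-squares idea behind LSTD can be sketched as follows. This is the standard on-policy LSTD(0) batch estimator, not the off-policy variant presented in the talk. The helper names (`lstd0`, `solve`, `one_hot`), the sample format and the optional `ridge` term are assumptions for illustration. Instead of taking stochastic gradient steps, it accumulates A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s) r over all samples and solves A w = b once.

```python
from typing import Callable, List, Optional, Tuple

# Assumed sample format: (state, reward, next_state); next_state is None at episode end.
Transition = Tuple[int, float, Optional[int]]
Features = Callable[[int], List[float]]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular; try a positive ridge term")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lstd0(transitions: List[Transition], phi: Features, n_features: int,
          gamma: float = 0.9, ridge: float = 0.0) -> List[float]:
    """On-policy LSTD(0): w solves (ridge*I + sum phi (phi - gamma phi')^T) w = sum phi r."""
    A = [[ridge if i == j else 0.0 for j in range(n_features)] for i in range(n_features)]
    b = [0.0] * n_features
    for s, r, s_next in transitions:
        x = phi(s)
        x_next = [0.0] * n_features if s_next is None else phi(s_next)
        for i in range(n_features):
            b[i] += x[i] * r
            for j in range(n_features):
                A[i][j] += x[i] * (x[j] - gamma * x_next[j])
    return solve(A, b)


def one_hot(n: int) -> Features:
    """Hypothetical tabular feature map: state s -> unit vector e_s."""
    return lambda s: [1.0 if i == s else 0.0 for i in range(n)]


if __name__ == "__main__":
    # Same deterministic chain 0 -> 1 -> 2 -> end, reward 1 on the last step.
    samples = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
    print(lstd0(samples, one_hot(3), 3, gamma=0.9))
```

Because it uses every sample in a single linear solve and has no step size to tune, LSTD is typically more sample-efficient than incremental TD(0), at the cost of O(n²) memory in the number of features.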

Most TD methods rely on a linear parametrization of the value function with a concise set of features, which limits their use on large-scale problems. In the final part of the talk, I introduce my recent work on the incremental feature dependency discovery (iFDD) algorithm, which efficiently handles large-scale problems with discrete state spaces by automatically constructing features during estimation.
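
The feature-construction idea can be illustrated with a heavily simplified iFDD-style sketch on top of TD(0), assuming binary base features. Pairs of co-active features accumulate the absolute TD error as "relevance", and once a pair's relevance exceeds a threshold, their conjunction is added as a new feature. The class name `IFDDSketch`, the threshold rule and the weight initialization are assumptions for illustration; this is not the implementation discussed in the talk.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set

Feature = FrozenSet[int]  # a conjunction of binary base features


class IFDDSketch:
    """Simplified iFDD-style feature discovery on top of TD(0) (illustrative only)."""

    def __init__(self, n_base: int, threshold: float,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
        self.features: List[Feature] = [frozenset([i]) for i in range(n_base)]
        self.known: Set[Feature] = set(self.features)
        self.weights: List[float] = [0.0] * n_base
        self.relevance: Dict[Feature, float] = {}  # accumulated |TD error| per candidate
        self.threshold = threshold
        self.alpha = alpha
        self.gamma = gamma

    def active(self, base: Set[int]) -> List[int]:
        """Active feature indices: largest conjunctions first, and each base
        feature is covered by at most one active feature (sparsification)."""
        order = sorted(range(len(self.features)), key=lambda k: -len(self.features[k]))
        remaining = set(base)
        result = []
        for k in order:
            if self.features[k] <= remaining:
                result.append(k)
                remaining -= self.features[k]
        return result

    def value(self, base: Optional[Set[int]]) -> float:
        if base is None:  # terminal state
            return 0.0
        return sum(self.weights[k] for k in self.active(base))

    def update(self, base: Set[int], reward: float,
               base_next: Optional[Set[int]]) -> float:
        act = self.active(base)
        delta = reward + self.gamma * self.value(base_next) - self.value(base)
        for k in act:  # binary features: the gradient is 1 for each active feature
            self.weights[k] += self.alpha * delta
        # Discovery: pairs of co-active features collect |delta| as relevance.
        for a, b in combinations(act, 2):
            cand = self.features[a] | self.features[b]
            if cand in self.known:
                continue
            self.relevance[cand] = self.relevance.get(cand, 0.0) + abs(delta)
            if self.relevance[cand] > self.threshold:
                self.features.append(cand)
                self.known.add(cand)
                # Start from the parents' summed weight so the prediction does not
                # jump where the new conjunction replaces its two parents.
                self.weights.append(self.weights[a] + self.weights[b])
        return delta


if __name__ == "__main__":
    # XOR-like target: only the joint activation {0, 1} is rewarded, which
    # the two base features alone cannot represent exactly.
    agent = IFDDSketch(n_base=2, threshold=1.0, alpha=0.1)
    samples = [({0}, 0.0, None), ({1}, 0.0, None), ({0, 1}, 1.0, None)]
    for _ in range(300):
        for s, r, s_next in samples:
            agent.update(s, r, s_next)
    print(agent.features, round(agent.value({0, 1}), 3))
```

In this toy run the persistent TD error on the jointly active state drives the relevance of {0, 1} past the threshold, after which the new conjunction absorbs that state's value while the base weights decay towards zero.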

This talk is part of the Machine Learning @ CUED series.