Policy Evaluation with Temporal Differences
If you have a question about this talk, please contact Zoubin Ghahramani.
Value functions play an essential role in many reinforcement learning approaches. Research on policy evaluation, the problem of estimating the value function from samples, has been dominated since the late 1980s by temporal-difference (TD) methods due to their data efficiency. However, core issues such as stability in off-policy estimation have only been tackled recently, which has led to a large number of new approaches.
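As background, the sketch below shows one-step TD policy evaluation with a linear value function, the basic setting the talk builds on. The feature map `phi`, the transition format, and the step size are illustrative assumptions rather than details from the talk.

```python
import numpy as np

def td0_policy_evaluation(transitions, phi, n_features, alpha=0.01, gamma=0.99):
    """Linear TD(0): estimate V(s) ~= theta . phi(s) from sampled transitions."""
    theta = np.zeros(n_features)
    for s, r, s_next, done in transitions:
        v = theta @ phi(s)
        v_next = 0.0 if done else theta @ phi(s_next)
        td_error = r + gamma * v_next - v      # one-step TD error
        theta += alpha * td_error * phi(s)     # semi-gradient update
    return theta
```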
I first present a short overview of TD methods from a unifying optimization perspective and the results of my experimental comparison highlighting the strengths and weaknesses of each approach. I then present a novel variant of the least-squares TD (LSTD) algorithm for off-policy estimation that outperforms all previous approaches.
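For orientation, here is a minimal batch on-policy LSTD sketch; the speaker's off-policy variant is not reproduced here. The ridge term and the transition format are assumptions for illustration.

```python
import numpy as np

def lstd(transitions, phi, n_features, gamma=0.99, reg=1e-3):
    """Batch on-policy LSTD: accumulate A and b, then solve A theta = b."""
    A = reg * np.eye(n_features)               # small ridge term keeps A invertible
    b = np.zeros(n_features)
    for s, r, s_next, done in transitions:
        f = phi(s)
        f_next = np.zeros(n_features) if done else phi(s_next)
        A += np.outer(f, f - gamma * f_next)   # accumulate the LSTD system
        b += r * f
    return np.linalg.solve(A, b)
```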
Most TD methods rely on a linear parametrization of the value function with a concise set of features, which limits their use on large-scale problems. In the final part of the presentation, I introduce my recent work on the incremental feature dependency discovery (iFDD) algorithm. This approach efficiently handles large-scale problems with discrete state spaces by automatically constructing features during estimation.
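The following is only an illustrative sketch of the general idea of growing a binary feature set from accumulated TD error, not the published iFDD algorithm. The helper `active_base(s)`, the promotion threshold, and the restriction to pairwise conjunctions are all assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def feature_expanding_td(transitions, active_base, n_base,
                         gamma=0.99, alpha=0.05, threshold=1.0):
    """Linear TD(0) over binary base features; a conjunction of two co-active
    base features is promoted to a new feature once the |TD error| observed
    while it was active exceeds a threshold."""
    features = {(i,): i for i in range(n_base)}   # feature key -> weight index
    theta = np.zeros(n_base)
    relevance = {}                                # candidate conjunction -> accumulated |TD error|

    def phi(s):
        active = set(active_base(s))              # indices of active binary base features
        vec = np.zeros(len(theta))
        for key, idx in features.items():
            if set(key) <= active:                # a feature fires if all its parts are active
                vec[idx] = 1.0
        return vec, active

    for s, r, s_next, done in transitions:
        f, active = phi(s)
        f_next = np.zeros(len(theta)) if done else phi(s_next)[0]
        delta = r + gamma * theta @ f_next - theta @ f
        theta += alpha * delta * f                # semi-gradient TD(0) update
        for pair in combinations(sorted(active), 2):
            if pair in features:
                continue
            relevance[pair] = relevance.get(pair, 0.0) + abs(delta)
            if relevance[pair] > threshold:       # enough evidence: add the conjunction
                features[pair] = len(theta)
                theta = np.append(theta, 0.0)
    return features, theta
```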
This talk is part of the Machine Learning @ CUED series.