Optimal Bayesian Reinforcement Learning on Trees
If you have a question about this talk, please contact Philipp Hennig.
The Q-Learning algorithm is the classical approach to the so-called “optimal” reinforcement learning problem: it uses samples of future rewards generated by a non-optimal (exploratory) policy to derive point estimates of the future rewards under the (unknown) optimal policy.
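For orientation, here is a minimal sketch of that classical point-estimate update. It is illustrative only: the environment interface (`reset`/`step`), the action set, and the hyperparameters are placeholder assumptions, not from the talk.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-Learning sketch. Assumes env.reset() -> state and
    env.step(action) -> (next_state, reward, done)."""
    Q = defaultdict(float)  # point estimates Q[(state, action)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Behaviour policy (epsilon-greedy) is non-optimal: it explores.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # A sampled reward bootstraps the greedy (estimated-optimal) value.
            target = r + (0.0 if done else gamma * max(Q[(s2, a2)] for a2 in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```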
In the first part of this talk, I will show that a Bayesian treatment, by forcing us to make our assumptions explicit, reveals some interesting aspects of this problem that seem to have been overlooked so far.
In the second part, I will introduce an algorithm that uses Expectation Propagation to generate beliefs over the possible future rewards under the optimal policy when the Markov environment forms a tree (i.e. “Bayesian Q-Learning on trees”), and I will show some preliminary results from its application to game trees.
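As a loose illustration of the kind of computation such a scheme involves, the sketch below moment-matches the maximum of independent Gaussian value beliefs (Clark's approximation, which is also the Gaussian projection step EP performs) and folds it up a tree. The tree interface (`children`, `mean`, `var`) is hypothetical and not from the talk.

```python
import math

def pdf(x):  # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):  # standard normal cumulative distribution
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_gaussian_moments(m1, v1, m2, v2):
    """Approximate max(X, Y) for independent X~N(m1, v1), Y~N(m2, v2)
    by a single Gaussian via moment matching (Clark, 1961)."""
    if v1 + v2 == 0.0:
        return max(m1, m2), 0.0
    s = math.sqrt(v1 + v2)
    a = (m1 - m2) / s
    mean = m1 * cdf(a) + m2 * cdf(-a) + s * pdf(a)
    second = ((v1 + m1 * m1) * cdf(a) + (v2 + m2 * m2) * cdf(-a)
              + (m1 + m2) * s * pdf(a))
    return mean, second - mean * mean

def propagate_tree(node):
    """Fold children's value beliefs through the max, leaves upward.
    Hypothetical interface: each node has .children, .mean, .var."""
    if not node.children:
        return node.mean, node.var
    m, v = propagate_tree(node.children[0])
    for child in node.children[1:]:
        m2, v2 = propagate_tree(child)
        m, v = max_gaussian_moments(m, v, m2, v2)
    return m, v
```

In an alternating-move game tree, the opponent's min nodes can be handled with the same routine by negating means before and after the fold.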
This talk is part of the Machine Learning Journal Club series.