
Optimal Bayesian Reinforcement Learning on Trees

If you have a question about this talk, please contact Philipp Hennig.

Q-Learning is the classical algorithm for the so-called “optimal” reinforcement learning problem: it uses samples of future rewards generated by a non-optimal behaviour policy to derive point estimates of the future rewards under the (unknown) optimal policy.
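As background for the abstract, here is a minimal sketch of the tabular Q-Learning update it refers to (this is the standard textbook rule, not code from the talk): the point estimate Q(s, a) is moved toward a sampled one-step target built from the greedy value of the next state, even though the samples come from a non-optimal behaviour policy.

```python
def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update: nudge the point estimate Q[(s, a)]
    toward the sampled target r + gamma * max_a' Q[(s_next, a')].
    Q is a dict mapping (state, action) pairs to value estimates."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]

# Illustrative use: one update in an empty table after observing
# reward 1.0 for action 0 in state 0, landing in state 1.
Q = {}
q_learning_step(Q, s=0, a=0, r=1.0, s_next=1, actions=[0, 1])
```

The max over next-state actions is what makes this an *off-policy* point estimate of the optimal value, which is exactly the quantity the Bayesian treatment in the talk replaces with a full belief.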

In the first part of this talk, I will show that a Bayesian treatment, in forcing us to explicitly define our assumptions, reveals some interesting aspects of this problem that seem to have been overlooked so far.

In the second part, I will introduce an algorithm that uses Expectation Propagation to form beliefs over the possible future rewards of the optimal policy when the Markov environment forms a tree (i.e. “Bayesian Q-Learning on trees”), and I will show some preliminary results from its application to game trees.
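To give a flavour of what a belief over optimal future rewards on a tree might look like, here is a hedged sketch (my own illustration, not the talk's algorithm): if each leaf carries a Gaussian belief over its reward, the belief at an internal max-node can be approximated by moment-matching the maximum of its children's Gaussians (Clark's method), which is the kind of Gaussian message an EP scheme would pass up the tree. The `propagate` helper and the tree encoding are assumptions for illustration only.

```python
import math

def _pdf(x):   # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):   # standard normal cumulative distribution
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_gaussian(m1, v1, m2, v2):
    """Moment-matched Gaussian approximation to max(X1, X2) for
    independent X1 ~ N(m1, v1), X2 ~ N(m2, v2); assumes v1 + v2 > 0."""
    t = math.sqrt(v1 + v2)
    a = (m1 - m2) / t
    mean = m1 * _cdf(a) + m2 * _cdf(-a) + t * _pdf(a)
    second = ((m1 ** 2 + v1) * _cdf(a) + (m2 ** 2 + v2) * _cdf(-a)
              + (m1 + m2) * t * _pdf(a))
    return mean, second - mean ** 2

def propagate(node):
    """Pass Gaussian beliefs up a max-tree. A leaf is a (mean, variance)
    tuple; an internal node is a list of child nodes."""
    if isinstance(node, tuple):
        return node
    m, v = propagate(node[0])
    for child in node[1:]:
        m2, v2 = propagate(child)
        m, v = max_gaussian(m, v, m2, v2)
    return m, v

# Illustrative use: a depth-one tree with two uncertain leaves.
belief = propagate([(0.0, 1.0), (0.0, 1.0)])
```

Note that the mean of the max exceeds the max of the means whenever the children are uncertain, which is one concrete way a Bayesian treatment exposes structure that point estimates miss.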

This talk is part of the Machine Learning Journal Club series.

