
Regret Bounds for Gaussian Process Bandit Problems


If you have a question about this talk, please contact Richard Nickl.

Bandit algorithms are concerned with trading off exploration against exploitation when a number of options are available but their quality can only be learned by experimenting with them. I address the scenario in which the reward distribution over the arms is modelled by a Gaussian process and there is no noise in the observed rewards. My main result is a bound on the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout, under benign assumptions about the covariance function defining the Gaussian process. One of the key tools used to derive the bounds is the Dudley integral, which allows one to bound the expected supremum of a Gaussian process. The upper bounds are complemented by corresponding lower bounds for particular covariance functions, demonstrating that in general there is at most a logarithmic looseness in the upper bounds. The bounds are in a sense quite crude, as they are based on an algorithm that samples the whole space uniformly in search of the maximum. A better approach would be an algorithm that pays more attention to “high reward plateaus”; I will discuss this approach, the study of its regret, and the associated technical problems in an outlook.
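As an illustration (not part of the talk itself), here is a minimal sketch of the crude uniform-sampling baseline the abstract describes: a reward function is drawn from a Gaussian process prior over a discretised arm space, arms are played uniformly at random with noiseless observations, and cumulative regret is measured against always playing the a posteriori best arm. The squared-exponential covariance, grid size, and horizon are illustrative choices, not assumptions from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretised arm space and a squared-exponential covariance (illustrative choice).
X = np.linspace(0.0, 1.0, 200)
ell = 0.1  # length scale
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / ell ** 2)

# One noiseless reward function drawn from the Gaussian process prior
# (small jitter on the diagonal keeps the covariance numerically PSD).
f = rng.multivariate_normal(np.zeros(len(X)), K + 1e-10 * np.eye(len(X)))
best = f.max()

# Uniform-sampling strategy: play arms uniformly at random; regret is
# measured against the strategy that plays the best arm throughout.
T = 100
plays = rng.integers(0, len(X), size=T)
regret = np.cumsum(best - f[plays])
print(f"cumulative regret after {T} rounds: {regret[-1]:.3f}")
```

Because observations are noiseless, the per-round regret `best - f[plays[t]]` is always nonnegative, so the cumulative regret is nondecreasing; the bounds discussed in the talk quantify how fast such regret grows in expectation over the Gaussian process prior.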

This talk is part of the Statistics series.


