
The information complexity of sequential resource allocation


If you have a question about this talk, please contact Quentin Berthet.

This talk will be about sequential resource allocation under the so-called stochastic multi-armed bandit model. In this model, an agent interacts with a set of unknown probability distributions, called ‘arms’ (in reference to ‘one-armed bandits’, another name for casino slot machines). When the agent draws an arm, it observes a sample from the associated distribution. This sample can be seen as a reward, and the agent aims to maximize the sum of its rewards over the course of the interaction. This ‘regret minimization’ objective makes sense in many practical applications, starting with the medical trials that motivated the introduction of bandit problems in the 1930s. Another possible objective for the agent, called best-arm identification, is to discover the best arm(s) as fast as possible — that is, the arm(s) whose distributions have the highest mean — without suffering a loss when drawing ‘bad’ arms.
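The interaction protocol described above, together with regret minimization, can be illustrated by a classical index policy. The following is a minimal sketch of UCB1 (Auer et al., 2002) — one standard algorithm for this setting, not necessarily the one discussed in the talk; the arm distributions and horizon are illustrative choices:

```python
import math
import random

def ucb1(arms, horizon):
    """Sketch of the UCB1 index policy: pull each arm once, then
    repeatedly pull the arm maximizing empirical mean + exploration
    bonus. 'arms' is a list of callables returning rewards in [0, 1]."""
    k = len(arms)
    counts = [0] * k   # number of pulls of each arm
    means = [0.0] * k  # empirical mean reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # initialization: pull each arm once
        else:
            a = max(range(k),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = arms[a]()
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        total += r
    return total, counts

# Illustrative example: two Bernoulli arms with means 0.3 and 0.6.
rng = random.Random(42)
arms = [lambda: float(rng.random() < 0.3),
        lambda: float(rng.random() < 0.6)]
reward, counts = ucb1(arms, horizon=2000)
```

Because the exploration bonus shrinks as an arm is pulled, the policy draws the suboptimal arm only logarithmically often, so most of the 2000 pulls go to the better arm.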

For each of these objectives, our goal will be to define a distribution-dependent notion of optimality, via lower bounds on the performance of good strategies, and to propose algorithms that can be qualified as optimal with respect to these lower bounds. For some classes of parametric bandit models, this makes it possible to characterize the complexity of regret minimization and best-arm identification in terms of (different) information-theoretic quantities.
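For regret minimization, one instance of such a distribution-dependent lower bound is the classical result of Lai and Robbins (1985), sketched here in standard notation (not necessarily the notation of the talk):

```latex
% Lai & Robbins (1985): for any uniformly efficient strategy on a
% bandit model with arm distributions \nu_1, \dots, \nu_K and means
% \mu_1, \dots, \mu_K, the expected regret R_T after T rounds satisfies
\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
  \;\ge\; \sum_{a \,:\, \mu_a < \mu^*} \frac{\mu^* - \mu_a}{\mathrm{KL}(\nu_a, \nu^*)},
% where \mu^* is the highest mean, \nu^* the corresponding distribution,
% and KL the Kullback--Leibler divergence. The information-theoretic
% quantity KL(\nu_a, \nu^*) governs how hard arm a is to rule out.
```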

This talk is part of the Statistics series.

