University of Cambridge > Talks.cam > Machine Learning @ CUED > Mean Field Approaches to Two Sequential Decision Making Problems.

Mean Field Approaches to Two Sequential Decision Making Problems.

Add to your list(s) Download to your calendar using vCal

If you have a question about this talk, please contact dsu21.

This talk will be given via Skype

In this presentation, I will present some of my past research on mean field approaches to sequential decision making problems under uncertainty. The talk has two parts.

In the first part, we’ll consider multiarmed bandit games, where much of the classical work focuses on arm rewards that are stationary over time. By contrast, I will present a bandit model with multiple agents where the rewards obtained by an agent can also depend on how many other agents choose the same arm (as might be the case in many competitive or cooperative scenarios). Such a dynamical system can be non-stationary due to the interdependent evolution of agents, which makes an analysis using typical equilibrium concepts intractable. I will introduce a general model of multiarmed bandit games, and define a notion of equilibrium inspired mean field methods, called the mean field stationary state (MFSS). I will discuss three key results— first, we establish existence of an MFSS under quite general conditions. Second, we show under a contraction condition that the MFE is unique, and that the population profile converges to it from any initial state. Finally, we show that under the contraction condition, MFE is a good approximation to the behavior of finite systems with many agents. The contraction condition requires that the agent population is sufficiently mixing and that the sensitivity of the reward function to the population profile is low enough.

The second part is motivated by dynamic ad auction markets (e.g Facebook, Google, Bing) where ad-slots are auctioned off dynamically at large scale. Using a mean field approximation, I will formulate the sequential optimization problem facing a bidder participating in such repeated auctions in the form of a very general stochastic control model, and present an approximate solution to a limiting MDP that allows for efficient computation of the optimal bids. I will also discuss how the optimal bid at any given time in a dynamic setting reduces to that of an equivalent static auction with a shaded private valuation, for a very general class of auction formats that includes the second price auction as a special case.

This is based on joint work with Ramesh Johari (Stanford), Peter Key (MSR Cambridge), Jia Yuan Yu (IBM Research) and Alexandre Proutiere (KTH Stockholm)

This talk is part of the Machine Learning @ CUED series.

Tell a friend about this talk:

This talk is included in these lists:

Note that ex-directory lists are not shown.

 

© 2006-2020 Talks.cam, University of Cambridge. Contact Us | Help and Documentation | Privacy and Publicity