Variance in Policy Gradient methods and Learning Sequential Latent Variable Models
I will discuss two efforts to improve learning in RL. In the first part, I’ll talk about our work towards understanding variance in policy gradient estimators. PPO and TRPO provide strong performance at the cost of requiring many on-policy samples, which makes them challenging to use in real-world applications. The high sample requirement arises from high-variance gradient estimates. We explore where this variance comes from and how we can reduce it.

Switching gears, in the second part, I’ll talk about learning models of the world, which can simplify control by lifting the problem to a lower-dimensional embedding space. Three groups independently introduced the idea of using a particle filter to train highly flexible non-linear sequential latent variable models. A key deficiency of this work is that the training procedure cannot properly account for temporal dependencies in the data because it uses the filtering distributions. We introduce learned tilting functions, which allow us to control the target distributions that sequential Monte Carlo passes through. In principle, we can train everything jointly with a coherent objective. I’ll discuss preliminary results and challenges that we have yet to resolve.

Bio: George Tucker is a researcher on the Google Brain team focusing on reinforcement learning and sequence models. He received his PhD in Mathematics from MIT and previously worked as a researcher at Amazon in the speech group.

This talk is part of the Machine Learning @ CUED series.