Stochastic optimization and adaptive learning rates
If you have a question about this talk, please contact Yingzhen Li.
Stochastic optimization is prevalent in modern machine learning, and the main purpose of this talk is to understand why it works. We will first briefly recap the history of stochastic approximation methods, starting from the famous Robbins and Monro paper. We will then introduce the cost-function minimization problem in the machine learning context and show how to prove convergence of stochastic gradient descent to a local optimum. The proof proceeds in three steps: continuous gradient descent, discrete gradient descent, and stochastic gradient descent. However, the conditions on the learning rates used in the proof are sufficient but not necessary. So in the second part of the talk we will discuss popular adaptive learning rates, and in particular give a short tutorial on online learning to build intuition for regret bounds. Finally, we will have a live demo comparing different learning rates.
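As a rough illustration (not part of the talk materials), here is a minimal Python sketch of the kind of comparison the demo could show: stochastic gradient descent on a simple quadratic with noisy gradients, contrasting a constant learning rate with a decaying schedule that satisfies the classical Robbins-Monro conditions (the rates sum to infinity, their squares sum to a finite value, e.g. alpha_t = 1/t). The objective and step counts are illustrative choices, not taken from the talk.

```python
import random

def noisy_grad(theta, noise_std=1.0):
    """Gradient of f(theta) = 0.5 * theta^2, plus Gaussian noise to mimic a stochastic gradient."""
    return theta + random.gauss(0.0, noise_std)

def run_sgd(schedule, steps=10_000, theta0=5.0, seed=0):
    """Run SGD with a per-step learning rate given by schedule(t)."""
    random.seed(seed)
    theta = theta0
    for t in range(1, steps + 1):
        theta -= schedule(t) * noisy_grad(theta)
    return theta

# Constant rate: iterates keep fluctuating around the optimum at a noise floor.
constant = run_sgd(lambda t: 0.1)
# Decaying rate 1/t: satisfies the Robbins-Monro conditions, so the iterates converge.
decaying = run_sgd(lambda t: 1.0 / t)

print(f"constant rate -> theta = {constant:+.4f}")
print(f"decaying rate -> theta = {decaying:+.4f}")
```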
This talk is part of the Machine Learning Reading Group @ CUED series.