# The EM algorithm and applications

An expectation-maximization (EM) algorithm is used in statistics for finding maximum likelihood estimates of parameters in probabilistic models, where the model depends on unobserved latent variables. EM alternates between performing an expectation (E) step, which computes an expectation of the likelihood by including the latent variables as if they were observed, and a maximization (M) step, which computes the maximum likelihood estimates of the parameters by maximizing the expected likelihood found on the E step. The parameters found on the M step are then used to begin another E step, and the process is repeated.

The EM algorithm was explained and given its name in a classic 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin in the Journal of the Royal Statistical Society (see link below). They pointed out that the method had been “proposed many times in special circumstances” by other authors, but the 1977 paper generalized the method and developed the theory behind it.

EM is frequently used for data clustering in machine learning and computer vision. In natural language processing, two prominent instances of the algorithm are the Baum-Welch algorithm (also known as forward-backward) and the inside-outside algorithm for unsupervised induction of probabilistic context-free grammars. In psychometrics, EM is almost indispensable for estimating item parameters and latent abilities of item response theory models. With the ability to deal with missing data and observe unidentified variables, EM is becoming a useful tool to price and manage risk of a portfolio. The EM algorithm is also widely used in medical image reconstruction, especially in Positron Emission Tomography and Single Photon Emission Computed Tomography. See below for other faster variants of EM.

We will go through the algorithm in general, prove an important convergence property, comment on historical context, illustrate on a famous application to clustering, and talk about extensions including MCEM and ECM which can be used with the E-step and M-step, respectively, are not analytically tractable.

http://www.jstor.org/stable/2984875

This talk is part of the Statistics Reading Group series.