
Statistical Investigations into the Unseen: Missing Mass for Markov Samples and Natural Distribution Estimation


If you have a question about this talk, please contact Prof. Ramji Venkataramanan.

Suppose we observe a sequence of samples from a very large alphabet, and the number of samples is comparable to or smaller than the alphabet size. Several letters of the alphabet will then be unseen, or missing, in the observed samples. What can be inferred about the distribution’s probability mass on the missing letters? The sum of the probability masses of all missing letters is called the missing mass, and the classical Good-Turing (GT) estimator is minimax optimal over all distributions and alphabet sizes when the samples are iid. However, when the samples form a Markovian sequence, the GT estimator fails. In this talk, we will introduce a windowed version of the GT estimator and show that, when the window size is sufficiently larger than the mixing time, the windowed GT estimator is nearly minimax optimal. Going beyond missing mass, we will present a generalization to higher-order missing mass and missing g-mass, which can potentially quantify the distance of the missing part of the distribution from uniformity. We will conclude with some extensions of these results to the distribution’s probability mass on sparsely observed letters, and their potential impact on distribution estimation.
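For readers unfamiliar with the classical estimator mentioned above, here is a minimal Python sketch of the standard Good-Turing missing-mass estimate for iid samples: the number of letters seen exactly once, divided by the sample size. The function names and the uniform-alphabet demo are illustrative assumptions, not material from the talk, and the windowed estimator for Markov samples is not reproduced here because the abstract does not define it.

```python
import random
from collections import Counter


def good_turing_missing_mass(samples):
    """Classical Good-Turing estimate: (# letters seen exactly once) / n."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def true_missing_mass(samples, probs):
    """Total probability of the letters that never appear in the samples.

    `probs` maps each letter to its probability (assumed known here,
    which is only possible in a simulation).
    """
    seen = set(samples)
    return sum(p for letter, p in probs.items() if letter not in seen)


def demo(alphabet_size=1000, n=500, seed=0):
    """Hypothetical demo: iid uniform samples with n smaller than the alphabet."""
    rng = random.Random(seed)
    probs = {letter: 1.0 / alphabet_size for letter in range(alphabet_size)}
    samples = [rng.randrange(alphabet_size) for _ in range(n)]
    return good_turing_missing_mass(samples), true_missing_mass(samples, probs)
```

Calling `demo()` returns the estimate alongside the true missing mass for one simulated iid sample. The same estimator applied to a slowly mixing Markov chain is the setting in which, according to the abstract, the classical estimator fails.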

This talk is part of the Probabilistic Systems, Information, and Inference Group Seminars series.
