# Optimal prediction of Markov chains without mixing conditions

Motivated by practical applications such as autocomplete and text generation, we study the following statistical problem with dependent data: Observing a trajectory of length $n$ from a stationary first-order Markov chain with $k$ states, how to predict (the distribution of) the next state? In contrast to the better-studied parameter estimation problem which relies on assumptions on the mixing time or the minimal probability of the stationary distribution, the prediction problem requires neither. This allows for an assumption-free framework but also results in new technical challenges due to the lack of concentration for sample path statistics.

For $3 \leq k \leq O(\sqrt{n})$, using information-theoretic techniques including, in particular, those rooted in universal compression, we show that the optimal rate of Kullback-Leibler prediction risk is $\frac{k2}{n}\log \frac{n}{k2}$, in contrast to the optimal rate of $\frac{\log \log n}{n}$ for $k=2$ previously shown by Falahatgar et al. These rates, slower than the parametric rate of $O(\frac{k^2}{n})$, can be attributed to the memory in the data. To quantify the memory effect, we give a lower bound on the spectral gap that ensures the prediction risk is parametric. Extensions to higher-order Markov chains and Hidden Markov Model will be discussed briefly.

This is based on joint work with Yanjun Han and Soham Jana: https://arxiv.org/abs/2106.13947

This talk is part of the Statistics series.