Scalable Non-Markovian Language Modelling

Ehsan Shareghi
Thursday 03 May 2018, 11:00-12:00
Boardroom, Faculty of English, West Road.

If you have a question about this talk, please contact Dimitri Kartsaklis.

Markov models are a popular means of modeling the underlying structure of natural language, which is naturally represented as sequences and trees. The locality assumption made in low-order Markov models, such as n-gram language models, is limiting: if the data generation process exhibits long-range dependencies, modeling the distribution well requires considering long-range context. On the other hand, higher-order Markov and infinite-order (Non-Markovian) models pose computational and statistical challenges during learning and inference. In particular, in the large-data setting their exponential number of parameters often leads to estimation and sampler-mixing issues, while representing the model structure, sufficient statistics, or sampler states can quickly become computationally inefficient and impractical.
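
As a brief illustration of the locality assumption, in standard textbook notation rather than notation taken from the talk itself: an order-$n$ model over a vocabulary $V$ (both symbols are assumptions introduced here) approximates the probability of a sentence $w_1 \dots w_T$ as

```latex
P(w_1, \dots, w_T) \approx \prod_{t=1}^{T} P\left(w_t \mid w_{t-n+1}, \dots, w_{t-1}\right)
```

Each word is conditioned on at most $n-1$ predecessors, with positions before the start of the sentence treated as padding, so the number of possible contexts grows as $|V|^{n-1}$. The infinite-order case instead conditions on the full history $w_1, \dots, w_{t-1}$.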

We propose a framework based on compressed data structures that keeps the memory usage of the modeling, learning, and inference steps independent of the order of the model. Our approach scales well with both the order of the Markov model and the data size, and is highly competitive with the state of the art in terms of memory and runtime, while allowing us to develop Bayesian and non-Bayesian smoothing techniques. Using our compressed framework to represent the models, we explore its scalability in two Non-Markovian language modeling settings, using large-scale data and infinite context.

First, we model the Kneser-Ney family of language models and show that our approach is several orders of magnitude more memory efficient than the state of the art, in both training and testing, while remaining highly competitive in the runtime of both phases. When memory is a limiting factor at query time, our approach is orders of magnitude faster than the state of the art. We then turn to Hierarchical Nonparametric Bayesian language modeling and develop an efficient sampling mechanism that allows us to avoid the sampler-mixing issue common in large Bayesian models. More precisely, compared with the previous state-of-the-art hierarchical Bayesian language model, the experimental results show that our model can be built on 100x larger datasets, while being several orders of magnitude smaller, fast in training and inference, and outperforming the perplexity of the state-of-the-art Modified Kneser-Ney LM by up to 15%.

This talk is part of the Language Technology Lab Seminars series.
