Scalable Non-Markovian Language Modelling
If you have a question about this talk, please contact Dimitri Kartsaklis.

Markov models are a popular means of modelling the underlying structure of natural language, which is naturally represented as sequences and trees. The locality assumption made in low-order Markov models such as n-gram language models is limiting: if the data-generating process exhibits long-range dependencies, modelling the distribution well requires consideration of long-range context. On the other hand, higher-order Markov and non-Markovian (infinite-order) models pose computational and statistical challenges during learning and inference. In particular, in the large-data setting their exponential number of parameters often leads to estimation and sampler-mixing issues, while representing the model structure, sufficient statistics, or sampler states can quickly become computationally inefficient and impractical.

We propose a framework based on compressed data structures that keeps the memory usage of the modelling, learning, and inference steps independent of the order of the model. Our approach scales well with both the order of the Markov model and the data size, and is highly competitive with the state of the art in terms of memory and runtime, while allowing us to develop Bayesian and non-Bayesian smoothing techniques.

Using our compressed framework to represent the models, we explore its scalability in two non-Markovian language modelling settings, using large-scale data and infinite context. First, we model the Kneser-Ney family of language models and show that our approach is several orders of magnitude more memory efficient than the state of the art in both training and testing, while remaining highly competitive in the runtime of both phases. When memory is a limiting factor at query time, our approach is orders of magnitude faster than the state of the art. We then turn to hierarchical nonparametric Bayesian language modelling and develop an efficient sampling mechanism that mitigates the sampler-mixing issues common in large Bayesian models. More precisely, compared with the previous state-of-the-art hierarchical Bayesian language model, our experimental results show that our model can be built on 100x larger datasets while being several orders of magnitude smaller and fast to train and query, and that it outperforms the state-of-the-art Modified Kneser-Ney LM in perplexity by up to 15%.

This talk is part of the Language Technology Lab Seminars series.
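As a point of reference for the Kneser-Ney family mentioned above, the sketch below shows interpolated Kneser-Ney smoothing for a bigram model in plain Python. It is a minimal illustration only: the dictionary-backed count store, the function names, and the fixed discount of 0.75 are assumptions made for the example, whereas the framework described in the talk keeps such statistics in compressed data structures so that memory stays independent of the model order.

```python
from collections import defaultdict

def train_bigram_counts(tokens):
    """Collect the count statistics interpolated Kneser-Ney needs (bigram case)."""
    bigram = defaultdict(int)           # c(u, w)
    context_total = defaultdict(int)    # c(u, .)
    context_types = defaultdict(set)    # distinct w following u   -> N1+(u, .)
    preceding_types = defaultdict(set)  # distinct u preceding w   -> N1+(., w)
    for u, w in zip(tokens, tokens[1:]):
        bigram[(u, w)] += 1
        context_total[u] += 1
        context_types[u].add(w)
        preceding_types[w].add(u)
    total_bigram_types = len(bigram)    # N1+(., .)
    return bigram, context_total, context_types, preceding_types, total_bigram_types

def kn_prob(w, u, stats, discount=0.75):
    """P_KN(w | u): absolute discounting interpolated with a continuation probability."""
    bigram, context_total, context_types, preceding_types, total_types = stats
    continuation = len(preceding_types[w]) / total_types   # P_cont(w) = N1+(., w) / N1+(., .)
    if context_total[u] == 0:                               # unseen context: back off entirely
        return continuation
    discounted = max(bigram[(u, w)] - discount, 0.0) / context_total[u]
    backoff_weight = discount * len(context_types[u]) / context_total[u]
    return discounted + backoff_weight * continuation

tokens = "the cat sat on the mat the cat ate".split()
stats = train_bigram_counts(tokens)
print(kn_prob("cat", "the", stats))
```

For higher orders the same recursion applies, with the backoff term calling the (n-1)-gram estimate instead of the unigram continuation probability; the memory cost of storing those nested counts naively is what motivates the compressed representation discussed in the talk.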