Language Modelling with Phonemes

If you have a question about this talk, please contact Suchir Salhan.

The statistical properties of language, and how they may be used in language processing and language acquisition, have been studied for many decades. Recently, large language models have demonstrated striking language-learning capabilities, providing evidence for the “richness” of the linguistic stimulus. However, they are often trained on data that seems cognitively implausible in terms of both quantity (thousands of human lifetimes) and quality (written text, internet sources). For these models to help us study language, we must think far more carefully about the plausibility of the input: using phonemes instead of letters, using spoken sources, and reducing the quantity. We must then determine whether the architectures we use are suitable at this scale and with this input representation. Such models can then give us valuable analytical insights into the statistical properties of language and its learnability, as well as practical benefits for tasks associated with language modelling and language understanding.

Speaker Biography

Zebulon Goriely is a fourth-year PhD student working on Transformer Language Models and Child Language Acquisition, supervised by Professor Paula Buttery.

This talk is part of the NLIP Seminar Series.

© 2006-2024 Talks.cam, University of Cambridge.