Rethinking the role of tokenization in the NLP pipeline
If you have a question about this talk, please contact Michael Schlichtkrull.

Abstract: Tokenization is an integral part of the modern NLP pipeline, yet it is often treated as a black box, with little attention paid to the design choices involved in selecting a tokenizer. I will give an overview of how the two currently dominant tokenization algorithms work, and discuss their limitations from both a computational and a typological perspective. I will then talk about my recent EMNLP paper, which proposes using multiple tokenizations produced by the tokenizer to overcome the limitations of relying on a single tokenization. Finally, I will discuss some ongoing work that uses character-based tokenization for masked language modelling and examines which modelling architectures work well in this setting.

Bio: Kris is a senior research scientist in the Language team at DeepMind. His research interests are at the intersection of linguistics, NLP and machine learning, and he is primarily focused on problems of unsupervised structure induction from language. He received his PhD from the University of Cambridge, where he worked on deep generative models for text generation.

This talk is part of the NLIP Seminar Series.