Scalable Structural Inductive Biases in Neural Language Models

If you have a question about this talk, please contact Marinela Parovic.

Scalable language models like BERT and GPT-3 have achieved remarkable success on various natural language understanding benchmarks, including challenging benchmarks of structural competence. Does this success mean that data scale and large models are all we need to fully comprehend natural language? Or can these scalable models still benefit from more explicit structural inductive biases?

This talk provides evidence for the latter: we improve the performance of LSTM and Transformer models by augmenting them with structural inductive biases derived from an explicitly hierarchical (albeit harder to scale) recurrent neural network grammar (RNNG). I will begin with an overview of the proposed structure distillation objective for autoregressive language modelling with LSTMs. I will then discuss an extension to the masked language modelling case, by distilling the approximate posterior distributions of the RNNG teacher, which culminates in structure-distilled BERT models that outperform the standard BERT model on a diverse suite of structured prediction tasks.
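To make the distillation objective concrete, the sketch below shows one common form of such a loss: an interpolation between cross-entropy against the teacher's next-word distribution and the standard negative log-likelihood of the gold next word. This is a minimal, hypothetical illustration, not the talk's exact formulation; the function names, the interpolation weight `alpha`, and the toy inputs are all assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def structure_distillation_loss(student_logits, teacher_probs, gold_idx, alpha=0.5):
    """Interpolated distillation loss for one next-word prediction.

    student_logits : student LM's logits over the vocabulary.
    teacher_probs  : syntactic teacher's (e.g. RNNG) next-word distribution.
    gold_idx       : index of the observed gold next word.
    alpha          : interpolation weight (an assumption, not from the talk).
    """
    p = softmax(student_logits)
    # Cross-entropy to the teacher's distribution (soft targets).
    ce_teacher = -np.sum(teacher_probs * np.log(p))
    # Standard negative log-likelihood of the gold word (hard target).
    nll_gold = -np.log(p[gold_idx])
    return alpha * ce_teacher + (1 - alpha) * nll_gold

# Toy usage over a 3-word vocabulary:
logits = np.array([0.1, 0.2, 0.3])
teacher = np.array([0.2, 0.3, 0.5])
loss = structure_distillation_loss(logits, teacher, gold_idx=2)
```

With `alpha=0` the loss reduces to ordinary language-model training; with `alpha=1` the student purely imitates the teacher's distribution, which is how the syntactic bias is transferred without running the expensive RNNG at inference time.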

Altogether, these findings demonstrate the benefits of syntactic biases, even in scalable language models that learn from large amounts of data, and contribute to a better understanding of where syntactic biases are most helpful in benchmarks of natural language understanding.

This talk is part of the Language Technology Lab Seminars series.


© 2006-2021 Talks.cam, University of Cambridge.