Scalable Structural Inductive Biases in Neural Language Models

If you have a question about this talk, please contact Marinela Parovic.

Scalable language models like BERT and GPT-3 have achieved remarkable success on various natural language understanding benchmarks, including challenging benchmarks of structural competence. Does this success mean that data scale and large models are all we need to fully comprehend natural language? Or can these scalable models still benefit from more explicit structural inductive biases?

This talk provides evidence for the latter: we improve the performance of LSTM and Transformer models by augmenting them with structural inductive biases derived from an explicitly hierarchical (albeit harder to scale) recurrent neural network grammar (RNNG). I will begin with an overview of the proposed structure distillation objective for autoregressive language modelling with LSTMs. I will then discuss an extension to the masked language modelling case, by distilling the approximate posterior distributions of the RNNG teacher, which culminates in structure-distilled BERT models that outperform the standard BERT model on a diverse suite of structured prediction tasks.
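To make the distillation objective concrete, the sketch below shows one common form of such a loss: an interpolation between cross-entropy against the teacher's next-word distribution and the standard negative log-likelihood of the gold next word. This is a minimal, hypothetical illustration, not the talk's exact formulation; the function names, the interpolation weight `alpha`, and the toy inputs are all assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def structure_distillation_loss(student_logits, teacher_probs, gold_idx, alpha=0.5):
    """Interpolated distillation loss for one next-word prediction.

    student_logits : student LM's logits over the vocabulary.
    teacher_probs  : syntactic teacher's (e.g. RNNG) next-word distribution.
    gold_idx       : index of the observed gold next word.
    alpha          : interpolation weight (an assumption, not from the talk).
    """
    p = softmax(student_logits)
    # Cross-entropy to the teacher's distribution (soft targets).
    ce_teacher = -np.sum(teacher_probs * np.log(p))
    # Standard negative log-likelihood of the gold word (hard target).
    nll_gold = -np.log(p[gold_idx])
    return alpha * ce_teacher + (1 - alpha) * nll_gold

# Toy usage over a 3-word vocabulary:
logits = np.array([0.1, 0.2, 0.3])
teacher = np.array([0.2, 0.3, 0.5])
loss = structure_distillation_loss(logits, teacher, gold_idx=2)
```

With `alpha=0` the loss reduces to ordinary language-model training; with `alpha=1` the student purely imitates the teacher's distribution, which is how the syntactic bias is transferred without running the expensive RNNG at inference time.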

Altogether, these findings demonstrate the benefits of syntactic biases, even in scalable language models that learn from large amounts of data, and contribute to a better understanding of where syntactic biases are most helpful in benchmarks of natural language understanding.

This talk is part of the Language Technology Lab Seminars series.


© 2006-2021 Talks.cam, University of Cambridge.