Learnable representations for natural language

If you have a question about this talk, please contact Laura Rimell.

The Chomsky hierarchy was explicitly intended to represent the hypotheses output by distributional learning algorithms; yet these standard representations are well known to be hard to learn in an unsupervised fashion, even under quite benign learning paradigms, because of the computational complexity of inferring rich hidden structures like trees. Nonetheless, there is much interest in NLP in the unsupervised learning of syntax: trying to infer gold-standard trees from unannotated data. But this approach is misguided: we don’t know what the right representations are, but we do know they are learnable, since children do in fact acquire them. Accordingly, we explore a different approach: building representations or grammars that are intrinsically learnable.

This research direction involves abandoning the standard models and designing new representation classes for formal languages that are richly structured, but where the structure is not hidden and is instead based on observable structures of the language. We illustrate this approach by looking at algorithms for learning regular languages using deterministic automata, and then move on to algorithms for learning context-free and context-sensitive languages. The largest and most powerful class, based on the theory of residuated lattices, may be rich enough to represent natural language syntax; these grammars are parseable in cubic time and are efficiently learnable. The class of languages defined by these representations contains all regular languages, many but not all context-free languages, and some context-sensitive languages; it thus seems a plausible candidate for the class of possible natural languages.
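
As a rough illustration of what "observable structure" can mean in the regular case, the sketch below groups prefixes by which suffixes complete them to a string in the language. For a regular language, prefixes with the same residual language correspond to the same state of the minimal deterministic automaton. This is not taken from the talk: the target language, the membership test `in_language`, the helper `observed_rows`, and the choice of prefixes and suffixes are all hypothetical, and access to a membership test is itself an assumption about the learning setting.

```python
from itertools import product


def in_language(w):
    # Hypothetical target language (an assumption for this sketch):
    # strings over {a, b} containing an even number of a's.
    return w.count("a") % 2 == 0


def observed_rows(prefixes, suffixes, member):
    """Group prefixes by their observable behaviour.

    The row of a prefix u records, for each test suffix v, whether u + v
    is in the language. Prefixes with equal rows are indistinguishable
    by these tests.
    """
    rows = {}
    for u in prefixes:
        row = tuple(member(u + v) for v in suffixes)
        rows.setdefault(row, []).append(u)
    return rows


alphabet = "ab"
# All strings of length 0, 1 and 2 over the alphabet.
prefixes = ["".join(p) for n in range(3) for p in product(alphabet, repeat=n)]
# A small, hand-picked set of test suffixes.
suffixes = ["", "a", "b"]

rows = observed_rows(prefixes, suffixes, in_language)
for row, members in rows.items():
    print(row, members)
```

With these choices the prefixes fall into two groups, matching the two states of the minimal automaton for this language. The groups are read directly off membership facts about strings rather than inferred as hidden structure.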

This talk is part of the NLIP Seminar Series.
