‘Profit factory’ and ‘bathroom break’: How to analyse compounds and how to predict their emergence

If you have a question about this talk, please contact James Thorne.

Compounding can be defined as the formation of a new lexeme by adjoining two or more lexemes (Bauer, 2003:40). Compounds have been studied extensively in the linguistic literature and are receiving increasing attention in Natural Language Processing (NLP). Compounding is a very productive word formation process: English-speaking children create novel compounds in spontaneous speech from a very young age (Clark, 1981). As a consequence, compounds are very common as a word type, but many individual compounds occur with a very low token count. This high productivity makes compositional approaches to their automatic processing indispensable. It also raises questions about the processes that underlie the generation of novel compounds.

I will give an overview of our recent work that harvests parallel corpora as indirect supervision for two tasks: compound identification and compound bracketing. I will then discuss the potential of compounds as vehicles for creative thought and present some experiments that aim to predict novel compounds.


Bauer, L. (2003). Introducing Linguistic Morphology, 2nd edn. Washington, DC: Georgetown University Press.

Clark, E. V. (1981). Lexical innovations: How children learn to create new words. In W. Deutsch (Ed.), The child’s construction of language. London: Academic Press.

This talk is part of the NLIP Seminar Series at the University of Cambridge.


