A combination of compositional and distributional/distributed representations has many potential advantages for computational semantics. From the distributed side: robustness, learnability from data, ease of handling ambiguity, and the ability to represent gradations of meaning. From the compositional side: the ability to handle the unbounded nature of natural language, and the existence of established accounts of semantic phenomena such as logical words, quantification and inference. Developing such a combination, however, poses many challenges.

The first half of the talk will describe a complete mathematical framework for deriving distributed representations compositionally using Combinatory Categorial Grammar (CCG). The tensor-based framework extends that of Coecke et al., which was previously applied to pregroup grammars. It is based on the observation that tensors are functions (multi-linear maps) and hence can be manipulated by the combinators of CCG, including type-raising and composition. The existence of robust, broad-coverage CCG parsers opens up the possibility of applying the tensor-based framework to naturally occurring text.
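
As a rough illustration of the idea that tensors are multi-linear maps that CCG combinators can manipulate, the sketch below treats a transitive verb of category (S\NP)/NP as an order-3 tensor and derives a sentence vector in two ways: by plain function application, and by type-raising the subject and then using forward composition. The dimensions, the toy values, the index ordering and all function names are assumptions made for this example only; this is not the implementation described in the talk.

```python
# Hypothetical dimensions of the noun space and the sentence space.
N, S = 3, 2

# Toy values, chosen arbitrarily for illustration.
subj = [1.0, 0.0, 2.0]   # NP: a vector in the noun space
obj = [0.5, 1.0, 0.0]    # NP: a vector in the noun space
# (S\NP)/NP: an order-3 tensor indexed as verb[sentence][subject][object].
verb = [[[float(s + i * j) for j in range(N)] for i in range(N)]
        for s in range(S)]


def apply_to_object(v, o):
    # Forward application: (S\NP)/NP applied to NP gives S\NP.
    # Contract the verb's object index with the object vector.
    return [[sum(v[s][i][j] * o[j] for j in range(N)) for i in range(N)]
            for s in range(S)]


def apply_to_subject(vp, sb):
    # Backward application: NP followed by S\NP gives S.
    # Contract the verb phrase's subject index with the subject vector.
    return [sum(vp[s][i] * sb[i] for i in range(N)) for s in range(S)]


def type_raise(sb):
    # Type-raising: NP becomes S/(S\NP).
    # raised[s][t][i] = delta(s, t) * sb[i], so applying it to a verb
    # phrase matrix gives the same result as backward application.
    return [[[sb[i] if s == t else 0.0 for i in range(N)]
             for t in range(S)] for s in range(S)]


def compose(raised, v):
    # Forward composition: S/(S\NP) composed with (S\NP)/NP gives S/NP.
    # Contract the shared S\NP indices (t, i), leaving sentence and object.
    return [[sum(raised[s][t][i] * v[t][i][j]
                 for t in range(S) for i in range(N))
             for j in range(N)] for s in range(S)]


def apply_s_over_np(f, o):
    # Forward application: S/NP applied to NP gives S.
    return [sum(f[s][j] * o[j] for j in range(N)) for s in range(S)]


# Derivation 1: the verb takes its object, then its subject.
sent_direct = apply_to_subject(apply_to_object(verb, obj), subj)

# Derivation 2: type-raise the subject, compose with the verb, apply to object.
sent_raised = apply_s_over_np(compose(type_raise(subj), verb), obj)

print(sent_direct, sent_raised)
```

Under these assumptions both derivations yield the same sentence vector, which is the property that makes the combinators safe to use on tensors: type-raising and composition only change the order in which indices are contracted, not the result.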

The second half will describe our ongoing efforts to implement the framework, for which there are considerable practical challenges. I will describe some of the sentence spaces we are exploring; some of the datasets we are developing; and some of the machine learning techniques we are using in an attempt to learn the values of the tensors from corpus data.

This work is being carried out with Luana Fagarasan, Douwe Kiela, Jean Maillard, Tamara Polajnar, Laura Rimell and Eva Maria Vecchi, and involves collaborations with Mehrnoosh Sadrzadeh (Queen Mary) and with Ed Grefenstette and Bob Coecke (Oxford).

