The Web as an Implicit Training Set: Application to Noun Compounds' Syntax and Semantics
If you have a question about this talk, please contact Laura Rimell.

I will present Web-based approaches to the syntax and semantics of noun compounds (NCs), which can be used in query parsing, technical term understanding, and similar tasks. I will also describe an application to machine translation.

First, I will present a highly accurate, lightly supervised method for making bracketing decisions for three-word noun compounds, based on surface features and paraphrases. For example, "[[liver cell] antibody]" is left-bracketed, while "[liver [cell line]]" is right-bracketed. The enormous size of the Web makes such features frequent enough to be useful.

Second, I will introduce an unsupervised method for discovering the implicit predicates that characterize the semantic relations holding in noun-noun compounds. For example, "malaria mosquito" is a "mosquito that carries/spreads/causes/transmits/brings/infects with/... malaria".

Finally, I will present a method for improving statistical machine translation (SMT). Most modern SMT systems are trained on sentence-aligned bilingual corpora. I will describe a method for expanding the training set with NP-level paraphrases involving NCs that are conceptually similar to the original phrases but differ syntactically. An English-to-Spanish evaluation on the Europarl corpus shows an improvement equivalent to 33%-50% of the improvement obtained by doubling the amount of training data.

This talk is part of the NLIP Seminar Series at the University of Cambridge.
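The bracketing task described in the abstract can be illustrated with a much simpler baseline from the noun-compound literature, the adjacency model. It compares how strongly the first two nouns associate with how strongly the last two associate. This sketch is *not* the speaker's method, which also uses surface features and paraphrases. The bigram counts in `TOY_COUNTS` are hypothetical numbers invented for illustration. A real system would estimate them from Web page hits or an n-gram corpus. The function names `bracket_adjacency` and `render` are likewise hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical bigram counts, invented for illustration only; a real system
# would estimate such counts from the Web (e.g. search-engine page hits).
TOY_COUNTS: Dict[Tuple[str, str], int] = {
    ("liver", "cell"): 500,
    ("cell", "antibody"): 50,
    ("cell", "line"): 5000,
}


def bracket_adjacency(w1: str, w2: str, w3: str,
                      counts: Dict[Tuple[str, str], int]) -> str:
    """Adjacency model: compare the association of w1-w2 with that of w2-w3.

    Returns "left" for [[w1 w2] w3] and "right" for [w1 [w2 w3]].
    Ties (including unseen bigrams) default to "left", an assumption of this sketch.
    """
    left_score = counts.get((w1, w2), 0)   # evidence for [[w1 w2] w3]
    right_score = counts.get((w2, w3), 0)  # evidence for [w1 [w2 w3]]
    return "left" if left_score >= right_score else "right"


def render(w1: str, w2: str, w3: str, side: str) -> str:
    """Format a bracketing decision in the notation used in the abstract."""
    if side == "left":
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


if __name__ == "__main__":
    for nc in [("liver", "cell", "antibody"), ("liver", "cell", "line")]:
        side = bracket_adjacency(*nc, TOY_COUNTS)
        print(render(*nc, side))
    # prints:
    # [[liver cell] antibody]
    # [liver [cell line]]
```

Raw counts are used here only for brevity. In practice, association measures that normalize for word frequency (for example, conditional probabilities or chi-square) are usually preferred, so that very common words do not dominate the decision.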