Adapting a WSJ-trained Lexicalized-Grammar Parser to New Domains
If you have a question about this talk, please contact Johanna Geiss.

In this talk I will describe some experiments on adapting the C&C CCG parser to new domains. The parser was originally developed using CCGbank, the CCG version of the Penn Treebank, and is therefore tuned to newspaper text. The two new domains we consider are (1) biomedical abstracts and (2) questions for a QA system (using the term “domain” somewhat loosely in the latter case).

The porting approach we use is to train the parser at lower levels of representation than full syntactic derivations. The lexicalized nature of CCG (in which words are assigned syntactic categories that include subcategorization information) makes it possible to use a level of representation intermediate between POS tags and full derivations.

For the biomedical data, we find that simply retraining the POS tagger leads to a large improvement in performance, and that using annotated data at the intermediate CCG lexical category level improves parsing accuracy further. A similar result is obtained for the question data, but the impact of retraining at the CCG lexical category level is much greater. We suggest that this is because the syntax of questions differs more from that of newspaper text than does the syntax of biomedical sentences, and we discuss some measures supporting this idea.

The parsing accuracies obtained for both biomedical and question data are in the same range as those reported for newspaper text, and higher than those previously reported for the biomedical domain on the same evaluation resource. The conclusion is that porting newspaper-trained parsers to new domains may not be as difficult as first thought (at least for parsers that use lexicalized grammars), but we note that different levels of representation may have different impacts on the porting process, depending on the characteristics of the target domain.

This talk is part of the NLIP Seminar Series.