Semi-supervised Training of a Statistical Parser from Unlabeled Partially-bracketed Data

  • Speaker: John Carroll - Department of Informatics, University of Sussex
  • Time: Friday 15 June 2007, 15:00-16:00
  • Venue: SW01 Computer Laboratory.

If you have a question about this talk, please contact NLIP Seminars.

We compare the accuracy of a statistical parse ranking model trained from a fully-annotated portion of the Susanne treebank with one trained from unlabeled partially-bracketed sentences derived from this treebank and from the Penn Treebank. We demonstrate that confidence-based semi-supervised techniques similar to self-training outperform expectation maximization when both are constrained by partial bracketing. Both methods based on partially-bracketed training data outperform the fully supervised technique, and both can, in principle, be applied to any statistical parser whose output is consistent with such partial bracketing. We also explore tuning the model to a different domain and the effect of in-domain data in the semi-supervised training processes.

(This is joint work with Rebecca Watson and Ted Briscoe.)
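
As a rough illustration of what "consistent with partial bracketing" and "confidence-based" selection could mean in practice, here is a minimal Python sketch. It is not the method presented in the talk: the span representation, the function names (crosses, is_consistent, select_confident), the score renormalization, and the 0.8 threshold are all assumptions made for this example. A candidate parse is treated as a set of token spans, it is discarded if any of its spans crosses a given bracket, and the best remaining parse is kept for training only when it holds a large enough share of the remaining score mass.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # half-open token span [start, end); an assumed representation


def crosses(a: Span, b: Span) -> bool:
    """Return True if the spans overlap without one containing the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    return (a_start < b_start < a_end < b_end) or (b_start < a_start < b_end < a_end)


def is_consistent(parse: Sequence[Span], brackets: Sequence[Span]) -> bool:
    """A parse is consistent if none of its spans crosses a given bracket."""
    return not any(crosses(p, b) for p in parse for b in brackets)


def select_confident(
    candidates: Sequence[Tuple[List[Span], float]],
    brackets: Sequence[Span],
    threshold: float = 0.8,  # hypothetical confidence cut-off
) -> Optional[List[Span]]:
    """Pick the top-scoring bracket-consistent parse if it is confident enough.

    `candidates` pairs each parse (a list of spans) with a positive score.
    Scores are renormalized over the consistent parses only; the best parse
    is returned when its share of that mass reaches `threshold`.
    """
    consistent = [(spans, score) for spans, score in candidates
                  if is_consistent(spans, brackets)]
    total = sum(score for _, score in consistent)
    if total <= 0:
        return None  # no usable parse for this sentence
    best_spans, best_score = max(consistent, key=lambda c: c[1])
    return best_spans if best_score / total >= threshold else None


# Hypothetical four-token sentence with one known bracket over tokens 0-1.
brackets = [(0, 2)]
parse_a = [(0, 4), (0, 2), (2, 4)]  # respects the bracket
parse_b = [(0, 4), (1, 3)]          # span (1, 3) crosses (0, 2)
parse_c = [(0, 4), (0, 2)]          # also respects the bracket

selected = select_confident([(parse_a, 0.6), (parse_b, 0.4)], brackets)
ambiguous = select_confident([(parse_a, 0.5), (parse_c, 0.5)], brackets)
```

In this toy case `selected` is `parse_a`, because the only competing parse violates the bracket, while `ambiguous` is `None`, because two consistent parses split the score evenly. An expectation-maximization variant would instead weight every consistent parse by its renormalized score rather than keeping a single one.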

This talk is part of the NLIP Seminar Series.

