Unsupervised Word Alignment and Part of Speech Induction with Undirected Models

If you have a question about this talk, please contact Thomas Lippincott.

This talk explores unsupervised learning in undirected graphical models for two problems in natural language processing. Undirected models can incorporate arbitrary, non-independent features computed over random variables, thereby overcoming the inherent limitation of directed models, which require that features factor according to the conditional independencies of an acyclic generative process. Using word alignment (finding lexical correspondences in parallel texts) and bilingual part-of-speech induction (jointly learning syntactic categories for two languages from parallel data) as case studies, we show that relaxing the acyclicity requirement lets us formulate more succinct models that make fewer counterintuitive independence assumptions. Experiments confirm that our undirected alignment model yields consistently better performance than directed baseline models, according to both intrinsic and extrinsic measures. For POS tagging, our results are more tentative. Analysis reveals that our parameter learner tends to get caught in shallow local optima corresponding to poor tagging solutions. Switching to an alternative learning objective (contrastive estimation; Smith and Eisner, 2005) improves both stability and performance, but suggests that non-convex objectives may be a larger problem in undirected models than in directed models.
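For readers unfamiliar with the distinction, a minimal sketch (notation ours, not taken from the talk): an undirected model scores a whole configuration with a globally normalized log-linear form,

    p_\theta(y \mid x) = \frac{1}{Z_\theta(x)} \exp\Big( \sum_k \theta_k f_k(x, y) \Big),
    \qquad
    Z_\theta(x) = \sum_{y'} \exp\Big( \sum_k \theta_k f_k(x, y') \Big),

where the features f_k may inspect arbitrary, overlapping parts of (x, y), whereas a directed model must factor as a product of locally normalized conditionals, p(x, y) = \prod_i p(v_i \mid \mathrm{pa}(v_i)), following an acyclic generative order. Contrastive estimation (Smith and Eisner, 2005) replaces the full partition function with a sum over a neighborhood N(x) of perturbed versions of each observed input:

    \ell_{\mathrm{CE}}(\theta) = \sum_{x \in \mathcal{D}} \log
    \frac{\sum_y \exp\big( \sum_k \theta_k f_k(x, y) \big)}
         {\sum_{x' \in N(x)} \sum_y \exp\big( \sum_k \theta_k f_k(x', y) \big)},

which moves probability mass from the perturbed neighbors onto the observed sentences and is typically much cheaper to compute than the full normalizer.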

This talk is part of the NLIP Seminar Series.
