University of Cambridge > Language Technology Lab Seminars

Building and using the Finnish Internet Parsebank


If you have a question about this talk, please contact Mohammad Taher Pilehvar.

The Finnish Internet Parsebank is a corpus of 270 million Finnish sentences drawn from Internet crawl data and syntactically analysed in the Universal Dependencies (UD) representation. I will present the parsebank, some of the lessons learned while crawling and analysing the data, the tools and derived resources we developed, and some of the ways the parsebank has been used. In particular, I will focus on the syntax query tools, which can efficiently search a corpus of over 4 billion syntactically analysed tokens. I will also outline future work aimed at building similar parsebanks for most of the languages in Universal Dependencies.

This talk is part of the Language Technology Lab Seminars series.



© 2006-2024, University of Cambridge.