
Ontology Learning for Portuguese


If you have a question about this talk, please contact Laura Rimell.

Given both the importance of semantic information in natural language processing today and the effort involved in creating lexical resources from scratch, this research aims at the semi-automatic creation of a lexical ontology for Portuguese.

While WordNet [1] has become established as the standard model of a lexical ontology for English, the few similar resources that exist for Portuguese, all created manually, are either at early stages of development or not publicly available for download and unrestricted use. Therefore, as an alternative to the manual creation and maintenance of such resources, the proposed work is concerned with developing computational tools capable of extracting lexico-semantic knowledge from Portuguese textual resources. The knowledge acquired will then be structured into a public domain lexical ontology.

The extraction procedures will be based on the detection of textual patterns that are indicative of lexico-semantic relations between lexical items. Machine-readable dictionaries (MRDs) will be used as the primary source of knowledge, since they are already structured around words and their meanings, typically use simple vocabulary, were created by experts, and are the main source of general knowledge. The PAPEL project [2, 3] took the first steps towards the automatic extraction of semantic information from a general Portuguese MRD, using handcrafted semantic grammars. The results and conclusions obtained in PAPEL will therefore be used as a starting point. However, this research is also concerned with exploring other available Portuguese MRDs.
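As a rough illustration of pattern-based extraction from dictionary definitions, the sketch below matches a few surface patterns at the start of a definition and emits relation triples. The patterns, relation names, and the function `extract_relations` are hypothetical and chosen only for this example; they are not the PAPEL grammars, which are handcrafted semantic grammars rather than simple regular expressions.

```python
import re
from typing import List, Tuple

# Hypothetical surface patterns, invented for illustration only.
# Each pattern is anchored at the start of the definition text and
# captures the first word that follows the cue phrase.
PATTERNS = [
    ("HYPERNYM_OF", re.compile(r"^(?:tipo|espécie|género) de (\w+)")),
    ("HAS_PART", re.compile(r"^parte d[oae]s? (\w+)")),
    ("SYNONYM_OF", re.compile(r"^o mesmo que (\w+)")),
]


def extract_relations(headword: str, definition: str) -> List[Tuple[str, str, str]]:
    """Return (argument, relation, headword) triples found in one definition."""
    text = definition.strip().lower()
    triples = []
    for relation, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            # The captured word is the first argument, the defined word the second,
            # e.g. ("corpo", "HAS_PART", "braço") reads "corpo has part braço".
            triples.append((match.group(1), relation, headword))
    return triples


if __name__ == "__main__":
    entries = {
        "cão": "Tipo de animal doméstico",
        "braço": "Parte do corpo humano",
        "automóvel": "O mesmo que carro",
    }
    for word, gloss in entries.items():
        print(word, extract_relations(word, gloss))
```

A real system would need lemmatisation, handling of multi-word terms, and much broader pattern coverage; the sketch only shows how a cue phrase in a definition can be mapped to a relation between the defined word and a word in its gloss.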

Moreover, this work will not be limited to processing dictionaries: textual corpora will be used as a second source of knowledge, in order to enrich the ontology in several more specific domains. Furthermore, the quality and utility of the resources developed will be assessed. In addition to manual evaluation, and given the time the latter requires, automatic evaluation methodologies will be devised. By the end of this research, important contributions to Portuguese NLP are expected, such as a new public domain lexical resource and computational tools capable of learning lexico-semantic information from text.

[1] Christiane Fellbaum, editor (1998). WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press.

[2] Hugo Gonçalo Oliveira, Diana Santos, Paulo Gomes & Nuno Seco. “PAPEL: a dictionary-based lexical ontology for Portuguese”. In António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira & Paulo Quaresma (eds.), Computational Processing of the Portuguese Language, 8th International Conference, Proceedings (PROPOR 2008), Vol. 5190, Aveiro, Portugal, 2008. Springer Verlag, pp. 31-40.

[3] Hugo Gonçalo Oliveira, Diana Santos & Paulo Gomes. “Relations extracted from a Portuguese dictionary: results and first evaluation”. In Luís Seabra Lopes, Nuno Lau, Pedro Mariano & Luís Rocha (eds.), Local Proceedings of the 14th Portuguese Conference on Artificial Intelligence (EPIA), Aveiro, Portugal, 2009.

This talk is part of the NLIP Seminar Series.


© 2006-2019 Talks.cam, University of Cambridge.