LLMs and Low-Resource Languages

If you have a question about this talk, please contact Tiancheng Hu.

Abstract: Generative AI models are now multilingual, raising new questions about their relative performance across languages and local cultures, especially for communities with fewer speakers. In this talk I will explore some of those questions and the lessons we learned along the way. Is it possible to build high-performing LLMs for low-resource languages? We have built a high-performing open model for Basque, accompanied by a fully reproducible end-to-end evaluation suite. Do LLMs think better in English than in the local language? Our experiments show that LLMs do not fully exploit their multilingual potential when prompted in non-English languages. Do LLMs know about local culture? We probed the complex interaction between language and global/local knowledge, showing for the first time that local knowledge is transferred from the low-resource to the high-resource language, a sign that prior findings may not hold when evaluated on local topics. The evaluation suite was recognised with a best resource paper award at ACL 2024.

Bio: Eneko Agirre is Full Professor of Informatics and Head of HiTZ Basque Center of Language Technology at the University of the Basque Country, UPV/EHU, in San Sebastian, Spain. He has been a visiting researcher or professor at New Mexico State, Melbourne, Southern California, Stanford and New York Universities. He has been active in Natural Language Processing and Computational Linguistics since his undergraduate days. He received the Spanish Informatics Research Award in 2021, and is one of the 74 fellows of the Association for Computational Linguistics (ACL). He has served as President of ACL's SIGLEX, as a member of the editorial boards of Computational Linguistics and the Journal of Artificial Intelligence Research, and as Action Editor for the Transactions of the ACL. He is co-founder of the Joint Conference on Lexical and Computational Semantics (*SEM). He is a recipient of three Google Research Awards and six best paper awards and nominations, most recently at ACL 2024. Dissertations under his supervision received best PhD awards from EurAI, the Spanish NLP society and the Spanish Informatics Scientific Association. He has over 200 publications across a wide range of NLP and AI topics, and has given more than 20 invited talks, mostly international.

This talk is part of the Language Technology Lab Seminars series.
