LLMs and Low-Resource Languages
If you have a question about this talk, please contact Tiancheng Hu.

Abstract: Generative AI models are now multilingual, raising new questions about their relative performance across languages and local cultures, especially for communities with fewer speakers. In this talk I will explore some of these questions and the lessons we learned along the way.

Is it possible to build high-performing LLMs for low-resource languages? We have built a high-performing open model for Basque, accompanied by a fully reproducible end-to-end evaluation suite.

Do LLMs think better in English than in the local language? Our experiments show that LLMs do not fully exploit their multilingual potential when prompted in non-English languages.

Do LLMs know about local culture? We probed the complex interaction between language and global/local knowledge, showing for the first time that local knowledge is transferred from the low-resource to the high-resource language, a sign that prior findings may not hold when evaluated on local topics. The evaluation suite was recognised with a best resource paper award at ACL 2024.

Bio: Eneko Agirre is Full Professor of Informatics and Head of HiTZ Basque Center of Language Technology at the University of the Basque Country (UPV/EHU) in San Sebastian, Spain. He has been a visiting researcher or professor at New Mexico State, Melbourne, Southern California, Stanford and New York Universities. He has been active in Natural Language Processing and Computational Linguistics since his undergraduate days. He received the Spanish Informatics Research Award in 2021 and is one of the 74 fellows of the Association for Computational Linguistics (ACL). He was President of ACL's SIGLEX, a member of the editorial boards of Computational Linguistics and the Journal of Artificial Intelligence Research, and an Action Editor for the Transactions of the ACL. He is co-founder of the Joint Conference on Lexical and Computational Semantics (*SEM).
He is a recipient of three Google Research Awards and six best paper awards and nominations, most recently at ACL 2024. Dissertations under his supervision have received best PhD awards from EurAI, the Spanish NLP society and the Spanish Informatics Scientific Association. He has over 200 publications across a wide range of NLP and AI topics and has given more than 20 invited talks, mostly international.

This talk is part of the Language Technology Lab Seminars series.