How deep can LLMs reason? Benchmarking GPT and Llama models on nested mathematical formulas
- Speaker: Flavio Petruzzellis, Universita' di Padova
- Date & Time: Friday 07 June 2024, 17:00 - 18:00
- Venue: Lecture Theatre 2, Computer Laboratory, William Gates Building
Abstract
While LLMs have shown proficiency in a diverse range of language-related tasks, their performance in mathematical problem-solving remains an open area of exploration. Drawing on empirical evidence, I challenge the prevailing notion that scaling LLMs and employing advanced prompting strategies inherently enhance their ability to solve complex mathematical formulas. The findings I will present indicate that while LLMs can handle certain mathematical tasks effectively, they exhibit limitations when faced with nested formulas that can only be solved by reasoning systematically, for example by applying simple algorithms. I will present results from both GPT and Llama models, highlighting the similarities and differences in their performance. Additionally, I will discuss potential underlying factors, consider the remaining limitations of this work, and present promising avenues for future research to enhance LLMs’ mathematical problem-solving capabilities.
Series
This talk is part of the Foundation AI series.