How deep can LLMs reason? Benchmarking GPT and Llama models on nested mathematical formulas
If you have a question about this talk, please contact Pietro Lio.

While LLMs have shown proficiency in a diverse range of language-related tasks, their performance in mathematical problem-solving remains an open area of exploration. Through empirical evidence, I challenge the prevailing notion that scaling LLMs and employing advanced prompting strategies inherently enhance their ability to solve complex mathematical formulas. The findings I will present indicate that while LLMs can handle certain mathematical tasks effectively, they exhibit limitations when faced with nested formulas, which require systematic reasoning to solve, for example by applying simple algorithms. I will present results from both GPT and Llama models, highlighting the similarities and differences in their performance. Additionally, I will discuss potential underlying factors, consider the remaining limitations of this work, and outline promising avenues for future research to enhance LLMs' mathematical problem-solving capabilities.

This talk is part of the Foundation AI series.
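To make the notion of "nested formulas" concrete, here is a minimal sketch of how such benchmark items could be generated and scored; this is a hypothetical illustration (the function name `nested_formula` and the generation scheme are assumptions, not the speaker's actual benchmark), but it captures the idea that depth controls how many systematic reduction steps a solver must perform.

```python
import random


def nested_formula(depth, rng=None):
    """Build a nested arithmetic expression of the given depth.

    Each level wraps two sub-expressions in parentheses with a random
    operator, so the depth parameter controls how much step-by-step,
    inside-out evaluation is needed to reach the answer.
    NOTE: hypothetical benchmark generator, for illustration only.
    """
    rng = rng or random.Random()
    if depth == 0:
        return str(rng.randint(1, 9))  # leaf: a single digit
    op = rng.choice(["+", "-", "*"])
    left = nested_formula(depth - 1, rng)
    right = nested_formula(depth - 1, rng)
    return f"({left} {op} {right})"


# Ground-truth answers come from ordinary evaluation; a model's
# free-text reply would be parsed and compared against this value.
expr = nested_formula(3, random.Random(0))
print(expr, "=", eval(expr))
```

Sweeping `depth` while holding the operator set fixed gives a simple way to test whether accuracy degrades with nesting level rather than with surface length alone.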