BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:How deep can LLMs reason? Benchmarking GPT and Llama models on nes
 ted mathematical formulas - Flavio Petruzzellis\, Università di Padova
DTSTART:20240607T160000Z
DTEND:20240607T170000Z
UID:TALK217849@talks.cam.ac.uk
CONTACT:Pietro Lio
DESCRIPTION:While LLMs have shown proficiency in a diverse range of langua
 ge-related tasks\, their performance in mathematical problem-solving rema
 ins an active area of exploration. Through empirical evidence\, I challen
 ge the prevailing notion that scaling LLMs and employing advanced prompti
 ng strategies inherently enhance their ability to solve complex mathemati
 cal formulas. The findings I will present indicate that while LLMs can ha
 ndle certain mathematical tasks effectively\, they exhibit limitations wh
 en faced with nested formulas that require systematic reasoning to solve
 \, for example by applying simple algorithms. I will present results fro
 m both GPT and Llama models\, highlighting the similarities and differen
 ces in their performance. Additionally\, I will discuss potential underl
 ying factors\, consider the remaining limitations of this work\, and pre
 sent promising avenues for future research to enhance LLMs' mathematical
  problem-solving capabilities.
LOCATION:Lecture Theatre 2\, Computer Laboratory\, William Gates Building
END:VEVENT
END:VCALENDAR
