
Can Humans Supervise Increasingly Ultracrepidarian AI?


If you have a question about this talk, please contact Matthew Colbrook.

Large language models have evolved to solve increasingly complex problems but still fail at many problems that are simple from a human point of view. This discordance with human difficulty expectations strongly affects the reliability of these models, as users cannot identify a safe operating condition in which the model is expected to be correct. With the extensive use of scaling up and shaping up (such as RLHF) in newer generations of LLMs, we ask whether this remains the case. In a recent Nature paper, we examined several LLM families and showed that instances that are easy for humans are usually easy for the models. However, scaled-up, shaped-up models do not secure areas of low difficulty in which either the model does not err or human supervision can spot the errors. We also found that early models often avoid user questions, whereas scaled-up, shaped-up models tend to give apparently sensible yet wrong answers much more often, including errors on difficult questions that human supervisors frequently overlook. Finally, we disentangled whether this behaviour arises from scaling up or shaping up, and discovered new scaling laws showing that larger models become more incorrect and, especially, more ultracrepidarian, operating beyond their competence. These findings highlight the need for a fundamental shift in the design and development of general-purpose artificial intelligence, particularly in high-stakes areas where a predictable distribution of errors is paramount.
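
To make the notion of a "safe operating condition" concrete, the following minimal sketch (illustrative only, not the methodology of the paper) bins a model's answers by human-rated difficulty and reports per-bin rates of correct, incorrect and avoidant responses. A safe operating region would show up as low-difficulty bins whose incorrect rate is near zero. The records, difficulty scale and function names below are hypothetical.

# Illustrative sketch (not the paper's actual code): check whether a model has
# a "safe operating region" by binning its answers by human-rated difficulty
# and measuring correctness and avoidance per bin.
from collections import defaultdict

# Each record: (human difficulty in [0, 1], outcome in {"correct", "incorrect", "avoidant"})
records = [
    (0.05, "correct"), (0.10, "correct"), (0.15, "incorrect"),
    (0.40, "correct"), (0.55, "incorrect"), (0.60, "avoidant"),
    (0.80, "incorrect"), (0.90, "incorrect"), (0.95, "avoidant"),
]

def bin_by_difficulty(records, n_bins=5):
    """Group outcomes into equal-width difficulty bins and report per-bin rates."""
    bins = defaultdict(list)
    for difficulty, outcome in records:
        idx = min(int(difficulty * n_bins), n_bins - 1)
        bins[idx].append(outcome)
    report = {}
    for idx in sorted(bins):
        outcomes = bins[idx]
        n = len(outcomes)
        report[idx] = {
            "n": n,
            "correct": outcomes.count("correct") / n,
            "incorrect": outcomes.count("incorrect") / n,
            "avoidant": outcomes.count("avoidant") / n,
        }
    return report

for idx, stats in bin_by_difficulty(records).items():
    print(f"difficulty bin {idx}: {stats}")
# The talk argues that scaled-up, shaped-up models fail to secure such
# low-difficulty bins: the incorrect rate does not vanish even on easy items.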

The talk will be based on the recent paper: L. Zhou, W. Schellaert, F. Martínez-Plumed, Y. Moros-Daval, C. Ferri, J. Hernández-Orallo (2024), "Larger and more instructable language models become less reliable", Nature 634, 61–68.

This talk is part of the Applied and Computational Analysis series.
