
Can Humans Supervise Increasingly Ultracrepidarian AI?


If you have a question about this talk, please contact Matthew Colbrook.

Large language models have evolved to solve increasingly complex problems but still fail at many problems that are simple from a human point of view. This discordance with human difficulty expectations strongly affects the reliability of these models, as users cannot identify a safe operating condition in which the model is expected to be correct. With the extensive use of scaling up and shaping up (such as RLHF) in newer generations of LLMs, we ask whether this remains the case. In a recent Nature paper, we examined several LLM families and showed that instances that are easy for humans are usually easy for the models. However, scaled-up, shaped-up models do not secure areas of low difficulty in which either the model does not err or human supervision can spot the errors. We also found that early models often avoid user questions, whereas scaled-up, shaped-up models tend to give apparently sensible yet wrong answers much more often, including errors on difficult questions that human supervisors frequently overlook. Finally, we disentangled whether this behaviour arises from scaling up or shaping up, and discovered new scaling laws showing that larger models become more incorrect and, especially, more ultracrepidarian, operating beyond their competence. These findings highlight the need for a fundamental shift in the design and development of general-purpose artificial intelligence, particularly in high-stakes areas where a predictable distribution of errors is paramount.
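
To make the notion of a "safe operating condition" concrete, the following minimal sketch (illustrative only, not the methodology of the paper) bins a model's answers by human-rated difficulty and reports per-bin rates of correct, incorrect and avoidant responses. A safe operating region would show up as low-difficulty bins whose incorrect rate is near zero. The records, difficulty scale and function names below are hypothetical.

# Illustrative sketch (not the paper's actual code): check whether a model has
# a "safe operating region" by binning its answers by human-rated difficulty
# and measuring correctness and avoidance per bin.
from collections import defaultdict

# Each record: (human difficulty in [0, 1], outcome in {"correct", "incorrect", "avoidant"})
records = [
    (0.05, "correct"), (0.10, "correct"), (0.15, "incorrect"),
    (0.40, "correct"), (0.55, "incorrect"), (0.60, "avoidant"),
    (0.80, "incorrect"), (0.90, "incorrect"), (0.95, "avoidant"),
]

def bin_by_difficulty(records, n_bins=5):
    """Group outcomes into equal-width difficulty bins and report per-bin rates."""
    bins = defaultdict(list)
    for difficulty, outcome in records:
        idx = min(int(difficulty * n_bins), n_bins - 1)
        bins[idx].append(outcome)
    report = {}
    for idx in sorted(bins):
        outcomes = bins[idx]
        n = len(outcomes)
        report[idx] = {
            "n": n,
            "correct": outcomes.count("correct") / n,
            "incorrect": outcomes.count("incorrect") / n,
            "avoidant": outcomes.count("avoidant") / n,
        }
    return report

for idx, stats in bin_by_difficulty(records).items():
    print(f"difficulty bin {idx}: {stats}")
# The talk argues that scaled-up, shaped-up models fail to secure such
# low-difficulty bins: the incorrect rate does not vanish even on easy items.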

The talk will be based on the recent paper: L. Zhou, W. Schellaert, F. Martínez-Plumed, Y. Moros-Daval, C. Ferri, J. Hernández-Orallo (2024), "Larger and more instructable language models become less reliable", Nature 634, 61–68.

This talk is part of the Applied and Computational Analysis series.
