
Motivation for this group, Goodhart's Law


If you have a question about this talk, please contact Adrià Garriga Alonso.

How can we design AI systems that reliably act according to the true intent of their users, even as the capability of the systems increases?

Come to this reading group with free pizza! This week we will get started by explaining why the group exists. Part of the motivation is Goodhart’s Law [1] and its implications for evaluating AI systems and designing their objectives.

The session will go as follows. At 17:00, we will start reading the material (see bottom), mostly individually. At 17:30, the discussion leader will start going through the paper, making sure everyone understands, and encouraging discussion about its contents and implications.

A basic understanding of machine learning is helpful, but detailed knowledge of the latest techniques is not required. Each session will include a brief recap of the background needed for that week's material. The goal of this series is to help people learn more about the existing work in AI research, and eventually contribute to the field.

Join the mailing list (https://lists.cam.ac.uk/mailman/listinfo/eng-safe-ai), the Facebook group (https://www.facebook.com/groups/1070763633063871) or the talks.cam page (https://talks.cam.ac.uk/show/index/80932). Announcements about the week’s topic and other events will be sent there. Consider also inviting your friends!

READING MATERIAL:

“Building safe artificial intelligence: specification, robustness, and assurance” (2018), by Pedro A. Ortega, Vishal Maini, and the DeepMind safety team https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1

“On the folly of rewarding A, while hoping for B” (1975), by Steven Kerr http://web.mit.edu/curhan/www/docs/Articles/15341_Readings/Motivation/Kerr_Folly_of_rewarding_A_while_hoping_for_B.pdf

“Categorizing Variants of Goodhart’s Law” (2018), by David Manheim and Scott Garrabrant (arXiv https://arxiv.org/abs/1803.04585)

If you have already read the material in your own time, feel free to come by at 17:30.

[1] https://en.wikipedia.org/wiki/Goodhart%27s_law

This talk is part of the Engineering Safe AI series.


© 2006-2024 Talks.cam, University of Cambridge.