Motivation for this group, Goodhart's Law

James Bell, University of Cambridge
Wednesday 17 October 2018, 17:00-18:30
Cambridge University Engineering Department, CBL Seminar room BE4-38. See https://www.openstreetmap.org/#map=18/52.19804/0.11969.

If you have a question about this talk, please contact Adrià Garriga Alonso.

How can we design AI systems that reliably act according to the true intent of their users, even as the capability of the systems increases?

Come to this reading group with free pizza! This week we will get started by explaining why we are doing this. Part of the motivation is Goodhart’s Law [1] and its implications for evaluating AI systems and designing their objectives.

The session will go as follows. At 17:00, we will start reading the material (see bottom), mostly individually. At 17:30, the discussion leader will start going through the paper, making sure everyone understands, and encouraging discussion about its contents and implications.

A basic understanding of machine learning is helpful, but detailed knowledge of the latest techniques is not required. Each session will include a brief recap of the immediately necessary background. The goal of this series is to help people learn more about the existing work in AI research, and eventually contribute to the field.

Join the mailing list (https://lists.cam.ac.uk/mailman/listinfo/eng-safe-ai), the Facebook group (https://www.facebook.com/groups/1070763633063871) or the talks.cam page (https://talks.cam.ac.uk/show/index/80932). Announcements about the week’s topic and other events will be sent there. Consider also inviting your friends!

READING MATERIAL:

“Building safe artificial intelligence: specification, robustness, and assurance” (2018), by Pedro A. Ortega, Vishal Maini, and the DeepMind safety team https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1

“On the folly of rewarding A, while hoping for B” (1975), by Steven Kerr http://web.mit.edu/curhan/www/docs/Articles/15341_Readings/Motivation/Kerr_Folly_of_rewarding_A_while_hoping_for_B.pdf

“Categorizing Variants of Goodhart’s Law” (2018), by David Manheim and Scott Garrabrant (arXiv https://arxiv.org/abs/1803.04585)

If you have already read the material in your own time, feel free to come by at 17:30.

[1] https://en.wikipedia.org/wiki/Goodhart%27s_law

This talk is part of the Engineering Safe AI series.
