
How useful is quantilization for mitigating specification-gaming?


If you have a question about this talk, please contact Adrià Garriga Alonso.

This week: “How useful is quantilization for mitigating specification-gaming?” by Ryan Carey. The paper is available here, published in the ICLR 2019 Safe Machine Learning workshop.

If we have a specification that does not perfectly reflect what we care about, there are ways to maximize it which we want to avoid. To mitigate reward hacking (or specification-gaming), we can perform “quantilization, a method that interpolates between imitating demonstrations, and optimizing the proxy objective. If the demonstrations are of adequate quality, and the proxy reward overestimates performance, then quantilization has better guaranteed performance than other strategies. However, if the proxy reward underestimates performance, then either imitation or optimization will offer the best guarantee.”
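The idea of interpolating between imitation and proxy optimization can be sketched as follows: draw candidate actions from the demonstration (base) distribution, then sample uniformly from the top-q fraction as ranked by the proxy reward. This is an illustrative sketch only; the function names and parameters here are not from the paper's code.

```python
import random


def quantilize(base_sample, proxy_reward, q, n=1000, rng=random):
    """Sample an action from the top-q quantile of the base distribution,
    ranked by a proxy reward (an illustrative sketch of quantilization).

    As q shrinks toward 1/n this approaches optimizing the proxy over the
    sampled actions; q = 1 recovers pure imitation of the base distribution.
    """
    # Draw n candidate actions from the demonstration distribution
    # and rank them by the (possibly misspecified) proxy reward.
    candidates = sorted((base_sample() for _ in range(n)),
                        key=proxy_reward, reverse=True)
    # Keep the top q fraction and sample uniformly from it.
    top = candidates[:max(1, int(q * n))]
    return rng.choice(top)
```

For example, with a uniform base distribution on [0, 1] and the identity as proxy reward, `quantilize(lambda: random.uniform(0, 1), lambda a: a, q=0.1)` returns a value near the top decile, rather than the single proxy-maximizing sample, limiting how hard the proxy can be gamed.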

As always, there will be free pizza. The first half hour is for stragglers to finish reading.

Invite your friends to join the mailing list, the Facebook group, or the page. Details about the next meeting, the week’s topic, and other events will be advertised in these places.

This talk is part of the Engineering Safe AI series.




© 2006-2023, University of Cambridge.