
How useful is quantilization for mitigating specification-gaming?


If you have a question about this talk, please contact Adrià Garriga Alonso.

This week: “How useful is quantilization for mitigating specification-gaming?” by Ryan Carey. The paper is available here, published in the ICLR 2019 Safe Machine Learning workshop.

If we have a specification that does not perfectly reflect what we care about, there are ways to maximize it which we want to avoid. To mitigate reward hacking (or specification-gaming), we can perform “quantilization, a method that interpolates between imitating demonstrations, and optimizing the proxy objective. If the demonstrations are of adequate quality, and the proxy reward overestimates performance, then quantilization has better guaranteed performance than other strategies. However, if the proxy reward underestimates performance, then either imitation or optimization will offer the best guarantee.”
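The idea of interpolating between imitation and proxy optimization can be sketched as follows: draw candidate actions from the demonstration (base) distribution, then sample uniformly from the top-q fraction as ranked by the proxy reward. This is an illustrative sketch only; the function names and parameters here are not from the paper's code.

```python
import random


def quantilize(base_sample, proxy_reward, q, n=1000, rng=random):
    """Sample an action from the top-q quantile of the base distribution,
    ranked by a proxy reward (an illustrative sketch of quantilization).

    As q shrinks toward 1/n this approaches optimizing the proxy over the
    sampled actions; q = 1 recovers pure imitation of the base distribution.
    """
    # Draw n candidate actions from the demonstration distribution
    # and rank them by the (possibly misspecified) proxy reward.
    candidates = sorted((base_sample() for _ in range(n)),
                        key=proxy_reward, reverse=True)
    # Keep the top q fraction and sample uniformly from it.
    top = candidates[:max(1, int(q * n))]
    return rng.choice(top)
```

For example, with a uniform base distribution on [0, 1] and the identity as proxy reward, `quantilize(lambda: random.uniform(0, 1), lambda a: a, q=0.1)` returns a value near the top decile, rather than the single proxy-maximizing sample, limiting how hard the proxy can be gamed.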

As always, there will be free pizza. The first half hour is for stragglers to finish reading.

Invite your friends to join the mailing list, the Facebook group, or the page. Details about the next meeting, the week’s topic, and other events will be advertised in these places.

This talk is part of the Engineering Safe AI series.




© 2006-2023, University of Cambridge.