
Misleading meta-objectives and hidden incentives for distributional shift


If you have a question about this talk, please contact Adrià Garriga Alonso.

This week: “Misleading meta-objectives and hidden incentives for distributional shift.” David Krueger, Tegan Maharaj, Shane Legg and Jan Leike. [Paper] [BibTeX]

The authors aim to show that meta-learning can create hidden incentives for agents to change their task rather than solve the task we set them. For example, consider an agent that predicts when a person wants coffee: having learned that the person drinks coffee in the morning, it wakes them whenever they try to sleep in, so a seemingly suboptimal policy (waking the human) yields better predictions. The paper presents experiments showing that meta-learning agents trained with Population-Based Training (PBT) learn non-myopic behaviour even when their reward is myopic. The authors also demonstrate a method for eliminating this non-myopic behaviour, which they call Environment Swapping.
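To make the swapping idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of the scheduling trick it relies on: each generation, every learner in the population is rotated to a different environment instance, so an agent's past actions cannot shape the data distribution it is later trained and evaluated on. The function name and round-robin rotation are illustrative assumptions.

```python
def environment_swapping_schedule(n_agents, n_generations):
    """Assign each agent a different environment every generation.

    Returns a list of generations; entry g is a list where position i
    gives the environment index agent i plays in generation g.  A simple
    round-robin rotation guarantees no agent revisits its previous
    environment, removing the hidden incentive to manipulate it.
    """
    schedule = []
    for g in range(n_generations):
        # agent i plays environment (i + g) mod n_agents in generation g
        schedule.append([(i + g) % n_agents for i in range(n_agents)])
    return schedule


# Usage: 3 agents over 3 generations; assignments rotate each generation,
# and every generation is still a full permutation of the environments.
schedule = environment_swapping_schedule(3, 3)
```

In a PBT loop, this schedule would decide which environment each population member trains in before the exploit/explore step, breaking the feedback path through which non-myopic behaviour is rewarded.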

As always, there will be free pizza. The first half hour is for stragglers to finish reading.

Invite your friends to join the mailing list, the Facebook group or the page. Details about the next meeting, the week's topic and other events will be advertised in these places.

This talk is part of the Engineering Safe AI series.




© 2006-2023, University of Cambridge.