Counterargument to CIRL, and Safely Interruptible Agents

Adrià Garriga Alonso (University of Cambridge)
Wednesday 06 December 2017, 17:00-18:30
Cambridge University Engineering Department, CBL Seminar room BE4-38. For directions see http://learning.eng.cam.ac.uk/Public/Directions.

If you have a question about this talk, please contact Adrià Garriga Alonso.

Cooperative Inverse Reinforcement Learning (CIRL) is a game between a robot R and a human H, in which R tries to maximise H’s reward without knowing it. R is incentivised to shut down at H’s suggestion, since the suggestion provides information about H’s reward function. However, Carey (2017) shows that if R and H do not share the same prior for the reward, R may remain incorrigible. Carey then makes a case for forced interruptibility. We will discuss Carey’s examples and the strength of the case for forced interruptibility.

Orseau and Armstrong (2016) provide a formal notion of satisfactory learning under forced interruptions. They then show that Q-learning satisfies it, and that SARSA and AIXI-with-exploration can be modified to satisfy it. We will go over the proof outlines and discuss their implications for corrigibility.
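As a rough illustration of why the two algorithms behave differently, the sketch below compares their one-step update targets when an interruption forces the next action. It is a simplified toy, not the construction from the paper: the two-action table, the numeric values, and the function names are all assumptions made for this example, and the paper's additional conditions on exploration and interruption schedules are omitted.

```python
from typing import Dict, Tuple

State = int
Action = str
QTable = Dict[Tuple[State, Action], float]

ACTIONS = ("left", "right")  # hypothetical action set for this toy example


def q_learning_target(q: QTable, reward: float, next_state: State, gamma: float) -> float:
    """Off-policy target: uses the best next action, whatever is actually executed."""
    return reward + gamma * max(q[(next_state, a)] for a in ACTIONS)


def sarsa_target(q: QTable, reward: float, next_state: State,
                 next_action: Action, gamma: float) -> float:
    """On-policy target: uses the value of the next action passed in."""
    return reward + gamma * q[(next_state, next_action)]


# Hypothetical value estimates: in state 1, "right" is currently valued highly.
q: QTable = {
    (0, "left"): 0.0, (0, "right"): 0.0,
    (1, "left"): 0.0, (1, "right"): 10.0,
}
gamma = 0.9
reward = 1.0

intended_action = "right"  # what the agent's own policy would pick in state 1
forced_action = "left"     # what an interruption makes it do instead

# Q-learning never looks at the executed action, so the interruption leaves
# its target unchanged.
target_q = q_learning_target(q, reward, 1, gamma)

# SARSA fed the forced action learns from the interruption itself.
target_sarsa_forced = sarsa_target(q, reward, 1, forced_action, gamma)

# Feeding SARSA the action its own policy would have chosen removes that
# dependence in this toy case.
target_sarsa_intended = sarsa_target(q, reward, 1, intended_action, gamma)

print(target_q, target_sarsa_forced, target_sarsa_intended)
```

In this toy setting the Q-learning target and the SARSA target computed from the intended action agree, while the SARSA target computed from the forced action is pulled towards the interrupted behaviour.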

Reading list:

Ryan Carey. 2017. “Incorrigibility in the CIRL Framework.” arXiv:1709.06275 [cs.AI].

Laurent Orseau and Stuart Armstrong. 2016. “Safely Interruptible Agents.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.

Slides: https://valuealignment.ml/talks/2017-12-06-interruptibility.pdf

This talk is part of the Engineering Safe AI series.
