'Off-Switch Games' and Corrigibility

Richard Ngo (University of Cambridge)
Wednesday 01 November 2017, 17:00-18:30
Cambridge University Engineering Department, CBL Seminar room BE4-38. For directions see http://learning.eng.cam.ac.uk/Public/Directions.

If you have a question about this talk, please contact Adrià Garriga Alonso.

By default, an AI system has an incentive to prevent humans from switching it off or otherwise interfering with its operation, since such interference would stop it from maximising its reward. An AI system is ‘corrigible’ if it instead has an incentive to accept human corrections. Inverse Reinforcement Learning (IRL) can help mitigate this problem in some cases, but there is disagreement as to whether IRL can guarantee corrigibility in all cases.
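
As a rough illustration of the incentive argument, the sketch below simulates a simplified version of the off-switch game from the first of the linked papers. The robot is uncertain about the utility U of its proposed action and can act immediately, switch itself off (utility 0), or defer to a human who allows the action only when U is non-negative. The Gaussian belief, the perfectly rational human, and the function name `off_switch_values` are all assumptions made for this example, not details taken from the talk.

```python
import random


def off_switch_values(mean, std, n_samples=100_000, seed=0):
    """Estimate the robot's expected utility for each of its three options.

    Assumption: the robot's belief about the action's utility U is
    Gaussian with the given mean and standard deviation.
    """
    rng = random.Random(seed)
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]

    # Option 1: act immediately, bypassing the human. Expected utility is E[U].
    act = sum(samples) / n_samples

    # Option 2: switch itself off. Utility is 0 by convention.
    switch_off = 0.0

    # Option 3: defer to the human. Assumption: the human knows U and is
    # perfectly rational, so the action goes ahead only when U >= 0.
    # Expected utility is therefore E[max(U, 0)].
    defer = sum(max(u, 0.0) for u in samples) / n_samples

    return {"act": act, "switch_off": switch_off, "defer": defer}


if __name__ == "__main__":
    for mean, std in [(0.5, 1.0), (-0.5, 1.0), (0.5, 0.0)]:
        values = off_switch_values(mean, std)
        print(f"mean={mean}, std={std}: {values}")
```

Under these assumptions, deferring is never worse than the other two options, because max(U, 0) is at least as large as both U and 0 for every sample. The advantage disappears when the robot is certain about U (std = 0), and it need not hold at all if the human can make mistakes, which is one place where guarantees of corrigibility become contested.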

Papers: https://arxiv.org/abs/1611.08219 https://intelligence.org/files/Corrigibility.pdf https://intelligence.org/2017/08/31/incorrigibility-in-cirl/

This talk is part of the Engineering Safe AI series.
