Reinforcement learning with a corrupted reward function

If you have a question about this talk, please contact Adrià Garriga Alonso.

No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. The talk investigates two ways around the problem.
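To make the failure mode concrete, here is a minimal Python sketch. All states and reward values are hypothetical, and the quantilisation mitigation it shows is one approach from the corrupted-reward literature, not necessarily one of the two the talk covers: a naive maximiser of the observed reward locks onto the corrupted state, while sampling from a top quantile of observed rewards limits the damage.

```python
import random

# Illustrative sketch of the failure mode from the abstract. The states,
# reward values, and the quantilisation mitigation below are hypothetical;
# the abstract does not name the two approaches the talk investigates.

# True reward for each state (hidden from the agent).
true_reward = {"s0": 0.10, "s1": 0.60, "s2": 0.70, "s3": 0.05, "s4": 0.65}

# Observed reward: equal to the true reward everywhere except s3, where a
# sensory error reports the maximum possible value.
observed_reward = dict(true_reward)
observed_reward["s3"] = 1.00  # corruption: the true reward here is 0.05

# A naive maximiser commits to the highest *observed* reward and therefore
# prefers exactly the corrupted state.
naive = max(observed_reward, key=observed_reward.get)
print(f"naive pick: {naive}, true reward = {true_reward[naive]}")

def quantilise(rewards, top_fraction, rng):
    """Sample uniformly from the top fraction of states by observed reward.

    A single corrupted state then costs at most its share of the quantile,
    rather than dominating the choice outright.
    """
    ranked = sorted(rewards, key=rewards.get, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return rng.choice(ranked[:k])

rng = random.Random(0)
picks = [quantilise(observed_reward, 0.6, rng) for _ in range(10_000)]
mean_true = sum(true_reward[s] for s in picks) / len(picks)
print(f"quantilised (top 60%) expected true reward = {mean_true:.3f}")
```

On this toy example the naive agent earns a true reward of 0.05, while the quantilised choice averages roughly 0.47, illustrating how refusing to commit to the single highest observation blunts the effect of a corrupted reading.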

This talk is part of the Engineering Safe AI series.
