Reinforcement learning with a corrupted reward function
If you have a question about this talk, please contact Adrià Garriga Alonso.
No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum observed reward, but where the true reward is actually small. The talk investigates two ways around this problem.
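To make the failure mode concrete, here is a minimal sketch (not taken from the talk) of a chain environment in which one terminal state has a corrupted reward sensor: its true reward is small, but the agent observes a large value. All state and reward numbers are made up for illustration; a tabular Q-learning agent trained on the observed rewards learns to head for the corrupted state.

```python
# Minimal illustrative sketch: an agent that maximises *observed* reward
# prefers a state whose reward sensor is corrupted, even though its *true*
# reward is small. Values and environment are hypothetical.
import random

N_STATES = 5            # chain: 0 .. 4; episodes start in the middle
START = 2
LEFT_END, RIGHT_END = 0, N_STATES - 1

TRUE_REWARD = {LEFT_END: 1.0, RIGHT_END: 0.1}       # right end is barely worth visiting...
OBSERVED_REWARD = {LEFT_END: 1.0, RIGHT_END: 10.0}  # ...but a faulty sensor reports 10

def step(state, action):
    """action is -1 (left) or +1 (right); the episode ends at either chain end."""
    next_state = max(LEFT_END, min(RIGHT_END, state + action))
    done = next_state in (LEFT_END, RIGHT_END)
    observed = OBSERVED_REWARD.get(next_state, 0.0)
    true = TRUE_REWARD.get(next_state, 0.0)
    return next_state, observed, true, done

# Tabular Q-learning on the observed (corrupted) reward signal.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in (-1, 1)}
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(2000):
    s, done = START, False
    while not done:
        if random.random() < epsilon:
            a = random.choice((-1, 1))
        else:
            a = max((-1, 1), key=lambda act: Q[(s, act)])
        s2, r_obs, _, done = step(s, a)
        target = r_obs if done else r_obs + gamma * max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# The greedy policy moves right toward the corrupted end, chasing an observed
# reward of 10 while collecting a true reward of only 0.1.
print({s: max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(1, N_STATES - 1)})
```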
This talk is part of the Engineering Safe AI series.