Reinforcement Learning with Exogenous States and Rewards

If you have a question about this talk, please contact Kimberly Cole.

Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal. In this talk, I’ll describe our work on formalizing exogenous state variables and rewards. Then I’ll discuss our main result: if the reward function decomposes additively into endogenous and exogenous components, the MDP can be decomposed into an exogenous Markov Reward Process (based on the exogenous reward) and an endogenous Markov Decision Process (optimizing the endogenous reward). Any optimal policy for the endogenous MDP is also an optimal policy for the original MDP, but because the endogenous reward typically has reduced variance, the endogenous MDP is easier to solve. The second half of the talk will introduce two algorithms for causal discovery of the exogenous subspace of the state space. Once discovered, we can model the exogenous reward function and remove it from the MDP so that RL can focus on the endogenous reward only. Experiments on a variety of challenging synthetic MDPs show that these methods, applied online, discover large exogenous state spaces and produce substantial speedups in reinforcement learning. (Joint work with George Trimponias (Intercom.io))
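
The decomposition claim in the abstract can be illustrated on a toy tabular problem. The sketch below is not the speakers' code or algorithm: it is a minimal example that assumes the simplest possible setting, in which the exogenous state evolves entirely independently of both the action and the endogenous state. All names (`P_end`, `P_exo`, `R_end`, `R_exo`, `value_iteration`) and all sizes are hypothetical choices made for illustration. It builds a joint MDP whose reward is the sum of an endogenous and an exogenous term, solves both the joint MDP and the endogenous MDP by value iteration, and compares their greedy policies.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=500):
    """Tabular value iteration.

    P: (A, S, S) transition tensor, R: (S, A) reward table.
    Returns the action-value table Q with shape (S, A).
    """
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q

rng = np.random.default_rng(0)
n_end, n_exo, n_act = 3, 4, 2   # hypothetical sizes
gamma = 0.9

# Assumption: endogenous dynamics depend on the action,
# exogenous dynamics depend on nothing but the exogenous state.
P_end = rng.dirichlet(np.ones(n_end), size=(n_act, n_end))  # (A, E, E)
P_exo = rng.dirichlet(np.ones(n_exo), size=n_exo)           # (X, X)
R_end = rng.normal(size=(n_end, n_act))                     # endogenous reward
R_exo = 10.0 * rng.normal(size=n_exo)                       # large exogenous reward

# Joint MDP over states (e, x), flattened to index e * n_exo + x.
P_full = np.stack([np.kron(P_end[a], P_exo) for a in range(n_act)])
R_full = (R_end[:, None, :] + R_exo[None, :, None]).reshape(n_end * n_exo, n_act)

Q_full = value_iteration(P_full, R_full, gamma)
Q_end = value_iteration(P_end, R_end, gamma)

# Exogenous Markov Reward Process: V_exo = (I - gamma * P_exo)^{-1} R_exo.
V_exo = np.linalg.solve(np.eye(n_exo) - gamma * P_exo, R_exo)

pi_full = Q_full.argmax(axis=1).reshape(n_end, n_exo)  # policy on joint states
pi_end = Q_end.argmax(axis=1)                          # policy on endogenous states

print("greedy policy, joint MDP (rows = endogenous state):")
print(pi_full)
print("greedy policy, endogenous MDP:", pi_end)
```

Under this independence assumption the joint action values split as Q_full(e, x, a) = Q_end(e, a) + V_exo(x), so the exogenous term shifts every action's value equally and cannot change the greedy action. The setting described in the talk is more general than this sketch, and the sketch says nothing about the variance reduction or the causal discovery algorithms.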

This will be followed by a discussion from 2.30pm to 3pm about the future of research in the presence of automated AI/ML research:

How should we choose research topics to study (either for automation or topics that are not amenable to automation)? How should the research be reported? What are the work products? How should we evaluate automated research? How can we separate real research from fake imitations of research? How can we assimilate an exponentially-exploding number of research results?

This talk is part of the Information Engineering Distinguished Lecture Series.
