BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Machine Learning Reading Group @ CUED
SUMMARY:Game theory\, distributional reinforcement learnin
g\, control and verification - Prof. Alessandro Ab
ate\, Dr. Licio Romao\, Dr. Yulong Gao and Dr. Jia
rui Gan. University of Oxford
DTSTART;TZID=Europe/London:20230607T110000
DTEND;TZID=Europe/London:20230607T123000
UID:TALK202237AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/202237
DESCRIPTION:This week\, the MLG looks forward to welcoming fou
r guest speakers from Oxford.\n\n*Talk 1:*\n\n_Tit
le:_ Formal Synthesis with Neural Templates\n\n_Speaker:_ Prof. Alessandro Abate (Dept. Computer Science
\, Univ. of Oxford\, UK)\n\n_Abstract:_ I shall pr
esent recent work on CEGIS\, a "counterexample-guided inductive synthesis" framework for sound synt
hesis tasks that are relevant for dynamical models
\, control problems\, and software programs. The i
nductive synthesis framework comprises the interac
tion of two components\, a learner and a verifier.
The learner trains a neural template on finite sa
mples. The verifier soundly validates the candidat
es trained by the learner\, by means of calls to a satisfiability-modulo-theory (SMT) solver. Whenever the candidate is not valid\, SMT-generated counterexamples are passed to the learner for further training.\n\n_B
io:_ Alessandro Abate is Professor of Verification
and Control in the Department of Computer Science
at the University of Oxford\, where he is also De
puty Head of Department. Earlier\, he did research
at Stanford University and at SRI International\,
and was an Assistant Professor at the Delft Cente
r for Systems and Control\, TU Delft. He received
an MS/PhD from the University of Padova and UC Ber
keley. His research interests lie in the formal v
erification and control of stochastic hybrid syste
ms\, and in their applications in cyber-physical s
ystems\, particularly involving safety criticality
and energy. He blends in techniques from machine
learning and AI\, such as Bayesian inference\, rei
nforcement learning\, and game theory.\n\n*Talk
2:*\n\n_Title:_ Policy synthesis with guarantees
\n\n_Speaker:_ Dr. Licio Romao (Dept. Computer Sci
ence\, Univ. of Oxford\, UK)\n\n_Abstract:_ In thi
s talk\, I will present two techniques to perform
feedback policy synthesis with guarantees. First\,
I will introduce a new concept of RL robustness a
nd show how to obtain the best robust policy withi
n a class of sub-optimal solutions by leveraging l
exicographic optimisation. The proposed notion of
robustness is motivated by the fact that\, at depl
oyment\, the state of the system may not be precis
ely known due to measurement errors. In the second
part of the talk\, I will present a new technique
to derive abstractions of stochastic dynamical sy
stems. Our methodology is agnostic to the probabil
ity measure that generates the noise and leads to
an interval Markov Decision Process (iMDP) represe
ntation of the original dynamics\; the interval tr
ansition probability contains\, with high probabil
ity\, the true transition probability between stat
es of the abstraction. The PAC guarantees of the p
roposed framework follow from a non-trivial connection with the scenario approach\, a
technique that has had tremendous success within
the control community.\n\n_Bio:_ Licio Romao is a
postdoctoral research assistant in the Department
of Computer Science at the University of Oxford.
He obtained his PhD in August 2021 from the Depart
ment of Engineering Science\, and his MSc and BSc from
the University of Campinas (UNICAMP) and the Fede
ral University of Campina Grande (UFCG)\, respecti
vely. His PhD thesis was awarded the Institution of Engineering and Technology’s (IET) Control and Automat
ion Dissertation Prize 2021. His research combines
techniques from formal verification\, control the
ory\, applied mathematics\, and machine learning t
o enable the design of safer and more reliable fee
dback systems.\n\n_Relevant papers:_\n· D. Jarne Ornia\, L. Romao\, L. Hammond\, M. Mazo Jr\, A. Abate
. Observational Robustness and Invariances in Rein
forcement Learning via Lexicographic Objectives. 2
023. Link: https://licioromao.com/assets/papers/JR
HMA23.pdf.\n· T. Badings\, L. Romao\, A. Abate\, D. Parker\, H. Poonawala\, M. Stoelinga\, N. Jansen. Robust Control for Dynamical Systems with Non-Gaussian Noise via Formal Abstractions. Journal of Artificial Intelligence Research. 2023. Link: https://licioromao.com/assets/papers/BRAPPSJ23.pdf.\n· T. Badings\, L. Romao\, A. Abate\, N. Jansen. Probabilities are not enough: formal controller synthesis for stochastic dynamical systems with epistemic uncertainty. AAAI Conference on Artificial Intelligence\, 2023. Link: https://licioromao.com/assets/papers/BRAJ23a.pdf.\n\n*Talk 3:*\n\n_Titl
e:_ Policy Evaluation in Distributional LQR\n\n_Sp
eaker:_ Dr. Yulong Gao (Dept. Computer Science\, U
niv. of Oxford\, UK)\n\n_Abstract:_ Distributional
reinforcement learning (DRL) enhances the underst
anding of the effects of the randomness in the env
ironment by letting agents learn the distribution
of a random return\, rather than its expected valu
e as in standard RL. At the same time\, a main challenge is that policy evaluation in DRL typ
ically relies on the representation of the return
distribution\, which needs to be carefully designe
d. In this talk\, I will discuss a special class o
f DRL problems that rely on the discounted linear quadratic regulator (LQR) for control\, advocating for
a new distributional approach to LQR\, which we c
all distributional LQR. Specifically\, we provide
a closed-form expression of the distribution of th
e random return which\, remarkably\, is applicable
to all exogenous disturbances on the dynamics\,
as long as they are independent and identically d
istributed (i.i.d.). While the proposed exact retu
rn distribution consists of infinitely many random
variables\, we show that this distribution can be
approximated by a finite number of random variabl
es\, and the associated approximation error can be
analytically bounded under mild assumptions. Usin
g the approximate return distribution\, we propose
a zeroth-order policy gradient algorithm for risk
-averse LQR using the Conditional Value at Risk (C
VaR) as a measure of risk. Numerical experiments a
re provided to illustrate our theoretical results.
(https://arxiv.org/abs/2303.13657)\n\n_Bio:_ Yulo
ng Gao is a postdoctoral researcher at the Departm
ent of Computer Science\, University of Oxford. H
e received the joint Ph.D. degree in Electrical En
gineering in 2021 from KTH Royal Institute of Tech
nology\, Sweden\, and Nanyang Technological Univer
sity\, Singapore. Before moving to Oxford\, he was
a Researcher at KTH from 2021 to 2022. He was the recipient of the VR International Postdoc Grant from the Swedish Research Council. His research interests include automatic verification\, stochastic control\, and model predictive control\, with applications to safety-critical systems.\n\n*Talk 4:*\n\n_Title:
_ Sequential information and mechanism design\n\n_
Speaker:_ Dr. Jiarui Gan (Dept. Computer Science\,
Univ. of Oxford\, UK)\n\n_Abstract:_ Many problem
s in game theory involve reasoning between multipl
e parties with asymmetric access to information. T
his broad class of problems leads to many research
questions about information and mechanism design\,
with broad-ranging applications from governance a
nd public administration to e-commerce and financi
al services. In particular\, there has been a rece
nt surge of interest in exploring more general sequential versions of these problems\, where
players interact over multiple time steps in a ch
anging environment. In this talk\, I will present
a framework of sequential principal-agent problems
that is capable of modeling a wide range of infor
mation and mechanism design problems. I will talk
about our recent algorithmic results on the comput
ation and learning of optimal decision-making in t
his framework.\n\n_Bio:_ Jiarui Gan is a Departme
ntal Lecturer at the Computer Science Department\,
University of Oxford\, working in the Artificial
Intelligence & Machine Learning research theme. Be
fore this\, he was a postdoctoral researcher at the Max
Planck Institute for Software Systems\, and he obt
ained his PhD from Oxford. Jiarui is broadly inter
ested in algorithmic problems in game theory. His
current focus is on sequential information and mec
hanism design problems. His recent work has been s
elected for an Outstanding Paper Honorable Mention
at the AAAI'22 conference.
LOCATION:Cambridge University Engineering Department\, CBL
Seminar room BE4-38.
CONTACT:Isaac Reid
END:VEVENT
END:VCALENDAR