BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Signal Processing and Communications Lab Seminars
SUMMARY:Fastest Convergence for Reinforcement Learning -
Prof. Sean Meyn\, University of Florida
DTSTART;TZID=Europe/London:20181008T150000
DTEND;TZID=Europe/London:20181008T160000
UID:TALK112225AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/112225
DESCRIPTION: \nThere are two well known Stochastic Approximati
on techniques that are known to have optimal rate
of convergence (measured in terms of asymptotic va
riance): the Stochastic Newton-Raphson (SNR) algor
ithm [a matrix gain algorithm that resembles the d
eterministic Newton-Raphson method]\, and the Rup
pert-Polyak averaging technique. This talk will
present new applications of these concepts for rei
nforcement learning.\n \n1. Introducing Zap Q-Lea
rning. In recent work\, first presented at NIPS
2017\, it is shown that a new formulation of SNR p
rovides a new approach to Q-learning that has a pr
ovably optimal rate of convergence under general a
ssumptions\, and astonishingly quick convergence i
n numerical examples. In particular\, the standa
rd Q-learning algorithm of Watkins typically has i
nfinite asymptotic covariance\, and in simulations
the Zap Q-Learning algorithm exhibits much faster
convergence than the Ruppert-Polyak averaging met
hod. The only difficulty is the matrix inversion
required in the SNR recursion. \n \n2. A remedy i
s proposed based on a variant of Polyak’s heavy-ba
ll method. For a special choice of the “momentum”
gain sequence\, it is shown that the parameter es
timates obtained from the algorithm are essentiall
y identical to those obtained using SNR. This ne
w algorithm does not require matrix inversion. I
n simulations it is found that the sample paths of
the two algorithms couple. A theoretical explan
ation for coupling is established for linear recur
sions.\n\n*Biosketch*\n\nSean Meyn received the BA
degree in mathematics from the University of Cali
fornia\, Los Angeles\, in 1982 and the PhD degree
in electrical engineering from McGill University\,
Canada\, in 1987 (with Prof. P. Caines). He is no
w Professor and Robert C. Pittman Eminent Scholar
Chair in the Department of Electrical and Computer
Engineering at the University of Florida\, the di
rector of the Laboratory for Cognition and Control
\, and director of the Florida Institute for Susta
inable Energy. His academic research interests inc
lude theory and applications of decision and contr
ol\, stochastic processes\, and optimization. He h
as received many awards for his research on these
topics\, and is a fellow of the IEEE. He has held
visiting positions at universities all over the wo
rld\, including the Indian Institute of Science\,
Bangalore during 1997-1998 where he was a Fulbrigh
t Research Scholar. During his most recent sabbati
cal\, in the 2006-2007 academic year\, he was a visitin
g professor at MIT and United Technologies Researc
h Center (UTRC). His award-winning 1993 monograph
with Richard Tweedie\, Markov Chains and Stochasti
c Stability\, has been cited thousands of times in
journals from a range of fields. The latest versi
on is published in the Cambridge Mathematical Libr
ary. For the past ten years his applied research h
as focused on engineering\, markets\, and policy i
n energy systems. He regularly engages in industry
\, government\, and academic panels on these topic
s\, and hosts an annual workshop at the University
of Florida.\n
LOCATION:LT6\, Baker Building\, CUED
CONTACT:Prof. Ramji Venkataramanan
END:VEVENT
END:VCALENDAR