BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Statistics Reading Group
SUMMARY:Learning based on Data Compression - Steven de Roo
 ij\, University of Cambridge
DTSTART;TZID=Europe/London:20090211T163000
DTEND;TZID=Europe/London:20090211T173000
UID:TALK16774AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/16774
DESCRIPTION:Bayesian statistics are used all over the place\, but in the shadows lurks\n
 another inference method called the Minimum Description Length principle\n
 (MDL). Although MDL and Bayesian inference turn out to be quite closely\n
 related technically\, they are the product of very different sequences of\n
 ideas. In this presentation I will describe the background of MDL and how it\n
 turns out to be related to Bayesian statistics. To this end I will introduce\n
 Kolmogorov complexity\, which is a measure of the amount of information in\n
 observed data that does not require probabilistic assumptions. Although\n
 Kolmogorov complexity\, which is itself based on Turing's notion of effective\n
 computation\, can theoretically be used to learn just about anything that you\n
 might want to learn\, unfortunately it is itself uncomputable. MDL was\n
 developed as a computable version of Kolmogorov complexity\, but later\n
 borrowed heavily from information theory and statistics as well. The Kraft\n
 inequality provides the crucial connection between methods based on data\n
 compression and methods based on probability theory.\n\n
 Reading:\n\n
 This presentation does not focus on any single paper\, but below I've\n
 listed the relevant references\, many of which are classics. However\, for\n
 those interested in compression-based learning I'd recommend looking at\n
 the following two textbooks instead:\n\n
 P. Vitanyi and M. Li 2008. "An Introduction to Kolmogorov Complexity and its\n
 Applications". Third edition\, Springer Verlag. See\n
 http://homepages.cwi.nl/~paulv/kolmogorov.html.\n
 P. Grunwald 2007. "The Minimum Description Length principle". MIT Press. See\n
 http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=11155\n\n
 References:\n\n
 A.M. Turing 1936. "On Computable Numbers\, with an Application to the\n
 Entscheidungsproblem". Proceedings of the London Mathematical Society series\n
 2\, 42:230-265. Many versions appear online\, for example\n
 http://www.thocp.net/biographies/papers/turing_oncomputablenumbers_1936.pdf\n\n
 C.E. Shannon 1948. "A Mathematical Theory of Communication"\, Bell System\n
 Technical Journal 27:379-423\,623-656. Available at\n
 http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf\n\n
 L.G. Kraft 1949. "A Device for Quantising\, Grouping and Coding\n
 Amplitude-Modulated Pulses". Master's thesis\, MIT. Not available online.\n\n
 B. McMillan 1956. "Two Inequalities implied by Unique Decipherability". IEEE\n
 Transactions on Information Theory\, 2(4):115-116.\n
 Available at http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=01056818\n\n
 R. Solomonoff 1964. "A Formal Theory of Inductive Inference" (Parts I\, II).\n
 Information and Control\, 7(1):1-22 (Part I) and 7(2):224-254 (Part II).\n
 Available at\n
 http://world.std.com/~rjs/1964pt1.pdf (Part I)\n
 http://world.std.com/~rjs/1964pt2.pdf (Part II)\n\n
 A.N. Kolmogorov 1965. "Three Approaches to the Quantitative Definition of\n
 Information". Problems of Information Transmission 1(1):1-7.\n
 Not available online.\n\n
 J. Rissanen 1978. "Modeling by the Shortest Data Description". Automatica\n
 14:465-471.\n\n
LOCATION:MR5\, CMS
CONTACT:Richard Samworth
END:VEVENT
END:VCALENDAR