BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Signal Processing and Communications Lab Seminars
SUMMARY:Data sketching for cardinality and entropy estimat
ion - Dr Ioana Cosma\, Statistical Laboratory\, Un
iversity of Cambridge
DTSTART;TZID=Europe/London:20111109T141500
DTEND;TZID=Europe/London:20111109T150000
UID:TALK33968AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/33968
DESCRIPTION:Streaming data is ubiquitous in a wide range of ar
eas from engineering and information technology\,
finance\, and commerce\, to atmospheric physics\,
and earth sciences. The online approximation of pr
operties of data streams is of great interest\, bu
t this approximation process is hindered by the sh
eer size of the data and the speed at which it is
generated.\nData stream algorithms typically allow
only one pass over the data\, and maintain sub-li
near\nrepresentations of the data from which targe
t properties can be inferred with high efficiency.
\nIn this talk we consider the online approximatio
n of two important characterizations of data strea
ms: cardinality and empirical Shannon entropy. We
assume that the number of distinct elements observ
ed in the stream is prohibitively large\, so that
the vector of cumulative quantities cannot be stor
ed on main computer memory for fast and efficient
access.\nWe focus on two techniques that use pseud
o-random variates to form low-dimensional data ske
tches (using hashing and random projections)\, and
derive estimators of the cardinality and empirica
l entropy. We discuss various properties of our es
timators such as relative asymptotic efficiency\,
recursive computability\, and error and complexity
bounds. Finally\, we present results on simulated
data and seismic measurements from a volcano.\n\n
References:\n1. Peter Clifford and Ioana A. Cosma
(2011) “A statistical analysis of probabilistic co
unting\nalgorithms” (to appear in the Scandinavian
Journal of Statistics\, preprint on arXiv:0801.35
52).\n2. Peter Clifford and Ioana A. Cosma (2009)“
A simple sketching algorithm for entropy\nestimati
on” (in preparation\, preprint on arXiv:0908.3961)
.
LOCATION:LR4\, Engineering\, Department of
CONTACT:Rachel Fogg
END:VEVENT
END:VCALENDAR