BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Statistics
SUMMARY:Frequency and cardinality recovery from sketched d
 ata: a novel approach bridging Bayesian and freque
 ntist views - Stefano Favaro (University of Turin)
DTSTART;TZID=Europe/London:20231103T140000
DTEND;TZID=Europe/London:20231103T150000
UID:TALK206014AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/206014
DESCRIPTION:We study how to recover the frequency of a symbol
  in a large discrete data set\, using only a (lossy
 ) compressed representation\, or sketch\, of those
  data obtained via random hashing.\nThis is a cla
 ssical problem at the crossroads of computer scienc
 e and information theory\, with various algorithms
  available\, such as the count-min sketch. However
 \, these algorithms often assume that the data are
  fixed\, leading to overly conservative and potent
 ially inaccurate estimates when dealing with rando
 mly sampled data. In this talk\, we consider the s
 ketched data as a random sample from an unknown di
 stribution\, and then we introduce novel estimator
 s that improve upon existing approaches. Our metho
 d combines Bayesian nonparametric and classical (f
 requentist) perspectives\, addressing the limitati
 ons of each to provide a principled and practical
  solution. Additionally\, we extend our method to
  address the related but distinct problem of cardin
 ality recovery\, which consists of estimating the
  total number of distinct objects in the data set.
  We validate our method on synthetic and real data\,
  comparing its performance to state-of-the-art al
 ternatives.
LOCATION:MR12\, Centre for Mathematical Sciences
CONTACT:Qingyuan Zhao
END:VEVENT
END:VCALENDAR