BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Economics & Policy Seminars\, CJBS
SUMMARY:The Law of Large Populations: The return of the lo
ng-ignored N and how it can affect our 2020 vision
- Xiao-Li Meng\, Whipple V. N. Jones Professor of
Statistics\, Harvard University
DTSTART;TZID=Europe/London:20180312T170000
DTEND;TZID=Europe/London:20180312T180000
UID:TALK101758@talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/101758
DESCRIPTION:For over a century now\, we statisticians have suc
 cessfully convinced ourselves\, and almost everyone else\, tha
 t in statistical inference the size of the population\, N\, ca
 n be ignored\, especially when it is large. Instead\, we have f
 ocused on the size of the sample\, n\, the key driving force be
 hind both the Law of Large Numbers and the Central Limit Theor
 em. We were thus taught that the statistical error (standard er
 ror) goes down with n\, typically at the rate of 1/√n. However
 \, all of this relies on the presumption that our data are of p
 erfect quality\, in the sense of being equivalent to a probabil
 istic sample. A largely overlooked statistical identity\, a pot
 ential counterpart to the Euler identity in mathematics\, revea
 ls a Law of Large Populations (LLP)\, a law that we should all b
 e afraid of. That is\, once we lose control over data quality\, t
 he systematic error (bias) in the usual estimators\, relative t
 o the benchmark standard error from simple random sampling\, go
 es up with N at the rate of √N. The coefficient in front of √N c
 an be viewed as a data defect index\, which is the simple Pears
 on correlation between the reporting/recording indicator and th
 e value reported/recorded. Because of the multiplier √N\, a see
 mingly tiny correlation\, say 0.005\, can have a detrimental ef
 fect on the quality of inference. Without an understanding of t
 his LLP\, “big data” can do more harm than good because of the d
 rastically inflated precision assessment\, and hence gross over
 confidence\, setting us up to be caught by surprise when realit
 y unfolds\, as we all experienced during the 2016 US presidenti
 al election. Data from the Cooperative Congressional Election S
 tudy (CCES\, conducted by Stephen Ansolabehere\, Douglas Rivers
 ers and others\, and analyzed by Shiro Kuriwaki) are used to es
 timate the data defect index for the 2016 US election\, with th
 e aim of gaining a clearer vision for the 2020 US election and b
 eyond.
LOCATION:LT4\, Simon Sainsburys Building\, Cambridge Judge
Business School
CONTACT:Emily Brown
END:VEVENT
END:VCALENDAR