BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Computational and Systems Biology Seminar Series 2
 023 - 24
SUMMARY:An introduction to counts-of-counts data - Simon Tava
 ré PhD Herbert and Florence Irving Director Irving Institu
 te for Cancer Dynamics & Professor\, Departments of Statist
 ics and Biological Sciences Columbia University
DTSTART;TZID=Europe/London:20221012T140000
DTEND;TZID=Europe/London:20221012T150000
UID:TALK178910AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/178910
DESCRIPTION:Counts-of-counts data arise in many areas of biolo
 gy and medicine\, and have been studied by statist
 icians since the 1940s. One of the first examples\
 , discussed by R. A. Fisher and collaborators in 1
 943 [1]\, concerns estimation of the number of uno
 bserved species based on summary counts of the num
 ber of species observed once\, twice\, … in a samp
 le of specimens. The data are summarized by the nu
 mbers C1\, C2\, … of species represented once\, tw
 ice\, … in a sample of size N = C1 + 2 C2 + 3 C3 + … cont
 aining S = C1 + C2 + … species\; the ve
 ctor C = (C1\, C2\, …) gives the counts-of-counts.
  Other examples include the frequencies of the dis
 tinct alleles in a human genetics sample\, the cou
 nts of distinct variants of the SARS-CoV-2 S prote
 in obtained from consensus sequencing experiments\
 , counts of sizes of components in certain combina
 torial structures [2]\, and counts of the numbers 
 of SNVs arising in one cell\, two cells\, … in a c
 ancer sequencing experiment. \n\nIn this talk I wi
 ll outline some of the stochastic models used to m
 odel the distribution of C\, and some of the infer
 ential issues that come from estimating the parame
 ters of these models. I will touch on the celebrat
 ed Ewens Sampling Formula [3] and Fisher’s multipl
 e sampling problem concerning the variance expecte
 d between values of S in samples taken from the sa
 me population [4]. Variants of birth-death-immigra
 tion processes can be used\, for example when diff
 erent variants grow at different rates. The classi
 cal Yule process with immigration can be used to d
 erive some of the combinatorial results in a simpl
 e way\, through a probabilistic trick known as emb
 edding.\n\nReferences\n\n[1] Fisher RA\, Corbet AS
  & Williams CB. J Animal Ecology\, 12\, 1943\n[2] 
 Arratia R\, Barbour AD & Tavaré S. Logarithmic Com
 binatorial Structures\, EMS\, 2002\n[3] Ewens WJ. 
 Theoret Popul Biol\, 3\, 1972\n[4] Da Silva P\, Ja
 mshidpey A\, McCullagh P & Tavaré S. Bernoulli\, i
 n press\, 2022
LOCATION:CMS\, Meeting Room 15
CONTACT:Samantha Noel
END:VEVENT
END:VCALENDAR
