BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Stochastic Errors vs. Modeling Errors in Distance Based Phylogenet
 ic Reconstructions - Doerr\, D (Bielefeld)
DTSTART:20110624T092000Z
DTEND:20110624T094000Z
UID:TALK31882@talks.cam.ac.uk
CONTACT:Mustapha Amrani
DESCRIPTION:Distance-based phylogenetic reconstruction methods rely heavil
 y on accurate pairwise distance estimates. There are two separate sources 
 of error in this estimation process: \n\n(1) the relatively short sequence
  alignments used to obtain distance estimates induce a "stochastic error" 
 corresponding to estimation of model parameters from finite data\; \n\n(2)
  model misspecification leads to a "fixed error" which does not depend on 
 sequence length. \n\nIt is common practice to assume some substitution mod
 el over the sequence data and use an additive substitution rate function f
 or that model when computing pairwise distances. In the providential case 
 when the assumed model coincides with the true model\, which is typically 
 unknown\, the distance estimates will not be afflicted with fixed error. B
 ut even then\, there is no reason to enforce a zero fixed error a priori 
 when this causes elevated rates of stochastic error\, especially in the ca
 se of short sequence alignments.\n\nThis work challenges this paradigm of 
 "using the most additive distance function at any cost". We do this by stu
 dying the contribution and effect of both fixed and stochastic error in di
 stance estimation. We present a formal framework for quantifying the fixed
  error associated with a specific distance function and a given phylogenet
 ic tree in a homogeneous substitution model.  As an example\, we study the
  behavior of the Jukes-Cantor distance formula in homogeneous instances of
  Kimura's two parameter substitution model. The effects of fixed error are
  observed through analytic results and experiments on simulated data. In a
 ddition\, we compare the performance of various distance functions on biol
 ogical sequences. We evaluate reconstruction accuracy by comparing the rec
 onstructed trees to an independently validated species tree. Our study ind
 icates that often enough simple distance functions outperform more sophist
 icated functions\, despite the fact that the given sequence data appears t
 o have poor fit to the substitution model they assume.\n\n
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
