Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions

Doerr, D (Bielefeld)
Friday 24 June 2011, 10:20-10:40
Seminar Room 1, Newton Institute.

If you have a question about this talk, please contact Mustapha Amrani.

Phylogenetics

Distance-based phylogenetic reconstruction methods rely heavily on accurate pairwise distance estimates. There are two separate sources of error in this estimation process:

(1) the relatively short sequence alignments used to obtain distance estimates induce a “stochastic error” corresponding to estimation of model parameters from finite data;

(2) model misspecification leads to a “fixed error” which does not depend on sequence length.
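The stochastic error in (1) can be made concrete with a small sketch. Assuming the Jukes-Cantor model holds exactly, the standard first-order (delta-method) approximation gives the standard deviation of the Jukes-Cantor distance estimate as a function of alignment length. The function names below are hypothetical and chosen for illustration only; this is not the framework presented in the talk.

```python
import math

def jc_mismatch_prob(d):
    """Expected fraction of differing sites at true distance d
    (substitutions per site) under the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc_distance(p):
    """Jukes-Cantor distance for an observed mismatch fraction p < 3/4."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_stochastic_sd(d, n_sites):
    """Approximate standard deviation of the JC estimate from n_sites
    independent sites (first-order delta method, an approximation)."""
    p = jc_mismatch_prob(d)
    var_p = p * (1.0 - p) / n_sites          # binomial variance of p-hat
    slope = 1.0 / (1.0 - 4.0 * p / 3.0)      # derivative of jc_distance at p
    return slope * math.sqrt(var_p)

if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, jc_stochastic_sd(0.5, n))
```

Under this approximation the spread shrinks in proportion to one over the square root of the number of sites, so quadrupling the alignment length halves it, and for a fixed length it grows as the true distance grows.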

It is common practice to assume some substitution model over the sequence data and use an additive substitution rate function for that model when computing pairwise distances. In the providential case when the assumed model coincides with the true model, which is typically unknown, the distance estimates will not be afflicted with fixed error. But even then, there is no reason to enforce a zero fixed error a priori when this causes elevated rates of stochastic error, especially in the case of short sequence alignments.

This work challenges this paradigm of “using the most additive distance function at any cost”. We do this by studying the contribution and effect of both fixed and stochastic error in distance estimation. We present a formal framework for quantifying the fixed error associated with a specific distance function and a given phylogenetic tree in a homogeneous substitution model. As an example, we study the behavior of the Jukes-Cantor distance formula in homogeneous instances of Kimura’s two-parameter substitution model. The effects of fixed error are observed through analytic results and experiments on simulated data. In addition, we compare the performance of various distance functions on biological sequences. We evaluate reconstruction accuracy by comparing the reconstructed trees to an independently validated species tree. Our study indicates that simple distance functions often outperform more sophisticated ones, even though the sequence data appear to be a poor fit for the substitution models that the simple functions assume.
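As a rough pairwise illustration of the Jukes-Cantor example (not the tree-based framework of the talk), the sketch below computes the fixed error of the Jukes-Cantor formula when sequences actually evolve under Kimura's two-parameter model. It uses the standard closed-form mismatch probabilities for that model in the infinite-sequence limit. The parameterization by `kappa`, taken here to mean the ratio of the transition rate to the rate of each individual transversion type, and all function names are assumptions made for this example.

```python
import math

def k2p_mismatch_probs(d, kappa):
    """Expected transition (P) and transversion (Q) mismatch fractions
    under K2P at true distance d (substitutions per site).
    kappa = alpha / beta is an assumed parameterization."""
    beta = d / (kappa + 2.0)      # from d = alpha + 2*beta with time scaled to 1
    alpha = kappa * beta
    P = 0.25 + 0.25 * math.exp(-4.0 * beta) - 0.5 * math.exp(-2.0 * (alpha + beta))
    Q = 0.5 - 0.5 * math.exp(-4.0 * beta)
    return P, Q

def jc_distance(p):
    """Jukes-Cantor distance for a total mismatch fraction p < 3/4."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def k2p_distance(P, Q):
    """Kimura two-parameter distance from mismatch fractions P and Q."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)

def jc_fixed_error(d, kappa):
    """JC distance minus the true distance, with no sampling noise."""
    P, Q = k2p_mismatch_probs(d, kappa)
    return jc_distance(P + Q) - d

if __name__ == "__main__":
    for kappa in (1.0, 2.0, 10.0):
        print(kappa, jc_fixed_error(1.0, kappa))
```

With `kappa` equal to 1 the two models coincide and the fixed error vanishes; for any other value the Jukes-Cantor formula underestimates the true distance, and this bias does not shrink with longer alignments.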

This talk is part of the Isaac Newton Institute Seminar Series.
