
Stochastic Errors vs. Modeling Errors in Distance Based Phylogenetic Reconstructions


If you have a question about this talk, please contact Mustapha Amrani.


Distance-based phylogenetic reconstruction methods rely heavily on accurate pairwise distance estimates. There are two separate sources of error in this estimation process:

(1) the relatively short sequence alignments used to obtain distance estimates induce a “stochastic error” corresponding to estimation of model parameters from finite data;

(2) model misspecification leads to a “fixed error” which does not depend on sequence length.

It is common practice to assume some substitution model over the sequence data and to use an additive substitution rate function for that model when computing pairwise distances. In the providential case when the assumed model coincides with the true model, which is typically unknown, the distance estimates will not be afflicted with fixed error. But even then, there is no reason to enforce a zero fixed error a priori when doing so elevates the stochastic error, especially for short sequence alignments.
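To make the dependence of stochastic error on sequence length concrete, here is a minimal sketch that is not taken from the talk. It assumes the Jukes-Cantor model is the true model (so there is no fixed error) and uses the standard first-order (delta-method) approximation for the standard error of the Jukes-Cantor distance estimate. The function names and the chosen distance of 0.5 substitutions per site are illustrative assumptions.

```python
import math


def jc_p_from_distance(d):
    """Expected proportion of differing sites at true distance d
    (substitutions per site) under the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_std_error(d, n):
    """Approximate standard error of the Jukes-Cantor distance estimate
    from an alignment of n sites, via the delta method:
    Var(d_hat) ~ p(1 - p) / (n * (1 - 4p/3)^2)."""
    p = jc_p_from_distance(d)
    variance = p * (1.0 - p) / (n * (1.0 - 4.0 * p / 3.0) ** 2)
    return math.sqrt(variance)


# Illustrative values only: the error shrinks like 1/sqrt(n).
for n in (100, 1000, 10000):
    print(n, jc_std_error(0.5, n))
```

Under this approximation the stochastic error falls off as one over the square root of the alignment length, which is why it dominates for short alignments.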

This work challenges this paradigm of “using the most additive distance function at any cost”. We do this by studying the contribution and effect of both fixed and stochastic error in distance estimation. We present a formal framework for quantifying the fixed error associated with a specific distance function and a given phylogenetic tree in a homogeneous substitution model. As an example, we study the behavior of the Jukes-Cantor distance formula in homogeneous instances of Kimura’s two-parameter substitution model. The effects of fixed error are observed through analytic results and experiments on simulated data. In addition, we compare the performance of various distance functions on biological sequences. We evaluate reconstruction accuracy by comparing the reconstructed trees to an independently validated species tree. Our study indicates that simple distance functions often outperform more sophisticated ones, even when the given sequence data appear to fit the substitution model they assume poorly.
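The Jukes-Cantor example can be illustrated with a short sketch. This is not the formal framework presented in the talk; it only computes the value the Jukes-Cantor formula converges to on infinitely long sequences generated under Kimura’s two-parameter model, and compares it with the true path length. The parameter kappa (transition rate divided by the rate of each transversion type), the rate normalization and all function names are assumptions made for illustration.

```python
import math


def k2p_diff_probs(d, kappa):
    """Expected proportions of transition (P) and transversion (Q)
    differences between two sequences separated by a path of d expected
    substitutions per site under Kimura's two-parameter model.
    kappa = alpha / beta; rates are scaled so that alpha + 2*beta = 1."""
    beta = 1.0 / (kappa + 2.0)
    alpha = kappa * beta
    P = 0.25 + 0.25 * math.exp(-4.0 * beta * d) - 0.5 * math.exp(-2.0 * (alpha + beta) * d)
    Q = 0.5 - 0.5 * math.exp(-4.0 * beta * d)
    return P, Q


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(P, Q):
    """Kimura two-parameter distance, additive under the K2P model."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def jc_limit(d, kappa):
    """Value the Jukes-Cantor estimate converges to as sequence length
    grows, when the data actually follow K2P with the given kappa."""
    P, Q = k2p_diff_probs(d, kappa)
    return jc_distance(P + Q)


# Illustrative comparison along a path made of two edges of length 0.5.
kappa = 10.0
whole_path = jc_limit(1.0, kappa)
sum_of_edges = jc_limit(0.5, kappa) + jc_limit(0.5, kappa)
print("true path length:", 1.0)
print("JC on whole path:", whole_path)
print("JC summed over the two edges:", sum_of_edges)
```

When kappa is 1 the two models coincide and the Jukes-Cantor limit equals the true distance. For other values of kappa the limit falls below the true distance, and the value for a whole path is smaller than the sum over its edges, so the departure from additivity does not shrink with longer sequences.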

This talk is part of the Isaac Newton Institute Seminar Series.



© 2006-2023, University of Cambridge.