Using evolutionary sequence variation to build predictive models of protein structure and function.

If you have a question about this talk, please contact Emily Boyd.

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. The explosive growth in the number of available protein sequences raises the possibility of using the natural variation present in homologous protein sequences to infer these constraints and thus identify residues that control different protein phenotypes. Because in many cases phenotypic changes are controlled by more than one amino acid, the mutations that separate one phenotype from another may not be independent, requiring us to understand the correlation structure of the data.

The challenge is to distinguish true interactions from the noisy, under-sampled set of correlations observed in a large multiple sequence alignment. We show that maximum entropy models of the protein sequence, constrained by the statistics of the multiple sequence alignment, are capable of predicting key aspects of protein function. These include (i) inference of residue-pair interactions that are accurate enough to predict all-atom 3D structural models; (ii) accurate prediction of binding partners between different proteins; and (iii) accurate prediction of binding between protein receptors and their target ligands. We will discuss how a mathematical framework based on random matrix theory bounds which sequence alignments contain sufficient information to build accurate predictive models. Finally, we will pose questions about the physics of binding interactions in an example from the immune system where large sets of evolutionarily related sequences are not available.
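As a rough illustration of how a pairwise maximum entropy model can be fitted to alignment statistics, the sketch below uses the mean-field approximation, in which the couplings are estimated as the negative inverse of the covariance matrix of one-hot encoded residues. This is an assumption chosen for brevity and is not necessarily the inference scheme used in the talk. The function names, the two-letter toy alphabet, the pseudocount value and the toy alignment are all hypothetical.

```python
import numpy as np

def one_hot(msa, alphabet):
    """Encode an alignment as an (N, L, q) array of 0/1 indicators."""
    index = {a: k for k, a in enumerate(alphabet)}
    x = np.zeros((len(msa), len(msa[0]), len(alphabet)))
    for s, seq in enumerate(msa):
        for i, aa in enumerate(seq):
            x[s, i, index[aa]] = 1.0
    return x

def mean_field_scores(msa, alphabet, pseudocount=0.5):
    """Score column pairs by the Frobenius norm of mean-field couplings.

    Assumption: couplings are approximated as J = -C^{-1}, where C is the
    covariance of the one-hot encoded alignment (mean-field approximation).
    """
    x = one_hot(msa, alphabet)
    n, length, q = x.shape
    # Single-site and pairwise frequencies, mixed with a uniform pseudocount
    # so that the covariance matrix is invertible on small alignments.
    fi = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * np.einsum("sia,sjb->iajb", x, x) / n
    fij += pseudocount / q**2
    for i in range(length):
        fij[i, :, i, :] = np.diag(fi[i])  # same-site blocks are diagonal
    cov = fij - fi[:, :, None, None] * fi[None, None, :, :]
    # Drop the last state at every site (a gauge choice) before inverting.
    cov = cov[:, : q - 1, :, : q - 1].reshape(length * (q - 1), -1)
    couplings = -np.linalg.inv(cov).reshape(length, q - 1, length, q - 1)
    scores = np.sqrt((couplings**2).sum(axis=(1, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores

# Hypothetical toy alignment: columns 0 and 2 always match,
# while column 1 varies independently of both.
toy_msa = ["AAA", "ACA", "CAC", "CCC"] * 5
scores = mean_field_scores(toy_msa, alphabet="AC")
print(scores)
```

In this toy case the covarying pair (columns 0 and 2) receives a larger score than pairs involving the independent column 1. Real alignments would need the full amino-acid alphabet plus a gap symbol, sequence reweighting, and enough effective sequences for the covariance estimate to be meaningful.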

This talk is part of the Computational and Systems Biology series.

© 2006-2020, University of Cambridge.