Linking taxa to function through contig clustering of microbial metagenomes

If you have a question about this talk, please contact Mustapha Amrani.

Mathematical, Statistical and Computational Aspects of the New Science of Metagenomics

Co-authors: Johannes Alneberg (KTH Royal Institute of Technology, Stockholm, Sweden), Brynjar Smaari Bjarnason (KTH Royal Institute of Technology, Stockholm, Sweden), Ino de Bruijn (KTH Royal Institute of Technology, Stockholm, Sweden), Melanie Schirmer (University of Glasgow), Joshua Quick (University of Birmingham), Nicholas J. Loman (University of Birmingham), Anders F. Andersson (KTH Royal Institute of Technology, Stockholm, Sweden), Konstantinos Gerasimidis (University of Glasgow)

Taxonomic profiling of microbial communities can answer the question: Who is there? This can be achieved either through marker gene sequencing or through true shotgun metagenomics. The latter, because the functional genes of all community members are sequenced, allows us to answer an additional question: What are they doing? However, there is a third question that is key to understanding microbial communities: Who is doing what? This question has received much less attention because answering it requires the extraction of complete genomes from metagenomes. Assembly of metagenomes can generate millions of contigs (assembled genome fragments) with no information on which contig derives from which genome. Here I will present CONCOCT, a novel algorithm that combines sequence composition, coverage across multiple samples, and read-pair linkage to automatically cluster contigs into genomes. CONCOCT couples dimensionality reduction with a Gaussian mixture model, fit using a variational Bayesian algorithm that automatically identifies the optimal number of clusters. We demonstrate high recall and precision on artificial as well as real human gut metagenome datasets. Linking contigs into genome clusters allows the frequencies of those clusters to be related to metadata, revealing function. We apply this approach to fecal metagenomes obtained from the E. coli O104:H4 epidemic (Germany, 2011) and are able to directly extract the outbreak genome. We also use it to identify organisms associated with inflammation in samples from children with Crohn's disease.

Related Links

http://arxiv.org/abs/1312.4038 – arXiv preprint

This talk is part of the Isaac Newton Institute Seminar Series.
