Waste Not, Want Not: Why Rarefying Microbiome Data is not an optimal normalization procedure
If you have a question about this talk, please contact Mustapha Amrani.

Mathematical, Statistical and Computational Aspects of the New Science of Metagenomics

Co-author: Paul Joey McMurdie (Stanford University)

The interpretation of metagenomic count data generated by the current generation of DNA sequencing platforms requires special attention. In particular, per-sample library sizes from the same sequencing run often vary by orders of magnitude, and the counts are overdispersed relative to a simple Poisson model. Both challenges can be addressed with an appropriate mixture model that simultaneously accounts for library-size differences and biological variability. This approach is already well characterized and implemented for RNA-Seq data in R packages such as edgeR and DESeq. We use statistical theory, extensive simulations, and empirical data to show that variance-stabilizing normalization based on a mixture model such as the negative binomial (a Gamma-Poisson mixture) is appropriate for microbiome count data.

In simulations of differential abundance detection, normalization procedures based on a Gamma-Poisson mixture model gave a systematic improvement in performance over crude proportions and rarefied counts, both of which led to a high rate of false positives. In simulations evaluating clustering accuracy, we found that rarefying discarded samples that alternative methods nevertheless clustered accurately, and that the choice of minimum library size threshold was critical in some settings, with an optimum that is unknown in practice.

Techniques that apply variance-stabilizing transformations by modeling microbiome count data with a mixture distribution, such as those implemented in edgeR and DESeq, substantially improved upon techniques that attempt to normalize by rarefying or by crude proportions. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether.
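The two normalization strategies the abstract contrasts can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' code and not the API of edgeR, DESeq or phyloseq (all R packages). `rarefy` and `median_of_ratios_size_factors` are hypothetical names. The first subsamples reads without replacement to a common depth and drops any sample below that depth. The second computes size factors with the median-of-ratios idea used by DESeq for library-size normalization, simplified here: it skips taxa with a zero in any sample and omits the negative binomial fit and variance-stabilizing transformation entirely.

```python
import random
import statistics
from math import exp, log


def rarefy(counts, depth, seed=0):
    """Hypothetical helper: subsample one sample's taxon counts to `depth` reads.

    Reads are drawn without replacement. Returns None when the library has
    fewer than `depth` reads, i.e. the whole sample is discarded.
    """
    if sum(counts) < depth:
        return None
    rng = random.Random(seed)
    # Expand counts into one taxon index per read, then draw `depth` reads.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    rarefied = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        rarefied[taxon] += 1
    return rarefied


def median_of_ratios_size_factors(samples):
    """Hypothetical helper: DESeq-style median-of-ratios size factors.

    For each sample, the size factor is the median over taxa of
    count / (geometric mean of that taxon's counts across samples).
    Taxa with a zero in any sample are skipped (their geometric mean is 0).
    """
    n_taxa = len(samples[0])
    log_geo_means = []
    for j in range(n_taxa):
        column = [s[j] for s in samples]
        if min(column) > 0:
            log_geo_means.append((j, sum(log(c) for c in column) / len(column)))
    if not log_geo_means:
        raise ValueError("every taxon has a zero in some sample")
    factors = []
    for s in samples:
        ratios = [exp(log(s[j]) - lgm) for j, lgm in log_geo_means]
        factors.append(statistics.median(ratios))
    return factors


if __name__ == "__main__":
    print(rarefy([5, 3, 2], 4))  # four reads redistributed across the taxa
    print(rarefy([1, 1], 4))     # None: this sample would be thrown away
    print(median_of_ratios_size_factors(
        [[10, 20, 30, 40], [20, 40, 60, 80], [5, 10, 15, 0]]))
```

The contrast is the point: `rarefy` returns `None` for the small library and discards reads from the larger ones, whereas the size-factor approach keeps every count and rescales each sample (here by roughly 1, 2 and 0.5), leaving the overdispersion to be handled by the count model.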
We have provided microbiome-specific extensions to these tools in the R package phyloseq.

Related links:
http://arxiv.org/abs/1310.0424 – arXiv version of the paper
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0061217 – phyloseq package description and philosophy

This talk is part of the Isaac Newton Institute Seminar Series.