ShrinkSeq: a flexible and powerful method for Bayesian analysis of RNAseq data

If you have a question about this talk, please contact Florian Markowetz.

Note the time change: 2pm (not 4pm)

Next-generation sequencing is quickly replacing microarrays as a technique to probe different molecular levels of the cell, such as DNA or mRNA. The technology has the advantage of providing higher resolution while reducing biases, in particular at the lower end of the spectrum. mRNA sequencing (RNAseq) data consist of counts of pieces of RNA called tags. This type of data poses new challenges for statistical analysis. We present a novel approach to model and analyze these data.

Methods and software for differential expression analysis usually use some generalization of the Poisson or binomial distribution that accounts for overdispersion. A popular choice is the negative binomial (i.e. Poisson-Gamma) model. However, there is no consensus on which model best fits RNAseq data, and this may depend on the technology used. With RNAseq, the number of features vastly exceeds the sample size. This implies that shrinkage of variance-related parameters may lead to more stable estimates and inference. Methods to do so are available, but only for a single parameter and in the context of restrictive study designs, e.g. two-group comparisons or fixed-effect designs.
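As a rough illustration of what "accounting for overdispersion" means, the following sketch evaluates the negative binomial distribution in its mean/overdispersion parameterization and checks numerically that its variance exceeds the Poisson variance. The function names and the parameter values (`mu`, `phi`) are chosen here for illustration only; this is not the speakers' implementation.

```python
import math

def nb_pmf(k, mu, phi):
    """Negative binomial pmf with mean mu and overdispersion phi.

    Arises as a Poisson whose rate is Gamma-distributed with shape 1/phi
    and mean mu, so the variance is mu + phi * mu**2 (Poisson gives mu).
    """
    r = 1.0 / phi  # Gamma shape, i.e. the NB "size" parameter
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu))
             + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def moments(mu, phi, k_max=2000):
    """Mean and variance obtained by summing the pmf up to k_max."""
    probs = [nb_pmf(k, mu, phi) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

# Illustrative values: a tag with mean count 10 and overdispersion 0.5.
mean, var = moments(mu=10.0, phi=0.5)
print(mean, var)  # variance should be close to mu + phi * mu**2
```

With these illustrative values the variance is about six times the mean, which a plain Poisson model (variance equal to mean) cannot represent.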

We present a framework that allows for (a) various count models, (b) flexible designs, (c) random effects, and (d) multi-parameter shrinkage across tags by Empirical Bayes. Moreover, it implements Bayesian multiplicity correction, thereby providing solid inference. In a data-based simulation, we show that our method outperforms other methods (edgeR, DESeq, baySeq, noiSeq). We also illustrate our approach on two data sets. The first is a CAGE data set containing 25 samples representing five regions of the human brain from seven individuals. The design is incomplete and a batch effect is present. The second is a miRNA sequencing data set from seven pairs of tumors. These data motivate the use of the zero-inflated negative binomial as a powerful alternative to the negative binomial, because it leads to less bias in the estimated overdispersion parameter and improved detection power for the low-count tags.
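For readers unfamiliar with the zero-inflated negative binomial mentioned above, here is a minimal sketch of its probability mass function: a mixture of a point mass at zero and an ordinary negative binomial. The names `zinb_pmf` and `pi0` and the parameter values are assumptions made for this illustration and are not taken from ShrinkSeq itself.

```python
import math

def nb_pmf(k, mu, phi):
    """Negative binomial pmf with mean mu and overdispersion phi."""
    r = 1.0 / phi
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu))
             + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(k, mu, phi, pi0):
    """Zero-inflated NB: with probability pi0 the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    base = (1.0 - pi0) * nb_pmf(k, mu, phi)
    return pi0 + base if k == 0 else base

# Illustrative low-count tag: NB mean 2, overdispersion 1, 30% extra zeros.
mu, phi, pi0 = 2.0, 1.0, 0.3
print("P(0) under NB:  ", nb_pmf(0, mu, phi))
print("P(0) under ZINB:", zinb_pmf(0, mu, phi, pi0))
```

If excess zeros of this kind are present but a plain negative binomial is fitted, the extra zeros have to be absorbed by the overdispersion parameter, which is one way to read the bias described in the abstract.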

The framework is not restricted to RNAseq data. It is currently being extended towards proteomics, high-throughput screening and integrative data, in particular DNA copy number and mRNA/miRNA.

This talk is part of the Seminars on Quantitative Biology @ CRUK Cambridge Institute series.
