Learning Markov Networks for Mixed Big Data: Applications to Cancer Genomics

“Mixed data” comprising a large number of heterogeneous variables (e.g. count, binary, continuous, and skewed continuous, among other data types) is prevalent in areas as varied as imaging genetics, national security, social networking, Internet advertising, and our particular motivation: high-throughput integrative genomics. There have been limited efforts to model such mixed data jointly, in part because of the lack of computationally amenable multivariate distributions that can capture direct dependencies between variables of different types. In this talk, we address this by introducing several new classes of Markov Random Fields (MRFs), or graphical models, that yield joint densities over mixed variables. To begin, we present a novel class of MRFs arising when all node-conditional distributions follow univariate exponential family distributions; this class yields, for instance, novel Poisson graphical models. Next, we extend this class to Mixed MRF distributions. Unfortunately, these formulations can place severe and unrealistic restrictions on the parameter space. To remedy this, we introduce a class of mixed conditional random field distributions, which are then chained according to a block-directed acyclic graph to form a new class of so-called Block Directed Markov Random Fields (BDMRFs). The Markov independence graph underlying a BDMRF therefore has both directed and undirected edges.
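As a reference point, the pairwise version of the first model class can be written down explicitly. The following is a sketch in our own notation, assuming pairwise interactions only; the sufficient statistic B, base measure C, log-partition functions D and A, and the parameters θ_s and θ_st are notational assumptions rather than definitions taken from the talk.

```latex
% Node-conditional of X_s given all other nodes: a univariate exponential family
% whose natural parameter \eta_s is linear in the neighbours' sufficient statistics
P(X_s \mid X_{V \setminus s}) = \exp\Big\{ B(X_s)\,\eta_s + C(X_s) - D(\eta_s) \Big\},
\qquad \eta_s = \theta_s + \sum_{t \in N(s)} \theta_{st}\, B(X_t)

% Corresponding pairwise joint distribution over all nodes (\theta_{st} = \theta_{ts})
P(X) = \exp\Big\{ \sum_{s \in V} \theta_s B(X_s) + \sum_{(s,t) \in E} \theta_{st}\, B(X_s) B(X_t)
       + \sum_{s \in V} C(X_s) - A(\theta) \Big\}

% Poisson graphical model: B(x) = x, C(x) = -\log x!, D(\eta) = e^{\eta}, so that
X_s \mid X_{V \setminus s} \sim \mathrm{Poisson}\Big( \exp\Big\{ \theta_s + \sum_{t \in N(s)} \theta_{st} X_t \Big\} \Big)
```

Collecting the terms of the joint that involve X_s recovers exactly the node-conditional with natural parameter η_s, which is why the two forms are consistent. In the Poisson case, the joint is known to be normalizable only when every θ_st ≤ 0, so only negative conditional dependencies are allowed; this is one well-known instance of the kind of parameter-space restriction described above.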

We will briefly review the theoretical properties of these models and introduce penalized conditional likelihood estimators, with statistical guarantees, for learning the underlying mixed network structure. Simulations, as well as an application to integrative cancer genomics, demonstrate the versatility of our methods. In our particular example, we learn integrative genomic networks from breast cancer next-generation sequencing expression data and mutation data, yielding several interesting findings.
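The penalized conditional likelihood idea can be illustrated with a node-wise (neighborhood selection) sketch for count data: each variable is regressed on all the others with an L1-penalized Poisson GLM, matching the Poisson node-conditional, and the nonzero coefficients define its estimated neighbors. This is a minimal sketch assuming Poisson nodes only, a single penalty level `lam`, a proximal-gradient solver, and an AND/OR rule for symmetrizing the result; the function names and parameters (`soft_threshold`, `l1_poisson_regression`, `neighborhood_selection`, `lam`, `rule`) are hypothetical and do not reproduce the authors' estimator or code.

```python
# Hypothetical sketch: node-wise L1-penalized Poisson regression for graph
# estimation (an illustration only, not the authors' implementation).
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||z||_1, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def poisson_nll(y, X, b, w):
    """Average Poisson negative log-likelihood, dropping the constant log(y!) term."""
    eta = b + X @ w
    with np.errstate(over="ignore", invalid="ignore"):
        return np.mean(np.exp(eta) - y * eta)


def l1_poisson_regression(y, X, lam, n_iter=500, step0=1.0):
    """L1-penalized Poisson regression of y on X via proximal gradient descent.

    A backtracking line search picks the step size; the intercept b is not penalized.
    """
    n, p = X.shape
    b, w = np.log(y.mean() + 1e-8), np.zeros(p)
    for _ in range(n_iter):
        resid = np.exp(b + X @ w) - y  # per-sample derivative of the loss w.r.t. eta
        grad_b, grad_w = resid.mean(), X.T @ resid / n
        f_old = poisson_nll(y, X, b, w)
        step = step0
        while True:
            b_new = b - step * grad_b
            w_new = soft_threshold(w - step * grad_w, step * lam)
            d_b, d_w = b_new - b, w_new - w
            # Sufficient-decrease condition for a proximal gradient step.
            bound = f_old + grad_b * d_b + grad_w @ d_w + (d_b**2 + d_w @ d_w) / (2 * step)
            if poisson_nll(y, X, b_new, w_new) <= bound or step < 1e-10:
                break
            step *= 0.5
        b, w = b_new, w_new
    return b, w


def neighborhood_selection(X, lam, rule="and"):
    """Estimate an undirected graph over the columns of a count matrix X.

    Each node is regressed on all the others; an edge (s, t) is kept if both
    (rule="and") or either (rule="or") of the two regressions select it.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    W = np.zeros((p, p))
    for s in range(p):
        others = [t for t in range(p) if t != s]
        _, w = l1_poisson_regression(X[:, s], X[:, others], lam)
        W[s, others] = w
    selected = W != 0
    adj = (selected & selected.T) if rule == "and" else (selected | selected.T)
    return adj, W
```

For example, `neighborhood_selection(counts, lam=0.05)` on an n×p array of counts returns a boolean p×p adjacency matrix together with the raw coefficient matrix. In practice `lam` would need to be tuned (for instance by cross-validation or stability selection), and for genuinely mixed data each node would get the GLM matching its type, such as logistic regression for binary nodes; the sketch covers Poisson nodes only.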

Joint work with Eunho Yang, Pradeep Ravikumar, Zhandong Liu, Yulia Baker, and Ying-Wooi Wan.

This talk is part of the Statistics series.

© 2006-2019 Talks.cam, University of Cambridge.