
Data compression with statistical guarantees


If you have a question about this talk, please contact info@newton.ac.uk.

SINW01 - Scalable statistical inference

Joint talk with Daniel Ahfock (MRC Biostatistics Unit @ University of Cambridge)

The talk is concerned with translating recent ideas from computer science on probabilistic data-compression techniques into a statistical framework that can be ‘safely’ applied to speed up linear regression analyses for very large sample sizes in biomedicine.

Our motivation is to facilitate the use of multivariate regression and model exploration in tall data sets, so that, for example, genetic association analyses carried out on hundreds of thousands of subjects can investigate multivariate effects for a set of explanatory features, rather than being restricted to one-feature-at-a-time associations for reasons of computational feasibility.

Among the many approaches to dealing with tall data, probabilistic data compression techniques based on random linear mappings, developed in the computer science community and known as sketching, are particularly suitable for linear regression problems. In the first part of the talk, we will present a hierarchical representation of sketching, which allows the distributional properties of different sketching algorithms to be derived. In particular, we will discuss how the signal-to-noise ratio in the original data set affects the choice of sketching algorithm. In the second part of the talk, we will further refine some of the approximation guarantees and consider iterative sketches. The talk will be illustrated with a genetic analysis of the link between a blood cell trait and the HLA region, involving a sample of 130,000 people.
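
To make the idea of sketching concrete, the following is a minimal sketch in Python of one common variant, a Gaussian random projection applied to a least-squares problem. It is an illustration only and not necessarily one of the algorithms analysed in the talk: the function name gaussian_sketch_ols, the simulated data, and the sizes n, p and k are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_sketch_ols(X, y, k, rng):
    """Least squares on a k-row Gaussian sketch of (X, y).

    Hypothetical helper for illustration; the talk does not specify
    an implementation.
    """
    n = X.shape[0]
    # Entries are i.i.d. N(0, 1/k), so that E[S.T @ S] is the identity.
    S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, n))
    # Compress n observations down to k pseudo-observations.
    X_sk, y_sk = S @ X, S @ y
    beta_sk, *_ = np.linalg.lstsq(X_sk, y_sk, rcond=None)
    return beta_sk


# Simulated tall data set (assumed sizes, not those of the genetic study).
rng = np.random.default_rng(0)
n, p, k = 5000, 5, 500
X = rng.normal(size=(n, p))
beta_true = np.arange(1.0, p + 1)
y = X @ beta_true + rng.normal(size=n)

# Full-data estimate versus the estimate from the compressed data.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sketch = gaussian_sketch_ols(X, y, k, rng)
print("full data:", np.round(beta_full, 3))
print("sketched: ", np.round(beta_sketch, 3))
```

The sketched estimate is noisier than the full-data estimate because only k of the n rows' worth of information is retained. How much noisier depends on the residual variation in the original data, which is one way to see why the signal-to-noise ratio matters when choosing a sketch.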

http://arxiv.org/abs/1706.03665


This talk is part of the Isaac Newton Institute Seminar Series.
