Variable selection and classification with large-scale presence only data

Garvesh Raskutti (University of Wisconsin-Madison)
Friday 19 January 2018, 11:45-12:30
Seminar Room 1, Newton Institute.

If you have a question about this talk, please contact INI IT.

STSW01 - Theoretical and algorithmic underpinnings of Big Data

Co-author: Hyebin Song (University of Wisconsin-Madison)

In various real-world problems, we are presented with positive and unlabelled data, referred to as presence-only responses, where the number of covariates $p$ is large. The combination of presence-only responses and high dimensionality presents both statistical and computational challenges. In this paper, we develop the PUlasso algorithm for variable selection and classification with positive and unlabelled responses. Our algorithm uses the majorization-minimization (MM) framework, which is a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. We provide a theoretical guarantee: we first show that our algorithm converges to a stationary point, and then prove that any stationary point achieves the minimax optimal mean-squared error of $\frac{s \log p}{n}$, where $s$ is the sparsity of the true parameter. We also demonstrate through simulations that our algorithm outperforms state-of-the-art algorithms in the moderate $p$ settings in terms of classification performance. Finally, we demonstrate that our PUlasso algorithm performs well on a biochemistry example.
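
As a brief illustration of the MM idea mentioned in the abstract (a generic sketch, not the specific surrogate used by PUlasso, which the abstract does not state): to minimize an objective $f(\theta)$, MM chooses at each iterate $\theta^{(t)}$ a surrogate $Q(\theta \mid \theta^{(t)})$ that lies above $f$ everywhere and touches it at $\theta^{(t)}$, then minimizes the surrogate. The symbols $f$, $Q$ and $\theta^{(t)}$ are notation assumed here for illustration only.

$$
Q(\theta \mid \theta^{(t)}) \ge f(\theta) \ \text{for all } \theta, \qquad
Q(\theta^{(t)} \mid \theta^{(t)}) = f(\theta^{(t)}), \qquad
\theta^{(t+1)} \in \arg\min_{\theta} Q(\theta \mid \theta^{(t)})
$$

$$
f(\theta^{(t+1)}) \le Q(\theta^{(t+1)} \mid \theta^{(t)}) \le Q(\theta^{(t)} \mid \theta^{(t)}) = f(\theta^{(t)})
$$

The second line shows that the objective never increases from one iteration to the next. EM fits this pattern: its E-step constructs such a surrogate for the negative log-likelihood via Jensen's inequality, and its M-step minimizes it.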

Related Links

https://arxiv.org/abs/1711.08129 – Link to arXiv paper

This talk is part of the Isaac Newton Institute Seminar Series.
