
Improving Data Sub-selection for Supervised Tasks with Principal Covariates Regression


If you have a question about this talk, please contact Dr M. Simoncelli.

Data analyses based on linear methods are among the simplest, most robust, and most transparent approaches to automatically processing large amounts of data when building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an underappreciated method that interpolates between principal component analysis and linear regression, and it can be used to reveal structure-property relations through simple-to-interpret, low-dimensional maps. We have recently introduced methods that incorporate PCovR into two popular data selection approaches, CUR and Farthest Point Sampling, which iteratively identify the most diverse samples and the most discriminating features. While our approach is completely general, here we focus on systems relevant to atomistic simulations, chemistry, and materials science, fields where feature and sample selection are increasingly common practice. Our results show that these PCovR-informed selection methods identify data subsets that outperform those chosen by their unsupervised counterparts. We demonstrate this with models of increasing complexity, from ridge regression to kernel ridge regression and finally feed-forward neural networks.
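To make the interpolation between principal component analysis and linear regression concrete, the following is a minimal NumPy sketch, not the speakers' implementation. It assumes the common sample-space form of PCovR, in which a mixing parameter alpha weights the feature Gram matrix against the Gram matrix of the regressed targets, and it runs Farthest Point Sampling on the distances induced by that mixed matrix. The function names, the unit-norm scaling of each block, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np


def pcov_gram(X, Y, alpha=0.5, ridge=1e-10):
    """Mixed Gram matrix alpha * X X^T + (1 - alpha) * Yhat Yhat^T (illustrative)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    # Centre each block and scale it to unit Frobenius norm so that alpha
    # weights comparable quantities (a normalisation assumed for this sketch).
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # Lightly regularised least-squares approximation of Y from X.
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    Y_hat = X @ W
    return alpha * (X @ X.T) + (1.0 - alpha) * (Y_hat @ Y_hat.T)


def pcovr_embedding(K, n_components=2):
    """Low-dimensional map from the leading eigenpairs of the mixed Gram matrix."""
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by round-off before the square root.
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))


def farthest_point_sampling(K, n_select, first=0):
    """Greedy FPS using squared distances d2(i, j) = K_ii - 2 K_ij + K_jj."""
    diag = np.diag(K)
    selected = [first]
    # Squared distance from every sample to its nearest selected sample.
    min_d2 = diag - 2.0 * K[:, first] + diag[first]
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_d2))
        selected.append(nxt)
        min_d2 = np.minimum(min_d2, diag - 2.0 * K[:, nxt] + diag[nxt])
    return selected
```

In this sketch, alpha = 1 gives a map whose Gram matrix matches that of the leading principal component scores and a purely unsupervised selection, while smaller values of alpha bias both the map and the selected samples toward directions that explain the target property.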

This work draws on:

Structure-Property Maps with Kernel Principal Covariates Regression; BA Helfrecht, RK Cersonsky, G Fraux, M Ceriotti; Machine Learning: Science and Technology 1

Improving Sample and Feature Selection with Principal Covariates Regression; RK Cersonsky, BA Helfrecht, EA Engel, S Kliavinets, M Ceriotti; Machine Learning: Science and Technology 2

This talk is part of the Lennard-Jones Centre series.

