Improving Data Sub-selection for Supervised Tasks with Principal Covariates Regression
If you have a question about this talk, please contact Dr M. Simoncelli.

Data analyses based on linear methods are among the simplest, most robust, and most transparent approaches to the automatic processing of large amounts of data for building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an underappreciated method that interpolates between principal component analysis and linear regression, and it can be used to reveal structure-property relations in terms of simple-to-interpret, low-dimensional maps. We have recently introduced methods that incorporate PCovR into two popular data selection approaches, CUR and Farthest Point Sampling, which iteratively identify the most diverse samples and discriminating features. While our approach is completely general, here we focus on systems relevant to atomistic simulations, chemistry, and materials science, fields where feature and sample selection are increasingly common practice. Our results show that these selection methods identify data subsets that outperform their unsupervised counterparts, which we demonstrate with models of increasing complexity, from ridge regression to kernel ridge regression and finally feed-forward neural networks.

This work pulls from:

Structure-Property Maps with Kernel Principal Covariates Regression; BA Helfrecht, RK Cersonsky, G Fraux, M Ceriotti; Machine Learning: Science and Technology 1

Improving Sample and Feature Selection with Principal Covariates Regression; RK Cersonsky, BA Helfrecht, EA Engel, S Kliavinek, M Ceriotti; Machine Learning: Science and Technology 2

This talk is part of the Lennard-Jones Centre series.
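As a rough illustration of the idea in the abstract, the sketch below mixes a PCA-like Gram matrix with a regression-like one and runs Farthest Point Sampling on the distances it induces. It is a minimal sketch under stated assumptions, not the speakers' implementation: the function names, the mixing parameter `alpha`, the Frobenius-norm scaling, the small `ridge` regularizer, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np


def pcovr_gram(X, y, alpha=0.5, ridge=1e-8):
    """Mix a PCA-like Gram matrix with a regression-like one.

    alpha=1 keeps only X X^T (unsupervised); alpha=0 keeps only the
    Gram matrix of the linear prediction of y (supervised).
    """
    X = X - X.mean(axis=0)
    y = y.reshape(len(y), -1)
    y = y - y.mean(axis=0)
    # Assumption: scale both blocks to unit Frobenius norm so the two
    # terms of the mixture are comparable in magnitude.
    X = X / np.linalg.norm(X)
    y = y / np.linalg.norm(y)
    # Ridge-regularized linear fit of y from X.
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
    y_hat = X @ W
    return alpha * (X @ X.T) + (1.0 - alpha) * (y_hat @ y_hat.T)


def farthest_point_sampling(K, n_select, first=0):
    """Greedy FPS using squared distances d_ij^2 = K_ii - 2 K_ij + K_jj."""
    diag = np.diag(K)
    selected = [first]
    # Squared distance from every sample to the current selection.
    min_d2 = diag - 2.0 * K[:, first] + diag[first]
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_d2))  # sample farthest from the selection
        selected.append(nxt)
        d2 = diag - 2.0 * K[:, nxt] + diag[nxt]
        min_d2 = np.minimum(min_d2, d2)
    return selected


# Toy data (hypothetical): the target depends mostly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X[:, 0] + 0.1 * rng.normal(size=50)

K = pcovr_gram(X, y, alpha=0.5)
idx = farthest_point_sampling(K, n_select=10)
print(idx)
```

Setting `alpha=1.0` in this sketch reduces the selection to ordinary, unsupervised Farthest Point Sampling on the scaled features, which gives a baseline to compare the supervised selection against.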