Ranking the information content of distance measures through the information imbalance

If you have a question about this talk, please contact Bingqing Cheng.

Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and units of measurement. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Using as few features as possible while still retaining sufficient information about the system is crucial in many statistical learning schemes, particularly when data are sparse. In my talk I will describe the “information imbalance”: a novel statistical concept that quantifies the relative information retained by two different distance measures and determines whether they are equivalent, independent, or whether one is more informative than the other. I will then show how the information imbalance can be used to find the most informative distance measure out of a pool of candidates, and present applications of this idea to the analysis of the spread of the Covid-19 epidemic as well as to the construction of optimally informative descriptors of physical systems.
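
The abstract does not state a formula, so the following is only a rough sketch of how such a quantity might be computed. It assumes the rank-based definition Δ(A→B) = (2/N)⟨r^B | r^A = 1⟩: for each point, find its nearest neighbour under distance A, look up the rank of that neighbour under distance B, average over all points, and scale by 2/N. Under this assumption, values near 0 suggest that A predicts B well, and values near 1 suggest that A carries little information about B. Squared Euclidean distances on feature subsets and the function names `pairwise_sq_dists` and `information_imbalance` are illustrative choices, not taken from the talk.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X (shape N x D)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def information_imbalance(X_a, X_b):
    """Sketch of the imbalance from distance A to distance B.

    Assumed definition (not given in the abstract):
    (2 / N) * mean rank, under B, of each point's nearest neighbour under A.
    """
    n = X_a.shape[0]
    d_a = pairwise_sq_dists(X_a)
    d_b = pairwise_sq_dists(X_b)

    # Exclude each point from its own neighbour list.
    np.fill_diagonal(d_a, np.inf)
    np.fill_diagonal(d_b, np.inf)

    # Index of the nearest neighbour of every point under distance A.
    nn_a = np.argmin(d_a, axis=1)

    # ranks_b[i, j] = rank of point j among the neighbours of i under B
    # (1 = nearest; the point itself ends up last because of the inf diagonal).
    ranks_b = np.argsort(np.argsort(d_b, axis=1), axis=1) + 1

    # Rank under B of the A-nearest neighbour, averaged over all points.
    r = ranks_b[np.arange(n), nn_a]
    return 2.0 * r.mean() / n
```

Comparing `information_imbalance(X_a, X_b)` with `information_imbalance(X_b, X_a)` gives the kind of classification the abstract describes: both small would point to equivalent measures, both close to 1 to independent ones, and one small with the other large to one measure being more informative. For example, with `X` holding three random features, `information_imbalance(X, X[:, :1])` should come out lower than `information_imbalance(X[:, :1], X)`.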

This talk is part of the Machine learning in Physics, Chemistry and Materials discussion group (MLDG) series.
