
Synthetic maps for navigating high-dimensional data spaces


If you have a question about this talk, please contact Lisa Masters.

The analysis of large databases aims at obtaining a synthetic description of a system that reveals its salient features. We will describe an approach for charting complex and heterogeneous data spaces, providing a topography of the high-dimensional probability distribution from which the data are sampled. This topography includes information on the number and height of the probability peaks, the depth of the “valleys” separating them, the relative location of the peaks, and their hierarchical organization. The topography is reconstructed using an unsupervised variant of Density Peak clustering [Science 344, 1492 (2014)] that exploits a non-parametric density estimator [JCTC 14, 1206 (2018)], which automatically measures the density in the manifold containing the data [Sci. Rep. 7, 12140 (2017)]. Importantly, the density estimator also provides an estimate of its own error. This is a key feature, since it allows genuine probability peaks to be distinguished from density fluctuations due to finite sampling.
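As a rough illustration of the ingredients named in the abstract, the following Python sketch combines a density estimate with an error bar, a density-peak assignment, and a significance check on the peaks. It is not the speaker's implementation. It assumes a plain k-nearest-neighbour density estimate in the ambient dimension (rather than the non-parametric estimator and intrinsic-dimension measurement cited above), an approximate error of 1/sqrt(k) on the log-density, and the original density-peak decision rule with a user-supplied number of peaks. The function names and the threshold z are hypothetical.

```python
import numpy as np


def knn_log_density(X, k=15):
    """k-NN log-density (up to an additive constant) with a rough error bar.

    Assumption: the ambient dimension d is used; the talk instead measures
    the density in the manifold containing the data.
    """
    n, d = X.shape
    # Full pairwise Euclidean distance matrix (fine for small data sets).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Distance to the k-th neighbour; column 0 of the sort is the point itself.
    r_k = np.sort(dist, axis=1)[:, k]
    # log rho_i = log k - d log r_k + const (unit-ball volume constant dropped).
    log_rho = np.log(k) - d * np.log(r_k)
    # Counting statistics give an error on log rho of roughly 1/sqrt(k).
    err = np.full(n, 1.0 / np.sqrt(k))
    return log_rho, err, dist


def density_peaks(log_rho, dist, n_peaks):
    """Assign each point to one of n_peaks density peaks."""
    n = len(log_rho)
    order = np.argsort(-log_rho)  # indices from highest to lowest density
    delta = np.zeros(n)           # distance to nearest point of higher density
    parent = np.full(n, -1)       # index of that point
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i], parent[i] = dist[i, j], j
    # Peaks are points that are both dense and far from any denser point.
    gamma = np.exp(log_rho - log_rho.max()) * delta
    gamma[order[0]] = np.inf      # the global maximum is always a peak
    centers = np.argsort(-gamma)[:n_peaks]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Walk down in density: every other point inherits its parent's label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def peak_is_significant(log_rho_peak, err_peak, log_rho_saddle, err_saddle, z=1.5):
    """Hypothetical test: is the peak-to-saddle drop larger than the noise?"""
    return (log_rho_peak - log_rho_saddle) > z * (err_peak + err_saddle)
```

In this sketch a peak whose height above the neighbouring saddle is smaller than z times the combined error would be treated as a finite-sampling fluctuation and merged with its neighbour. Locating the saddle points between peaks is left out for brevity.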

This talk is part of the Theory - Chemistry Research Interest Group series.



© 2006-2023, University of Cambridge.