Information theoretic model selection in clustering

Joachim M Buhmann, Department of Computer Science, ETH Zurich
Wednesday 28 October 2009, 11:00-12:00
Engineering Department, CBL Room 438.

If you have a question about this talk, please contact Peter Orbanz.

Partitioning data sets into groups is an important preprocessing step for compression, prototype extraction, or outlier removal. Various criteria of connectedness or proximity have been proposed to group data according to structural similarity, but in general it is unclear which method or model to use. In the spirit of information theory, we propose a decision process to determine the amount of information that can be extracted from data, conditioned on a hypothesis class of partitions. A sender-receiver scenario defines an approximation capacity for a clustering problem, which quantizes the hypothesis class and thereby introduces sets of statistically indistinguishable partitionings. The quality of a clustering model is determined by its ability to extract more “signal” bits from a data source than a competing data interpretation.

Empirical evidence for this model selection concept comes from cluster validation in computer security, i.e., multilabel clustering of Boolean data for role-based access control, and from the analysis of microarray data.

This talk is part of the Machine Learning @ CUED series.
