Non-asymptotic control of a kernel 2-sample test
If you have a question about this talk, please contact Dr Sergio Bacallado.

We are interested in statistical tests to evaluate the hypothesis H₀: {P = Q} against its alternative H₁: {P ≠ Q}. Our data are multivariate, high-dimensional and exhibit strong dependencies between variables. We propose a test comparing two distributions based on kernel methods: the data are first transformed via a well-chosen feature map and then live in a reproducing kernel Hilbert space (RKHS). Our kernel test statistic is the analogue of Hotelling's T² test for finite-dimensional multivariate data, and is equal to the difference of the mean embeddings (MMD) renormalized by a well-chosen covariance operator.

Classically, these non-parametric tests are calibrated either asymptotically or via test aggregation techniques. Here, we propose to calibrate the test at a given fixed sample size by obtaining non-asymptotic bounds on our test statistic. For this, a regularization is required to approximate the covariance operator via its empirical estimator. Unlike the approaches of Harchaoui et al. (2007) or Hagrass et al. (2023), which use L₂ regularizations, we propose spectral truncation. This method fixes the unknown number T of eigenfunctions used to reconstruct the covariance operator and provides the additional advantage of data visualization. Currently, at a fixed T, the test statistic, called the truncated kernel Fisher Discriminant Ratio (KFDA_T), provides a test whose asymptotic calibration is known (Ozier-Lafontaine et al. (2023)). In this talk, I will present how to bound, theoretically and non-asymptotically, the p-value of the test associated with KFDA_T. This bound is a first step towards defining a good calibration of the hyperparameter T.

In applications, this statistical question is essential in the field of genomics, where the two groups are composed of single-cell RNA-seq data. The goal is to detect distinct or similar biological behaviour between the groups.
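To make the spectral truncation described above concrete, here is a minimal NumPy sketch of a truncated statistic of this kind. It is an illustration under stated assumptions, not the speakers' implementation: the function names `gaussian_gram` and `truncated_kfda`, the Gaussian kernel, its `bandwidth` parameter and the n₁n₂/n scaling are all choices made here and do not come from the abstract. The sketch projects the difference of the empirical mean embeddings onto the top T eigenfunctions of the empirical within-group covariance operator and rescales each coordinate by its eigenvalue.

```python
import numpy as np


def gaussian_gram(z, bandwidth):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_norms = np.sum(z ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (z @ z.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def truncated_kfda(x, y, T, bandwidth=1.0):
    """Truncated kernel Fisher discriminant statistic (illustrative sketch).

    x, y: arrays of shape (n1, d) and (n2, d). T: number of eigenfunctions kept.
    The Gaussian kernel and the n1 * n2 / n scaling are assumptions of this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[0], y.shape[0]
    n = n1 + n2
    K = gaussian_gram(np.vstack([x, y]), bandwidth)

    # Within-group centering: subtract each group's own mean embedding.
    P = np.eye(n)
    P[:n1, :n1] -= 1.0 / n1
    P[n1:, n1:] -= 1.0 / n2

    # P K P / n shares its nonzero spectrum with the empirical
    # within-group covariance operator.
    M = P @ K @ P / n
    M = (M + M.T) / 2.0  # remove floating-point asymmetry before eigh
    evals, evecs = np.linalg.eigh(M)
    top = np.argsort(evals)[::-1][:T]
    lam, U = evals[top], evecs[:, top]
    if lam[-1] <= 1e-12:
        raise ValueError("T exceeds the numerical rank of the covariance estimate")

    # Weights such that (feature matrix) @ omega = mean embedding of x minus that of y.
    omega = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])

    # Coordinates of the mean-embedding difference on the top T unit-norm eigenfunctions.
    coords = (U.T @ (K @ omega)) / np.sqrt(n * lam)

    # Rescale each squared coordinate by its eigenvalue (truncated inverse covariance).
    return float((n1 * n2 / n) * np.sum(coords ** 2 / lam))
```

For two samples `x` and `y` stored as arrays with one observation per row, `truncated_kfda(x, y, T=3, bandwidth=1.0)` returns a single non-negative number. Because each retained eigenfunction contributes a non-negative term, the value can only grow as T increases, which is one reason the choice of T has to be calibrated rather than simply maximized.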
Joint work with Bertrand Michel (Université de Nantes, France), Franck Picard (ENS de Lyon, France) and Vincent Rivoirard (Paris-Dauphine, France).

This talk is part of the Statistics series.