
Random Features for Kernel Approximation


If you have a question about this talk, please contact James Allingham.

Zoom link available upon request (it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk). Sign up to our mailing list to receive reminders.

Though ubiquitous and mathematically elegant, kernel methods notoriously scale poorly with dataset size because the Gram matrix must be stored and inverted. This has motivated a number of kernel approximation techniques. Chief among them are random features, which construct low-rank decompositions of the Gram matrix via Monte Carlo methods. We begin by discussing Rahimi and Recht's seminal paper on Random Fourier Features, which approximates stationary kernels with a randomised sum of sinusoids. We briefly draw parallels to the celebrated Johnson-Lindenstrauss transform, before discussing how Orthogonal Random Features achieve lower approximation error. We demonstrate the effectiveness of these techniques for approximating attention in Transformers. Finally, if you will humour me, we will briefly discuss how carefully induced correlations between random features can further improve the quality of kernel approximation, describing the recently introduced class of Simplex Random Features.
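As a minimal sketch of the randomised-sum-of-sinusoids construction described above (the function names and the lengthscale parameter are illustrative choices, not the talk's own implementation), the following NumPy snippet builds Random Fourier Features for the RBF kernel and checks them against the exact Gram matrix:

import numpy as np

def rbf_random_features(X, num_features, lengthscale=1.0, seed=0):
    # Random Fourier Features (Rahimi & Recht, 2007) for the RBF kernel
    # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    # Returns Z such that Z @ Z.T approximates the Gram matrix.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # By Bochner's theorem, frequencies are sampled from the kernel's
    # spectral density, here a Gaussian with scale 1/lengthscale.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    # The randomised sum of sinusoids: z(x) = sqrt(2/m) * cos(W^T x + b).
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(100, 5))
Z = rbf_random_features(X, num_features=4096)
K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)
print(np.abs(Z @ Z.T - K_exact).max())  # error shrinks as num_features grows

Orthogonal Random Features keep the same cosine feature map but draw the frequency matrix in orthogonalised blocks. A sketch of that construction, under the same assumptions:

def orthogonal_frequencies(d, num_features, lengthscale=1.0, seed=0):
    # Frequency matrix for Orthogonal Random Features (Yu et al., 2016):
    # stack d x d blocks of orthonormal directions, rescaling each column
    # by a chi(d)-distributed norm so its marginal distribution matches
    # that of an i.i.d. Gaussian frequency vector.
    rng = np.random.default_rng(seed)
    blocks = []
    while d * len(blocks) < num_features:
        Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # orthonormal columns
        norms = np.sqrt(rng.chisquare(df=d, size=d))  # ||g||, g ~ N(0, I_d)
        blocks.append(Q * norms)  # scale column j of Q by norms[j]
    return np.concatenate(blocks, axis=1)[:, :num_features] / lengthscale

Substituting this frequency matrix into rbf_random_features in place of the i.i.d. Gaussian draw gives the orthogonal estimator. The Performer and Simplex Random Features discussed in the talk go further: the former replaces the cosine map with positive-valued features suited to attention, and the latter couples the directions of the frequency vectors via simplex geometry.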

Papers:

Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 20.

Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206.

Yu, F. X. X., Suresh, A. T., Choromanski, K. M., Holtmann-Rice, D. N., and Kumar, S. (2016). Orthogonal random features. Advances in Neural Information Processing Systems, 29.

Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., Kaiser, L., et al. (2020). Rethinking attention with performers. International Conference on Learning Representations.

Reid, I., Choromanski, K., Likhosherstov, V., and Weller, A. (2023). Simplex random features. arXiv preprint arXiv:2301.13856.

This talk is part of the Machine Learning Reading Group @ CUED series.
