Random Features for Kernel Approximation
If you have a question about this talk, please contact James Allingham. A Zoom link is available upon request (it is sent out on our mailing list, eng-mlg-rcc [at] lists.cam.ac.uk). Sign up to our mailing list for easier reminders. This talk is part of the Machine Learning Reading Group @ CUED series.

Though ubiquitous and mathematically elegant, kernel methods notoriously suffer from poor scalability as dataset size grows, on account of the need to store and invert the Gram matrix. This has motivated a number of kernel approximation techniques. Chief among them are random features, which construct low-rank decompositions of the Gram matrix via Monte Carlo methods. We begin by discussing Rahimi and Recht's seminal paper on Random Fourier Features, which approximates stationary kernels with a randomised sum of sinusoids. We briefly draw parallels to the celebrated Johnson-Lindenstrauss transform, before discussing how Orthogonal Random Features enjoy better convergence. We demonstrate the effectiveness of these techniques for approximating attention in Transformers. Finally – if you will humour me – we will briefly discuss how carefully induced correlations between random features can further improve the quality of kernel approximation, describing the recently introduced class of Simplex Random Features. Brief illustrative code sketches of some of these ideas appear after the paper list below.

Papers:

Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 20.

Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189-206.

Yu, F. X. X., Suresh, A. T., Choromanski, K. M., Holtmann-Rice, D. N., and Kumar, S. (2016). Orthogonal random features. Advances in Neural Information Processing Systems, 29.

Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., Kaiser, L., et al. (2020). Rethinking attention with Performers. International Conference on Learning Representations, 9.

Reid, I., Choromanski, K., Likhosherstov, V., and Weller, A. (2023). Simplex random features. arXiv preprint arXiv:2301.13856.
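For those who would like to see the basic construction concretely before the talk, here is a minimal sketch of Random Fourier Features for the Gaussian (RBF) kernel, in the spirit of Rahimi and Recht (2007). The function and parameter names (random_fourier_features, num_features, sigma) are illustrative choices, not taken from the talk or the papers.

```python
import numpy as np

def random_fourier_features(X, num_features=256, sigma=1.0, seed=None):
    """Map X of shape (n, d) to features Z of shape (n, num_features) such that
    Z @ Z.T approximates the Gram matrix of the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # By Bochner's theorem, frequencies are drawn from the kernel's spectral
    # density; for the RBF kernel this is a Gaussian with scale 1 / sigma.
    W = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    # The "randomised sum of sinusoids": z(x) = sqrt(2/m) * cos(W^T x + b).
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Quick check against the exact Gram matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = random_fourier_features(X, num_features=4096, sigma=1.0, seed=1)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-sq_dists / 2.0)
print(np.abs(Z @ Z.T - K_exact).max())  # shrinks as num_features grows
```

Fitting a linear model on Z then stands in for a kernel method on the full Gram matrix, storing an n-by-m feature matrix instead of the n-by-n Gram matrix.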
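Orthogonal Random Features change only how the frequency matrix W is drawn: Gaussian blocks are replaced by orthogonalised blocks whose columns are rescaled to have chi-distributed norms, which Yu et al. (2016) show reduces the error of the kernel estimate. A sketch of that substitution, under the same illustrative naming as above:

```python
import numpy as np

def orthogonal_frequencies(d, num_features, sigma=1.0, seed=None):
    """Drop-in replacement for the Gaussian frequency matrix W above: columns
    come in orthogonal blocks, each column rescaled so its norm matches that
    of a d-dimensional Gaussian vector (a chi(d) draw)."""
    rng = np.random.default_rng(seed)
    blocks = []
    remaining = num_features
    while remaining > 0:
        G = rng.normal(size=(d, d))
        Q, _ = np.linalg.qr(G)                      # orthonormal directions
        norms = np.sqrt(rng.chisquare(d, size=d))   # Gaussian-vector lengths
        block = (Q * norms) / sigma                 # match the RBF scaling
        blocks.append(block[:, :min(d, remaining)])
        remaining -= d
    return np.concatenate(blocks, axis=1)

# Used in place of the Gaussian W inside random_fourier_features:
# W = orthogonal_frequencies(d, num_features, sigma)
```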
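The attention connection comes from the Performers paper, which swaps the cosine features for positive random features so that the softmax kernel exp(q . k) is estimated without large cancellations, letting attention be computed in time linear in sequence length. The following is only a bare-bones illustration of that idea (single head, no masking; names are my own), not the FAVOR+ implementation from the paper.

```python
import numpy as np

def positive_random_features(X, W):
    """phi(x) = exp(W^T x - ||x||^2 / 2) / sqrt(m): an always-positive, unbiased
    estimator of the softmax kernel exp(x . y) when W has i.i.d. N(0, 1) entries."""
    m = W.shape[1]
    return np.exp(X @ W - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(m)

def approximate_softmax_attention(Q, K, V, num_features=256, seed=None):
    """Estimate softmax(Q K^T / sqrt(d)) V without forming the n x n matrix."""
    rng = np.random.default_rng(seed)
    d = Q.shape[-1]
    W = rng.normal(size=(d, num_features))
    # Rescale queries/keys so phi(q) . phi(k) targets exp(q . k / sqrt(d)).
    Qf = positive_random_features(Q / d ** 0.25, W)
    Kf = positive_random_features(K / d ** 0.25, W)
    numerator = Qf @ (Kf.T @ V)          # O(n * num_features * d) time
    denominator = Qf @ Kf.sum(axis=0)    # row-wise softmax normaliser
    return numerator / denominator[:, None]
```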