
Provably Safe Certification for Machine Learning Models under Adversarial Attacks


If you have a question about this talk, please contact Prof. Ramji Venkataramanan.

It is widely known that state-of-the-art machine learning models, including vision and language models, can be seriously compromised by adversarial perturbations. It is therefore increasingly important to develop the capability to certify their performance in the presence of the most effective adversarial attacks.

This talk will introduce an approach, inspired by distribution-free risk-controlling procedures, to certify the performance of machine learning models in the presence of adversarial attacks, with population-level risk guarantees. In particular, given a specific attack, we will introduce the notion of an (alpha, zeta)-safety guarantee for a machine learning model. This guarantee is supported by a testing procedure that relies on the availability of a calibration set: it ensures that one declares a model's adversarial (population) risk to be less than alpha (i.e. declares the model safe) when the model's adversarial (population) risk is in fact higher than alpha (i.e. the model is unsafe) with probability less than zeta. We will also introduce Bayesian-optimization-based approaches that determine very efficiently whether or not a machine learning model is (alpha, zeta)-safe in the presence of an adversarial attack, along with their statistical guarantees.
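To make the flavour of such a calibration-set test concrete, here is a minimal illustrative sketch, not the speaker's actual procedure. It tests (alpha, zeta)-safety using a Hoeffding tail bound on 0/1 adversarial losses: declaring "safe" only when the empirical adversarial risk plus a confidence margin falls below alpha ensures that an unsafe model (true risk above alpha) is declared safe with probability at most zeta. The function name and interface are hypothetical.

```python
import math

def is_alpha_zeta_safe(losses, alpha, zeta):
    """Illustrative (alpha, zeta)-safety test via a Hoeffding bound.

    losses: list of 0/1 adversarial losses of the model on a held-out
            calibration set (1 = the attack succeeded on that example).
    Returns True (declare safe) only when the empirical risk plus a
    Hoeffding margin is at most alpha; if the true population risk
    exceeds alpha, this declaration happens with probability < zeta.
    """
    n = len(losses)
    empirical_risk = sum(losses) / n
    # Hoeffding: P(empirical_risk <= true_risk - margin) <= exp(-2 n margin^2),
    # so choosing margin = sqrt(ln(1/zeta) / (2n)) caps the error at zeta.
    margin = math.sqrt(math.log(1.0 / zeta) / (2.0 * n))
    return empirical_risk + margin <= alpha
```

For example, with 2,000 calibration points, zeta = 0.05 gives a margin of roughly 0.027, so a model with empirical adversarial risk 0.025 would be certified at alpha = 0.1, while one with empirical risk 0.09 would not. The Bayesian optimization mentioned in the abstract would instead be used to search for the most damaging attack configuration before running such a test.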

This talk will also illustrate how to apply our framework to a range of machine learning models, including vision Transformer (ViT) and ResNet models of various sizes, impaired by a variety of adversarial attacks.

This talk is part of the Information Theory Seminar series.




© 2006-2024, University of Cambridge.