
AIReg-Bench: Benchmarking Language Models That Assess AI Regulation Compliance


If you have a question about this talk, please contact Mateja Jamnik.

NOTE: unusual venue (LT1)

As governments move to regulate AI, there is growing interest in using Large Language Models (LLMs) to assess whether an AI system complies with a given AI Regulation (AIR). However, there is presently no way to benchmark the performance of LLMs at this task. To fill this void, we introduce AIReg-Bench: the first benchmark dataset designed to test how well LLMs can assess compliance with the EU AI Act (AIA). We created this dataset through a two-step process: (1) by prompting an LLM with carefully structured instructions, we generated 120 technical documentation excerpts (samples), each depicting a fictional, albeit plausible, AI system of the kind an AI provider might produce to demonstrate compliance with an AIR; (2) legal experts then reviewed and annotated each sample to indicate whether, and in what way, the AI system described therein violates specific Articles of the AIA. The resulting dataset, together with our evaluation of whether frontier LLMs can reproduce the experts’ compliance labels, provides a starting point for understanding the opportunities and limitations of LLM-based AIR compliance assessment tools, and establishes a benchmark against which subsequent LLMs can be compared. The dataset and evaluation code are available at https://github.com/camlsys/aireg-bench.
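The evaluation described above compares an LLM's compliance labels against the experts' annotations. A minimal sketch of that comparison is below; the record format and label names are illustrative assumptions, not the benchmark's actual schema or scoring code:

```python
# Hypothetical sketch: score an LLM's compliance labels against expert labels.
# The "compliant"/"violation" labels and the per-sample list format are
# assumptions for illustration, not AIReg-Bench's actual data schema.

def agreement(expert_labels, model_labels):
    """Fraction of samples where the model's label matches the expert's."""
    if len(expert_labels) != len(model_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(e == m for e, m in zip(expert_labels, model_labels))
    return matches / len(expert_labels)

# Toy example: labels for four documentation excerpts.
experts = ["compliant", "violation", "violation", "compliant"]
model = ["compliant", "violation", "compliant", "compliant"]
print(agreement(experts, model))  # 0.75
```

In practice one would also want per-Article breakdowns and agreement measures that correct for chance (e.g. Cohen's kappa), since a model that always predicts the majority label can score well on raw accuracy.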

You can also join us on Zoom

This talk is part of the Artificial Intelligence Research Group Talks (Computer Laboratory) series.

