How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
If you have a question about this talk, please contact Hridoy Sankar Dutta.

Large language models (LLMs) can “lie”, which we define as outputting false statements despite “knowing” the truth in a demonstrable sense. LLMs might “lie”, for example, when instructed to output misinformation. Here, we develop a simple lie detector that requires neither access to the LLM’s activations (black-box) nor ground-truth knowledge of the fact in question. The detector works by asking a predefined set of unrelated follow-up questions after a suspected lie, and feeding the LLM’s yes/no answers into a logistic regression classifier. Despite its simplicity, this lie detector is highly accurate and surprisingly general. When trained on examples from a single setting (prompting GPT-3.5 to lie about factual questions), the detector generalises out-of-distribution to (1) other LLM architectures, (2) LLMs fine-tuned to lie, (3) sycophantic lies, and (4) lies emerging in real-life scenarios such as sales. These results indicate that LLMs have distinctive lie-related behavioural patterns, consistent across architectures and contexts, which could enable general-purpose lie detection.

https://cam-ac-uk.zoom.us/j/88053652228?pwd=NG1LTDdUc2VkV3pGdlpSdHZ5N3h0Zz09
Meeting ID: 880 5365 2228
Passcode: 081966

RECORDING: Please note, this event will be recorded and will be available after the event for an indeterminate period under a CC BY-NC-ND license. Audience members should bear this in mind before joining the webinar or asking questions.

NOTE: Please do not post URLs for the talk, and especially Zoom links, on Twitter, because automated systems will pick them up and disrupt our meeting.

This talk is part of the Computer Laboratory Security Seminar series.
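
The detector described in the abstract (ask fixed follow-up questions, then feed the yes/no answers into a logistic regression classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: the three follow-up questions, the +1/-1 encoding, the toy training data, and the plain gradient-descent training loop are all hypothetical, and the speakers' actual question set and training procedure may differ.

```python
import math

# Hypothetical follow-up questions; the talk's actual question set is not given here.
FOLLOW_UP_QUESTIONS = [
    "Is the previous statement accurate?",
    "Is 7 a prime number?",
    "Can fish ride bicycles?",
]


def encode_answers(answers):
    """Map each yes/no answer to a +1.0 / -1.0 feature (assumed encoding)."""
    return [1.0 if a.strip().lower().startswith("yes") else -1.0 for a in answers]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def lie_probability(w, b, answers):
    """Estimated probability that the preceding statement was a lie."""
    x = encode_answers(answers)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy data: answers to the three questions, label 1 = lied, 0 = honest.
TOY_DATA = [
    (["yes", "yes", "yes"], 0),
    (["yes", "no", "yes"], 0),
    (["no", "no", "yes"], 1),
    (["no", "yes", "no"], 1),
]

X = [encode_answers(answers) for answers, _ in TOY_DATA]
y = [label for _, label in TOY_DATA]
w, b = train_logistic_regression(X, y)

print(lie_probability(w, b, ["no", "no", "yes"]))
print(lie_probability(w, b, ["yes", "yes", "yes"]))
```

In a real setting the answers would come from querying the model under test after a suspected lie; the sketch only shows how yes/no answers become features for the classifier, without needing the model's activations or the ground truth.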