Engineering Safe AI
Presentations and discussions about possible solutions to the value alignment problem.
If you have a question about this list, please contact: Adrià Garriga Alonso. If you have a question about a specific talk, click on that talk to find its organiser.
0 upcoming talks and 38 talks in the archive.
Engineering Safe AI: Robert Miles
Can Machines Read our Minds? (Starting time 30 min later than usual)
How useful is quantilization for mitigating specification-gaming?
Misleading meta-objectives and hidden incentives for distributional shift
Causal Reasoning from Meta-reinforcement Learning
Inverse Game Theory
Goals vs Utility Functions
Who do we want to control human-level AI?
Bayesian Theory of Mind: Modeling Joint Belief-Desire Attribution
Ambitious Value Learning
Machine Theory of Mind
Embedded Agency
Comprehensive AI Services
Incomplete Contracting and AI Alignment
The Algorithmic Foundations of Differential Privacy (Chapters 1 and 2)
Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning
Measuring and avoiding side effects using relative reachability
Interpretable Machine Learning
Scaling inverse reinforcement learning for human-compatible AI
Motivation for this group, Goodhart's Law
Approaches to avoiding negative side effects
AI Safety Gridworlds: Is my agent 'safe'?
Logical Induction: a computable approach to logical non-omniscience
Decision Boundary Geometries and Robustness of Neural Networks
Decision Theory for AI safety
Safe Exploration in Reinforcement Learning
Amplification and dialogue as mechanisms for safe advanced AI
Last term summary + discussion of topic importance
Counterargument to CIRL, and Safely Interruptible Agents
Reinforcement learning with a corrupted reward function
Solomonoff Induction and a Definition of Intelligence
Deep Reinforcement Learning from Human Preferences
An introduction to adversarial attacks and defences
'Off-Switch Games' and Corrigibility
Cooperative Inverse Reinforcement Learning
Engineering Safe AI seminar group