Can we automatically anonymize text documents?


Pierre Lison, Norwegian Computing Center
Thursday 19 May 2022, 11:00-12:00
https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBdXVpOXFvdz09.

If you have a question about this talk, please contact Marinela Parovic.

Text documents often contain personal data in some form. To protect the privacy of the individuals referred to in those documents, it is often desirable (and, in many cases, mandatory) to edit them so as to conceal the identity of those individuals. This anonymization process remains a difficult task at the intersection of NLP, law and data privacy. In this talk, I’ll give an overview of current approaches and outline a number of unsolved problems. Furthermore, I’ll present the Text Anonymization Benchmark (TAB), a new corpus and evaluation framework dedicated to this task. TAB contains 1268 court cases from the European Court of Human Rights, manually enriched with detailed annotations regarding the personal data expressed in each document. We hope this new benchmark will inspire NLP researchers to work on this challenging but important problem.

This talk is part of the Language Technology Lab Seminars series.
