Can we automatically anonymize text documents?

If you have a question about this talk, please contact Marinela Parovic.

Text documents often contain personal data in some form. To protect the privacy of the individuals referred to in those documents, it is often desirable (and, in many cases, mandatory) to edit them so as to conceal the identity of those individuals. This anonymization process remains a difficult task at the intersection of NLP, law, and data privacy. In this talk, I’ll give an overview of current approaches and outline a number of unsolved problems. Furthermore, I’ll present the Text Anonymization Benchmark (TAB), a new corpus and evaluation framework dedicated to this task. TAB contains 1268 court cases from the European Court of Human Rights, manually enriched with detailed annotations regarding the personal data expressed in each document. We hope this new benchmark will inspire NLP researchers to work on this challenging but important problem.

This talk is part of the Language Technology Lab Seminars series.

© 2006-2024, University of Cambridge.