
Clinical De-Identification and Semantic Relatedness


If you have a question about this talk, please contact Marinela Parovic.

The first part of this talk will discuss the development of novel low-cost approaches to de-identifying clinical notes. The second part will discuss the development of a new dataset of semantic relatedness for sentence pairs. This dataset, STR-2021, has 5,500 English sentence pairs manually annotated for semantic relatedness using a comparative annotation framework. We show that the resulting scores have high reliability (repeat annotation correlation of 0.84). We use the dataset to explore a number of questions on what makes two sentences more semantically related. We also evaluate a suite of sentence representation methods on their ability to place pairs that are more related closer to each other in vector space.
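
As a rough illustration of the last point in the abstract, the sketch below scores each sentence pair by the cosine similarity of its two vectors and then checks how well that ordering agrees with human relatedness scores using a rank correlation. This is only one plausible way to run such an evaluation; the abstract does not say which metric the speakers use. The vectors and gold scores are made-up toy values, and every function name here is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (invented for illustration, not taken from STR-2021):
# each entry is (vector for sentence A, vector for sentence B, gold score).
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.1], 0.9),
    ([1.0, 0.0], [0.5, 0.5], 0.6),
    ([1.0, 0.0], [0.0, 1.0], 0.1),
]

model_scores = [cosine(a, b) for a, b, _ in toy_pairs]
gold_scores = [gold for _, _, gold in toy_pairs]

# A value near 1 means the representation orders pairs the way annotators do.
print(spearman(model_scores, gold_scores))
```

In practice the toy vectors would be replaced by the output of whichever sentence representation method is being tested, with the dataset's annotated scores as the gold values.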

This talk is part of the Language Technology Lab Seminars series.



© 2006-2023, University of Cambridge.