A Bayesian Partitioning Approach to Duplicate Detection and Record Linkage
DLAW02 - Data linkage: techniques, challenges and applications

Record linkage techniques allow us to combine different sources of information from a common population in the absence of unique identifiers. Linking multiple files is an important task in a wide variety of applications, since it permits us to gather information that would not otherwise be available, or that would be too expensive to collect. In practice, an additional complication appears when the datafiles to be linked contain duplicates.

Traditional approaches to duplicate detection and record linkage output independent decisions on the coreference status of each pair of records, which often leads to non-transitive decisions that have to be reconciled in some ad hoc fashion. The joint task of linking multiple datafiles and finding duplicate records within them can alternatively be posed as partitioning the datafiles into groups of coreferent records. We present an approach that targets this partition as the parameter of interest, thereby ensuring transitive decisions.

Our Bayesian implementation allows us to incorporate prior information on the reliability of the fields in the datafiles, which is especially useful when no training data are available, and it also provides a proper account of the uncertainty in the duplicate detection and record linkage decisions. We show how this uncertainty can be incorporated in certain models for population size estimation. Throughout the document we present a case study to detect killings that were reported multiple times to organizations recording human rights violations during the civil war in El Salvador.

This talk is part of the Isaac Newton Institute Seminar Series.
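
The contrast the abstract draws between independent pairwise decisions and a partition of the records can be illustrated with a small Python sketch. It is not the Bayesian model from the talk: the record indices, the pairwise decisions, the cluster labels, and the helper names `is_transitive` and `pairs_from_partition` are all hypothetical, chosen only to show why decisions derived from a partition are transitive by construction.

```python
from itertools import combinations


def is_transitive(n_records, linked_pairs):
    """Return True if the pairwise 'same entity' decisions are transitive."""
    linked = {frozenset(p) for p in linked_pairs}
    for a, b, c in combinations(range(n_records), 3):
        hits = sum(frozenset(p) in linked for p in ((a, b), (b, c), (a, c)))
        # Two links within a triple imply the third; exactly two is a violation.
        if hits == 2:
            return False
    return True


def pairs_from_partition(labels):
    """Pairwise links implied by a partition, given one cluster label per record."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


# Hypothetical independent pairwise decisions: 0~1 and 1~2, but not 0~2.
pairwise_decisions = {(0, 1), (1, 2)}

# Hypothetical partition: records 0, 1, 2 form one cluster, record 3 is alone.
partition_labels = [0, 0, 0, 1]

print(is_transitive(3, pairwise_decisions))                      # False
print(is_transitive(4, pairs_from_partition(partition_labels)))  # True
```

Because every pairwise link is read off from cluster membership, no set of labels can produce the inconsistent pattern in `pairwise_decisions`, so no reconciliation step is needed afterwards.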