A logical approach to data provenance.

  • Speaker: James Cheney, Informatics, University of Edinburgh
  • Time: Friday 17 November 2006, 14:00-15:00
  • Venue: FW11.

If you have a question about this talk, please contact Tom Ridge.

While scientific computation has historically been synonymous with large-scale numerical computation and supercomputing, scientists are now using increasingly sophisticated computational techniques such as databases and decentralized “Grid” computation. However, scientific data is expected to meet rigorous standards of data integrity, and this is difficult to achieve using current tools. One important ingredient of scientific data integrity is that data should be accompanied by documentation of the process by which it was recorded: for example, creation/modification timestamps, descriptions of any operations performed, and identities of authors and intermediate sources. This information is often called provenance or lineage.

Although many provenance-tracking systems and provenance data models have been proposed, most of them are based on ad hoc definitions of provenance, some of which depend on the syntax of the program (rather than its semantics). Thus, the behavior of such systems varies widely, and we lack a uniform framework to compare the correctness and expressiveness of various approaches.

We will describe a new approach which is based on the idea that provenance should reflect the ways the output of a function depends on its input; in particular, it should reflect counterfactual information (i.e., tell us something about what would have happened if the input were changed). We formalize this approach by defining a logic whose models are functions and whose formulas are assertions about the dependence behavior of the function. This provides a general approach to defining and reasoning about the correctness of provenance-tracking techniques.
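The dependence idea can be illustrated with a brute-force counterfactual check over a small finite domain. This is only a sketch under stated assumptions, not the logic presented in the talk: the names depends_on, total, and cancels_c are hypothetical, inputs are modelled as dictionaries of fields, and "depends on a field" is taken to mean that changing that field alone can change the output.

```python
from itertools import product

def depends_on(f, fields, domain, field):
    """Return True if changing `field` alone can change f's output.

    Hypothetical helper: exhaustively tries every input record over
    `domain` and every alternative value for `field`.
    """
    for values in product(domain, repeat=len(fields)):
        record = dict(zip(fields, values))
        baseline = f(record)
        for alt in domain:
            if alt == record[field]:
                continue
            # Counterfactual input: same record, one field changed.
            changed = {**record, field: alt}
            if f(changed) != baseline:
                return True
    return False

def total(record):
    # Reads fields a and b; never reads c.
    return record["a"] + record["b"]

def cancels_c(record):
    # Mentions c syntactically, but c cancels out semantically.
    return record["a"] + record["c"] - record["c"]

fields = ["a", "b", "c"]
domain = [0, 1, 2]

total_deps = {name: depends_on(total, fields, domain, name) for name in fields}
cancels_deps = {name: depends_on(cancels_c, fields, domain, name) for name in fields}
```

In this sketch, cancels_c mentions field c in its source text, yet no change to c alone alters its output, so a semantic check reports no dependence on c where a purely syntactic one would.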

This talk describes joint work with Peter Buneman (U. Edinburgh), Stijn Vansummeren (U. Hasselt, Belgium), and Adriane Chapman (U. Michigan).

This talk is part of the Logic and Semantics Seminar (Computer Laboratory) series.
