# Representing conditional independence in probabilistic programming

Programs can be used to give compact representations of distributions: in order to represent a distribution, one simply gives a program that would generate an exact sample were the random number generator to produce true randomness. This approach to representing distributions by probabilistic programs works not only for simple distributions on numbers like Poissons, Gaussians, etc., but also for more exotic distributions on, e.g., phrases in natural language, friendships in a social network, rendered 2D images of 3D scenes, and climate sensor measurements.
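To make the idea concrete, here is a minimal sketch in Python (not Church; the function name is illustrative) of a distribution represented by a sampling program: running it once yields one exact draw from a geometric distribution, provided the underlying coin flips are truly random.

```python
import random

def sample_geometric(p=0.5):
    """A probabilistic program representing the geometric distribution:
    count the failures before the first success of a p-biased coin.
    One run = one exact sample, assuming truly random coin flips."""
    n = 0
    while random.random() >= p:
        n += 1
    return n
```

The program itself is the representation of the distribution; no density or table is ever written down.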

Probabilistic programming systems support statistical inference on models defined by probabilistic programs. By constraining some variables of a program (e.g., simulated sensor readings in some climate model) and studying the conditional distribution of other variables (e.g., the parameters of the climate model), we can identify plausible variable settings that agree with the constraints. Conditional inferences like these would allow us to, e.g., build predictive text systems for mobile phones, identify missing links in a social network, guess the 3D shape of an object from only a photograph, or study the underlying mechanisms driving observed climate measurements.
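The simplest (though rarely the most efficient) way to perform such conditional inference is rejection sampling: run the generative program repeatedly, discard runs that violate the constraint, and inspect the latent variables of the runs that remain. A toy sketch in Python, with a hypothetical rain/sprinkler model standing in for a real climate model:

```python
import random

def model():
    # A toy generative program: two latent causes and one observable.
    rain = random.random() < 0.3
    sprinkler = random.random() < 0.5
    grass_wet = rain or (sprinkler and random.random() < 0.9)
    return rain, grass_wet

def posterior_rain_given_wet(n=100000):
    # Inference by rejection: keep only runs satisfying the constraint
    # (grass is wet) and estimate the conditional probability of rain.
    kept = [rain for rain, wet in (model() for _ in range(n)) if wet]
    return sum(kept) / len(kept)
```

Here the constraint is `grass_wet == True`; exact arithmetic gives P(rain | wet) = 0.3 / 0.615 ≈ 0.49, which the rejection estimate approaches as `n` grows.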

Probabilistic programming systems for machine learning and statistics are still in their infancy, and there are many interesting theoretical and applied problems yet to be tackled. One of the most challenging problems is understanding when such systems can be efficient. The efficiency of general-purpose probabilistic inference algorithms, especially those underlying probabilistic programming systems, often depends on the abundance of conditional independence in the underlying probabilistic model. Informally, conditional independence among random variables in a model can allow a large inference problem to be broken down into smaller pieces that can be solved individually. But even if a probabilistic model exhibits a high degree of conditional independence, the chosen probabilistic programming language might make it difficult or impossible to expose that conditional independence to the algorithm, all but ruling out the hope for an efficient algorithm.
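The decomposition enabled by conditional independence can be seen in even the simplest setting. If observations are conditionally independent given a parameter, the joint likelihood factorizes into a product of local terms, so the log-likelihood is a sum that can be evaluated, updated, or parallelized one observation at a time. A sketch assuming Gaussian observations with unknown mean `theta` (the model is illustrative, not from the talk):

```python
import math

def gaussian_logpdf(y, mean, std=1.0):
    # Log-density of a single observation under a Gaussian.
    return -0.5 * math.log(2 * math.pi * std ** 2) \
           - (y - mean) ** 2 / (2 * std ** 2)

def log_joint(theta, observations):
    # Conditional independence given theta: the joint splits into a sum
    # of per-observation terms, each computable on its own.
    return sum(gaussian_logpdf(y, theta) for y in observations)
```

An inference algorithm that can see this structure never needs to touch all observations at once; one that cannot is forced to treat the likelihood as a monolithic black box.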

A challenging and important domain for probabilistic programming systems is nonparametric Bayesian statistics, and in the course of studying and implementing state-of-the-art models, we have identified several situations where the probabilistic programming language Church seems unable to represent the conditional independencies present in a model. The problem is not specific to Church: we show that it is a general problem for any system that represents distributions by sampling programs required to halt with probability one. We propose a new definition of “probabilistic program” in which the program need only sample correctly with arbitrarily high probability, and show that this relaxation allows us to represent conditional independencies that were previously unrepresentable.
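The distinction between the two notions of sampler can be illustrated with a standard rejection loop (this example is ours, not from the talk). The untruncated loop below halts with probability one but has no finite bound on its running time; capping it at `max_tries` yields a program that always halts and is correct except on a failure path of probability (1 − π/4)^`max_tries`, which can be made arbitrarily small. The fallback value on that path is a hypothetical choice.

```python
import random

def sample_disk(max_tries=64):
    """Sample (approximately) uniformly from the unit disk by rejection
    from the enclosing square. The loop is truncated so the program
    always halts; it samples correctly with probability at least
    1 - (1 - pi/4) ** max_tries, i.e., arbitrarily high in max_tries."""
    for _ in range(max_tries):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            return (x, y)
    return (0.0, 0.0)  # fallback on the vanishingly unlikely failure path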

Joint work with Nate Ackerman, Jeremy Avigad, Cameron Freer and Jason Rute.

This talk is part of the Logic and Semantics Seminar (Computer Laboratory) series.