Outcome Indistinguishability

by Cynthia Dwork, Michael Kim, Omer Reingold, Guy Rothblum, and Gal Yona

Oded's comments

My perpective on this work is very narrow; it ignores the original motivation for defining this notion of indistinguishability, and focuses on the notion per se.

There are several ways of introducing the foregoing notion. One is to consider distributions on pairs of the form $(x,y)$, where $x$ is an $n$-bit strong and $y$ is an $\ell$-bit string, with $\ell=1$ being the main case considered in the paper. Indeed, one may view $y$ as the outcome of a random process, denoted $R$, applied to $x$; that is, $y\sim R(x)$. (In the following definition, one may consider two random process, denote $R$ and $R'$, and let $Y=R(X)$ and $Y'=R'(X)$.)

We say that two joint distributions $Z=(X,Y)$ and $Z'=(X,Y')$, which agree on their first element (i.e., $X$), are outcome indistinguishable (wrt a class of distinguishers) if for every distinduisher $D$ (in that class) it holds that $\prob[D(X,Y)=1]$ is close to $\prob[D(X,Y')=1]$.
A natural task then is, given samples from $(X,Y)$, to form a ``genetative model'' (or a ``predictor'') $R'$ such that $(X,R'(X))$ is outcome indistinguishable from $(X,Y)$, which may be written as $(X,R(X))$. Equivalently, in case of $\ell=1$, we may consider the task of providing approximations to $p(x)\eqdef\prob[R(x)=1]$, where the approximation is such that sampling according to it (i.e., using $R'(x)=1$ with probability $p'(x)$) yields a distribution that is outcome indistinguishable from $(X,Y)$. Indeed, typically, samples of $(X,Y)$ do not (directly) yield the value of $p$, since it is unlikely to get two samples that share thae same first element (i.e., $x$). More generally, while $p:\bitset^n\to[0,1]$ is well defined, nobody (i.e., neither the predictor nor the distinguisher) has (oracle) access to it. (In case $\ell$ greater than 1, $p(x)$ is an $(2^\ell-1)$-long sequence that describes the probability distribution of $R(x)$.)

Variants on (or rather potential strengentings of) the foregoing definitions include the case in which the distinguisher is given multiple samples from each of the two distributions (i.e., $(X,Y)$ and $(X,Y')$) as well as providing the distinguisher additional information; specifically, the paper considers the following possibilities:

  1. In addition to the sample $(x,y)\gets(X,Y')$, the distinguisher also gets $p'(x)\eqdef\prob[Y'=1|X=x]$.
  2. In addition to the sample $(x,y)\gets(X,Y')$, the distinguisher gets oracle access to $p'$.
  3. In addition to the sample $(x,y)\gets(X,Y')$, the distinguisher gets a circuit computing $p'$.
While the basic model can be viewed as a restriction of the standard notion of computational indistingusihability to pairs of joint distributions of the form $(X,Y)$ and $(X,Y')$, the generalization extend the standard notion by providing the distinguisher with additional information that is related to (one of) the tested samples.

The original abstract [with a single clarifications]

Prediction algorithms assign numbers to individuals that are popularly understood as individual "probabilities" -- [like] what is the probability of 5-year survival after cancer diagnosis? -- and which increasingly form the basis for life-altering decisions. Drawing on an understanding of computational indistinguishability developed in complexity theory and cryptography, we introduce Outcome Indistinguishability. Predictors that are Outcome Indistinguishable yield a generative model for outcomes that cannot be efficiently refuted on the basis of the real-life observations produced by Nature. We investigate a hierarchy of Outcome Indistinguishability definitions, whose stringency increases with the degree to which distinguishers may access the predictor in question. Our findings reveal that Outcome Indistinguishability behaves qualitatively differently than previously studied notions of indistinguishability. First, we provide constructions at all levels of the hierarchy. Then, leveraging recently-developed machinery for proving average-case fine-grained hardness, we obtain lower bounds on the complexity of the more stringent forms of Outcome Indistinguishability. This hardness result provides the first scientific grounds for the political argument that, when inspecting algorithmic risk prediction instruments, auditors should be granted oracle access to the algorithm, not simply historical predictions.

See arXiv 2011.13426.

Back to list of Oded's choices.