Conditional Samples in Distribution Testing

Oded's comments

These two independent works put forward and initiate a study of testing distributions via conditional queries. Specifically, the new type of queries specify a subset of the universe (over which the distribution is defined), and is answered by a sample taken from the corresponding conditional distribution space. Such queries represent a middle ground between the standard queries that are typically used in property testing and samples that are used in testing of distributions.

The overlap between the papers is quite significant, and in most cases the results in the second paper are stronger. For further comparison, see Section 1.4 in the second paper.

The original abstracts

ECCC TR12-154: On the Power of Conditional Samples in Distribution Testing by Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, and Arie Matsliah

In this paper we define and examine the power of the conditional-sampling oracle in the context of distribution-property testing. The conditional-sampling oracle for a discrete distribution $\mu$ takes as input a subset $S \subset [n]$ of the domain, and outputs a random sample $i \in S$ drawn according to $\mu$, conditioned on $S$ (and independently of all prior samples). The conditional-sampling oracle is a natural generalization of the ordinary sampling oracle in which $S$ always equals $[n]$. We show that with the conditional-sampling oracle, testing uniformity, testing identity to a known distribution, and testing any label-invariant property of distributions is easier than with the ordinary sampling oracle. On the other hand, we also show that for some distribution properties the sample-complexity remains near-maximal even with conditional sampling.

ECCC TR12-155: Testing probability distributions using conditional samples by Clement Canonne, Dana Ron, and Rocco Servedio

We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. This is an oracle that takes as input a subset $S \subseteq [N]$ of the domain $[N]$ of the unknown probability distribution $D$ and returns a draw from the conditional probability distribution $D$ restricted to $S$. This new model allows considerable flexibility in the design of distribution testing algorithms; in particular, testing algorithms in this model can be adaptive. We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether $D$ is the uniform distribution $U$; testing whether $D = D*$ for an explicitly provided $D*$; testing whether two unknown distributions $D1$ and $D2$ are equivalent; and estimating the variation distance between $D$ and the uniform distribution. At a high level our main finding is that the new conditional sampling framework we consider is a powerful one: while all the problems mentioned above have $Omega(\sqrt{N})$ sample complexity in the standard model (and in some cases the complexity must be almost linear in $N$), we give $poly(log(N),1/\epsilon)$) query algorithms (and in some cases $poly(1/\epsilon)$-query algorithms independent of $N$) for all these problems in our conditional sampling setting.


Back to list of Oded's choices.