## Conditional Samples in Distribution Testing

In this paper we define and examine the power of the conditional-sampling oracle in the context of distribution-property testing. The conditional-sampling oracle for a discrete distribution $\mu$ takes as input a subset $S \subset [n]$ of the domain, and outputs a random sample $i \in S$ drawn according to $\mu$, conditioned on $S$ (and independently of all prior samples). The conditional-sampling oracle is a natural generalization of the ordinary sampling oracle in which $S$ always equals $[n]$. We show that with the conditional-sampling oracle, testing uniformity, testing identity to a known distribution, and testing any label-invariant property of distributions is easier than with the ordinary sampling oracle. On the other hand, we also show that for some distribution properties the sample-complexity remains near-maximal even with conditional sampling.
We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. This is an oracle that takes as input a subset $S \subseteq [N]$ of the domain $[N]$ of the unknown probability distribution $D$ and returns a draw from the conditional probability distribution $D$ restricted to $S$. This new model allows considerable flexibility in the design of distribution testing algorithms; in particular, testing algorithms in this model can be adaptive. We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether $D$ is the uniform distribution $U$; testing whether $D = D*$ for an explicitly provided $D*$; testing whether two unknown distributions $D1$ and $D2$ are equivalent; and estimating the variation distance between $D$ and the uniform distribution. At a high level our main finding is that the new conditional sampling framework we consider is a powerful one: while all the problems mentioned above have $Omega(\sqrt{N})$ sample complexity in the standard model (and in some cases the complexity must be almost linear in $N$), we give $poly(log(N),1/\epsilon)$) query algorithms (and in some cases $poly(1/\epsilon)$-query algorithms independent of $N$) for all these problems in our conditional sampling setting.