## Verifying The Unseen:
Interactive Proofs for Label-Invariant Distribution Properties

by Tal Herman and Guy Rothblum

#### Oded's comments

I was waiting for a long time to post a recomendation for this work,
but could not do so before it has been publically posted.

This work presents a (2-message) interactive proof system of proximity
for verifying any label invariant propert of distributions over $[N]$.
The verifier is given $\tildeO({\sqrt })$ samples from the distribution
and within this communication (and time) complexity can correctly distinguish
the case that the distribution is close to the property from the case
that the distribution is far from the property.

The foregoing is quite tight, in general (as well as in many natural cases),
and should be contrasted with the complexity of testing (without the assistence of a prover),
which is almost linear in $N$.

#### The original abstract

Given i.i.d. samples from an unknown distribution over a large domain $[N]$,
approximating several basic quantities, including the distribution's support size,
its entropy, and its distance from the uniform distribution,
requires $\Theta(N / \log N)$ samples [Valiant and Valiant, STOC 2011].

Suppose, however, that we can interact with a powerful but untrusted prover,
who knows the entire distribution (or a good approximation of it).
Can we use such a prover to approximate (or rather, to approximately {\em verify})
such statistical quantities more efficiently?
We show that this is indeed the case: the support size, the entropy,
and the distance from the uniform distribution,
can all be approximately verified via a 2-message interactive proof,
where the communication complexity, the verifier's running time,
and the sample complexity are $\widetilde{O}({\sqrt{N}})$.
For all these quantities, the sample complexity is tight up to $\polylog N$ factors
(for any interactive proof, regardless of its communication complexity or verification time).

More generally, we give a tolerant interactive proof system with the above sample
and communication complexities for verifying a distribution's proximity
to any label-invariant property (any property that is invariant
to re-labeling of the elements in the distribution's support).
The verifier's running time in this more general protocol
is also $\widetilde{O}({\sqrt{N}})$, under a mild assumption
about the complexity of deciding, given a compact representation of a distribution,
whether it is in the property or far from it.

Available from
ECCC TR22-052.

Back to
list of Oded's choices.