Doubly-Efficient Interactive Proofs for Distribution Properties

by Tal Herman and Guy Rothblum

Oded's comments

This work improves over a prior work of the same authors (see Choice Nr 322) in that it provides doublly-efficient interactive proof of proximity for (label-invariant) properties of distribution. Specifically, when handling distributions over $[N]$, the new proof systems maintain the verification complexities of the previous work (which are $\tildeO(N^{1/2})$), but improve the proving complexity to $\tildeO(N)$. Furthermore, as in the previous work, the proof system uses two messages. In addition, in my opinion, the current interactive proof systems and their analysis are more transparent and natural than the previous ones.

The original abstract

Suppose we have access to a small number of samples from an unknown distribution, and would like learn facts about the distribution. An untrusted data server claims to have studied the distribution and makes assertions about its properties. Can the untrusted data server prove that its assertions are approximately correct? Can a short efficiently verifiable proof be generated in polynomial time?

We study doubly-efficient interactive proof systems that can be used to verify properties of an unknown distribution over a domain $[N]$. On top of efficient verification, our focus is on proofs that the honest prover can generate in polynomial time. More generally, the complexity of generating the proof should be as close as possible to the complexity of simply running a standalone analysis to determine whether the distribution has the property.

Our main result is a new 2-message doubly-efficient interactive proof protocol for verifying any label-invariant distribution property (any property that is invariant to re-labeling of the elements in the domain of the distribution). The sample complexity, communication complexity and verifier runtime are all $\widetilde{O}({\sqrt{N}})$. The proof can be generated in quasi-linear $\widetilde{O}(N)$ time and sample complexities (the runtimes of the verifier and the honest prover hold under a mild assumption about the property's computational complexity). This improves on prior work, where constructing the proof required super-polynomial time [Herman and Rothblum, STOC 2022]. Our new proof system is directly applicable to proving (and verifying) several natural and widely-studied properties, such as a distribution's support size, its Shannon entropy, and its distance from the uniform distribution. For these (and many other) properties, the runtime and sample complexities for generating the proof are within $\text{polylog} (N)$ factors of the complexities for simply determining whether the property holds.

See ECCC TR23-161.

Back to list of Oded's choices.