Settling the complexity of testing grainedness of distributions, and application to uniformity testing in the Huge Object model

by Clement Canonne, Sayantan Sen, and Qiping Yang

Oded's comments

I have been promoting the problem of determining the complexity of testing grainedness of distributions for almost a deacde now.

My own perspective is quite different from the one offered in the paper. Most importantly, I find the proof of the $Omega(n^c)$ lower bound, for any $c$ smaller than 1, provided in GR23a much more friendly and intuitive than the new one. Of course, I don't argue with the fact that it is the current proof (of the $Omega(n^c)$ lower bound) that can be adapted so to yield the desired lower bound (of $Omega(n/log n)$), but this does not void the foregoing opinion.

The conjecture that the true lower bound is $Omega(n/\log n)$ was not based on the fact that this would fit the known upper bound but rather on a general feeling about the lower bound techniques. In GR23a a concious decision was made not to rely on these techniques in their most general form, but rather use a simple to analyze special case. As pointed out explicitly at the end of the introduction, this special case could not deliver any better lower bound. Hence, I am not surprised that using the general forms (let alone extending the methods at its frontier) one can get a $Omega(n/log n)$ lower bound.

The original abstract

In this work, we study the problem of testing $m$-\emph{grainedness} of probability distributions over an $n$-element universe $\mathcal{U}$, or, equivalently, of whether a probability distribution is induced by a multiset $S\subseteq \mathcal{U}$ of size $|S|=m$. Recently, Goldreich and Ron (Computational Complexity, 2023) proved that $\Omega(n^c)$ samples are necessary for testing this property, for any $c$ smaller than $1$ and $m=\Theta(n)$. They also conjectured that $\Omega(\frac{m}{\log m})$ samples are necessary for testing this property when $m=\Theta(n)$. In this work, we positively settle this conjecture.

Using a known connection to the Distribution over Huge objects (DoHo) model introduced by Goldreich and Ron (TheoretiCS, 2023), we leverage our results to provide improved bounds for uniformity testing in the DoHo model.

See ECCC TR24-196.


Back to list of Oded's choices.