On the optimal analysis of the collision probability tester
(an exposition of the analysis
of Diakonikolas, Gouleakis, Peebles, and Price)
Webpage for an exposition by Oded Goldreich
Abstract
The collision probability tester,
introduced by Goldreich and Ron (ECCC, TR00-020, 2000),
distinguishes the uniform distribution over $[n]$ from
any distribution that is $\eps$-far from this distribution
using $\poly(1/\eps)\cdot{\sqrt n}$ samples.
While the original analysis established only an
upper bound of $O(1/\eps)^4\cdot{\sqrt n}$ on the sample complexity,
a recent analysis of Diakonikolas, Gouleakis, Peebles, and Price
(ECCC, TR16-178, 2016)
established the optimal upper bound of $O(1/\eps)^2\cdot{\sqrt n}$.
In this note we survey their analysis,
while highlighting the sources of improvement. Specifically:
- While the original analysis reduces the testing problem
to approximating the collision probability of the unknown
distribution up to a $1+\eps^2$ factor,
the improved analysis capitalizes on the fact that the latter problem
needs only be solved ``at the extreme'' (i.e., it suffices to distinguish
the uniform distribution, which has collision probability $1/n$,
from any distribution that has collision probability
exceeding $(1+4\eps^2)/n$).
- While the original analysis provides an almost
optimal analysis of the variance of the estimator
when $\eps=\Omega(1)$, a more careful analysis yields
a significantly better bound for the case of $\e=o(1)$,
which is the case that is relevant here.
Material available on-line
- First version posted:
Sept 2017.
- Revisions: none yet.
Back to
either Oded Goldreich's homepage
or general list of papers.