## On the optimal analysis of the collision probability tester

## (an exposition of the analysis
of Diakonikolas, Gouleakis, Peebles, and Price)

#### Webpage for an exposition by Oded Goldreich

#### Abstract

The collision probability tester,
introduced by Goldreich and Ron (ECCC, TR00-020, 2000),
distinguishes the uniform distribution over $[n]$ from
any distribution that is $\eps$-far from this distribution
using $\poly(1/\eps)\cdot{\sqrt n}$ samples.
While the original analysis established only an
upper bound of $O(1/\eps)^4\cdot{\sqrt n}$ on the sample complexity,
a recent analysis of Diakonikolas, Gouleakis, Peebles, and Price
(ECCC, TR16-178, 2016)
established the optimal upper bound of $O(1/\eps)^2\cdot{\sqrt n}$.
In this note we survey their analysis,
while highlighting the sources of improvement. Specifically:

- While the original analysis reduces the testing problem
to approximating the collision probability of the unknown
distribution up to a $1+\eps^2$ factor,
the improved analysis capitalizes on the fact that the latter problem
needs only be solved ``at the extreme'' (i.e., it suffices to distinguish
the uniform distribution, which has collision probability $1/n$,
from any distribution that has collision probability
exceeding $(1+4\eps^2)/n$).
- While the original analysis provides an almost
optimal analysis of the variance of the estimator
when $\eps=\Omega(1)$, a more careful analysis yields
a significantly better bound for the case of $\e=o(1)$,
which is the case that is relevant here.

#### Material available on-line

- First version posted:
Sept 2017.
- Revisions: none yet.

Back to
either Oded Goldreich's homepage
or general list of papers.