On the optimal analysis of the collision probability tester

(an exposition of the analysis of Diakonikolas, Gouleakis, Peebles, and Price)

Webpage for an exposition by Oded Goldreich

Abstract

The collision probability tester, introduced by Goldreich and Ron (ECCC, TR00-020, 2000), distinguishes the uniform distribution over $[n]$ from any distribution that is $\eps$-far from this distribution using $\poly(1/\eps)\cdot{\sqrt n}$ samples. While the original analysis established only an upper bound of $O(1/\eps)^4\cdot{\sqrt n}$ on the sample complexity, a recent analysis of Diakonikolas, Gouleakis, Peebles, and Price (ECCC, TR16-178, 2016) established the optimal upper bound of $O(1/\eps)^2\cdot{\sqrt n}$. In this note we survey their analysis, while highlighting the sources of improvement. Specifically:

While the original analysis reduces the testing problem to approximating the collision probability of the unknown distribution up to a $1+\eps^2$ factor, the improved analysis capitalizes on the fact that the latter problem needs only be solved ``at the extreme'' (i.e., it suffices to distinguish the uniform distribution, which has collision probability $1/n$, from any distribution that has collision probability exceeding $(1+4\eps^2)/n$).
While the original analysis provides an almost optimal analysis of the variance of the estimator when $\eps=\Omega(1)$, a more careful analysis yields a significantly better bound for the case of $\e=o(1)$, which is the case that is relevant here.

Material available on-line

First version posted: Sept 2017.
Revisions: none yet.

Back to either Oded Goldreich's homepage or general list of papers.