## (an exposition of the analysis of Diakonikolas, Gouleakis, Peebles, and Price)

#### Abstract

The collision probability tester, introduced by Goldreich and Ron (ECCC, TR00-020, 2000), distinguishes the uniform distribution over $[n]$ from any distribution that is $\eps$-far from this distribution using $\poly(1/\eps)\cdot{\sqrt n}$ samples. While the original analysis established only an upper bound of $O(1/\eps)^4\cdot{\sqrt n}$ on the sample complexity, a recent analysis of Diakonikolas, Gouleakis, Peebles, and Price (ECCC, TR16-178, 2016) established the optimal upper bound of $O(1/\eps)^2\cdot{\sqrt n}$. In this note we survey their analysis, while highlighting the sources of improvement. Specifically:

• While the original analysis reduces the testing problem to approximating the collision probability of the unknown distribution up to a $1+\eps^2$ factor, the improved analysis capitalizes on the fact that the latter problem needs only be solved at the extreme'' (i.e., it suffices to distinguish the uniform distribution, which has collision probability $1/n$, from any distribution that has collision probability exceeding $(1+4\eps^2)/n$).
• While the original analysis provides an almost optimal analysis of the variance of the estimator when $\eps=\Omega(1)$, a more careful analysis yields a significantly better bound for the case of $\e=o(1)$, which is the case that is relevant here.

#### Material available on-line

• First version posted: Sept 2017.
• Revisions: none yet.

Back to either Oded Goldreich's homepage or general list of papers.