## Balls and Bins: Smaller Hash Families and Faster Evaluation

by Laura Elisa Celis, Omer Reingold, Gil Segev, and Udi Wieder

#### Oded's comments

I find it quite strange that these construction were not discovered
twenty years ago, when almost k-wise independence generators and
space-bounded generators were greatly improved and utilized.
Still, this comment is not meant to diminish this work,
but rather to say that it feels classic
(or "from the book" or "just so right and nice").
In each of the two constructions,
the authors "just" do the right thing
(which is easy to say in retrospect,
but two decades of failures to "do it right"
just illustrate the difference between retrospect and a priori).
**An afterthought (Aug 19th, 2011):**
The iterative construction method used in the 1st construction fits
nicely in the gradual improvement paradigm (surveyed in my article
Bravely, Moderately).

#### The original abstract

A fundamental fact in the analysis of randomized algorithm is
that when $n$ balls are hashed into $n$ bins independently and
uniformly at random, with high probability each bin contains at
most $O(\log n / \log \log n)$ balls. In various applications,
however, the assumption that a truly random hash function is
available is not always valid, and explicit functions are
required.

In this paper we study the size of families (or, equivalently,
the description length of their functions) that guarantee a
maximal load of $O(\log n / \log \log n)$ with high probability,
as well as the evaluation time of their functions. Whereas such
functions must be described using $\Omega( \log n)$ bits, the
best upper bound was formerly $O(\log^2 n / \log \log n)$ bits,
which is attained by $O(\log n / \log \log n)$-wise independent
functions. Traditional constructions of the latter offer an
evaluation time of $O(\log n / \log \log n)$, which according to
Siegel's lower bound [FOCS '89] can be reduced only at the cost
of significantly increasing the description length.

We construct two families that guarantee a maximal load of
$O(\log n / \log \log n)$ with high probability. Our
constructions are based on two different approaches, and exhibit
different trade-offs between the description length and the
evaluation time. The first construction shows that $O(\log n /
\log \log n)$-wise independence can in fact be replaced by
``gradually increasing independence'', resulting in functions
that are described using $O(\log n \log \log n)$ bits and
evaluated in time $O(\log n \log \log n)$. The second
construction is based on derandomization techniques for
space-bounded computations combined with a tailored construction
of a pseudorandom generator, resulting in functions that are
described using $O(\log^{3/2} n)$ bits and evaluated in time
$O(\sqrt{\log n})$. The latter can be compared to Siegel's lower
bound stating that $O(\log n / \log \log n)$-wise independent
functions that are evaluated in time $O(\sqrt{\log n})$ must be
described using $\Omega(2^{\sqrt{\log n}})$ bits

The paper is available from
http://research.microsoft.com/apps/pubs/default.aspx?id=147963

Back to
list of Oded's choices.