Linear Hashing with $\ell_\infty$ guarantees and two-sided Kakeya bounds

by Manik Dhar and Zeev Dvir

Oded's comments

In contrast to the authors, who stress the case of a relatively large max-norm (i.e., max-norm greater than $2^{-t}$), which does not imply anything for total variation distance, I would stress the case of relatively small max-norm. Specifically, in the main theorem, I would focus on the case of small $\tau$ (which equals $\eps$ of Thm 1.1), rather than on the case of $\tau$ greater than 1. In other words, in the case of small $\tau$, we can infer that, w.h.p., each image has approximately the same number of preimages. This is an extremely useful setting for many applications, where one is willing to have an entropy loss of $O(\log(n/\eps))$, rather than $2\log(1/\eps)$ as in the LHL, since in these settings $\eps$ is smaller than $1/n$ anyhow. This allows one to use pairwise independent hashing functions rather than $n$-wise independent hashing functions. [See, e.g., Sec and Lemma D.6 in my complexity book].

Correction (6-Apr-22): Actually, for the aforementioned application of Sec , we can also take $\tau$ to be greater than 1, provided $\tau\leq\poly(n)$.
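To make the contrast between small and large $\tau$ concrete, here is a short worked calculation. It assumes (this normalization is my reading, not a quotation of the theorem) that the max-norm guarantee has the form $|\Pr[L(U_S)=y]-q^{-t}|\leq\tau\cdot q^{-t}$ for every $y$. Summing over the $q^t$ images bounds the total variation distance, and multiplying by $|S|$ bounds the number of preimages:

```latex
\[
\Bigl|\Pr[L(U_S)=y]-q^{-t}\Bigr| \le \tau\cdot q^{-t}
\ \ \text{for all } y\in\mathbb{F}_q^t
\quad\Longrightarrow\quad
\frac12\sum_{y\in\mathbb{F}_q^t}\Bigl|\Pr[L(U_S)=y]-q^{-t}\Bigr| \le \frac{\tau}{2},
\]
\[
(1-\tau)\cdot\frac{|S|}{q^t}
\;\le\; \bigl|\{x\in S : L(x)=y\}\bigr|
\;\le\; (1+\tau)\cdot\frac{|S|}{q^t}.
\]
```

Under this normalization, for $\tau\geq2$ the bound on total variation distance is trivial, and for $\tau\geq1$ the lower bound on the number of preimages is vacuous, whereas the upper bound on the number of preimages remains meaningful for any $\tau$.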

The original abstract

We show that a randomly chosen linear map over a finite field gives a good hash function in the $\ell_\infty$ sense. More concretely, consider a set $S \subset \mathbb{F}_q^n$ and a randomly chosen linear map $L : \mathbb{F}_q^n \to \mathbb{F}_q^t$ with $q^t$ taken to be sufficiently smaller than $|S|$. Let $U_S$ denote a random variable distributed uniformly on $S$. Our main theorem shows that, with high probability over the choice of $L$, the random variable $L(U_S)$ is close to uniform in the $\ell_\infty$ norm. In other words, every element in the range $\mathbb{F}_q^t$ has about the same number of elements in $S$ mapped to it. This complements the widely-used Leftover Hash Lemma (LHL) which proves the analog statement under the statistical, or $\ell_1$, distance (for a richer class of functions) as well as prior work on the expected largest 'bucket size' in linear hash functions [ADMPT99]. Our proof leverages a connection between linear hashing and the finite field Kakeya problem and extends some of the tools developed in this area, in particular the polynomial method.
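The statement about bucket sizes can be explored empirically. The following is a minimal sketch, not taken from the paper: it fixes $q=2$ for simplicity, represents vectors in $\mathbb{F}_2^n$ as Python integers, and measures the largest relative deviation of any bucket from the uniform value $2^{-t}$. The function names and the parameters $n=16$, $t=4$, $|S|=4096$ are illustrative assumptions.

```python
import random
from collections import Counter


def random_linear_map(n, t, rng):
    """Return t random n-bit row masks: a uniformly random t x n matrix over F_2."""
    return [rng.getrandbits(n) for _ in range(t)]


def apply_map(rows, x):
    """Compute L(x) over F_2; output bit i is the parity of <row_i, x>."""
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << i
    return y


def relative_linf_deviation(rows, S, t):
    """Return max over y in F_2^t of |Pr[L(U_S)=y] - 2^-t| / 2^-t."""
    counts = Counter(apply_map(rows, x) for x in S)
    uniform = 1 / 2**t
    worst = max(abs(counts.get(y, 0) / len(S) - uniform) for y in range(2**t))
    return worst / uniform


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed, only for reproducibility
    n, t = 16, 4                    # illustrative sizes (assumption)
    S = rng.sample(range(2**n), 4096)
    rows = random_linear_map(n, t, rng)
    print(relative_linf_deviation(rows, S, t))
```

A returned value below 1 means that every element of the range has between $(1-\tau)|S|/2^t$ and $(1+\tau)|S|/2^t$ preimages in $S$ for that value of $\tau$; the experiment only illustrates the quantity being bounded and says nothing about the parameters for which the theorem applies.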

Available from ECCC TR22-047.
