Oded's choices (Nr. 47)

Cuckoo Nesting: Modern Methods for Organizing Lookup Tables

a "popular talk" by Moni Naor.

Oded's summary

The directionary problem consists of storing an a priori unknown set $S$, $S\subset U$, in a manner that supports (relatively efficient) searching, insertion and deletion. The focus is on time of individual operations and memory size. Two approaches:

Search trees -- usually require $\log|S|$ time and use of pointers.
Hashing tables allow "direct access" and raise the problem of collisions. In general $Hash:U\to[r]$ for $r\approx n \eqdef |S|$.

The naive "linear probing" calls for storing $x$ in the first free location $p\geq Hash(x)$. That is, if $H(x)$ is taken we try $H(x)+1,H(x)+2$...

Cuckoo Hashing [Pagh and Rodler, 2001] is an alternative method of dealing with collisons. It uses two hashing functions, $H_1$ and $H_2$, and is best described in terms of using two tables $T_1$ and $T_2$. We store $x$ in $T_1[H_1(x)]$, and if $y$ previous resided there, then we store $y$ in $T_2[H_2(y)]$, and again this may push out a previous contents $z$ that may be stored in $T_1[H_1(z)]$ and so on...

The analysis of this method reduces to an analysis of a random sparse graph. Specifically, for fixed $S$ and $H_1,H_2$, we consider the $r$-by-$r$ bipartite graph in which there are $n$ edges such that for each $x\in S$ we have an edge $(H_1(x),H_2(x))$. Note that by the Cuckoo Hashing rule, the element $x$ must be stored in either $T_1[H_1(x)]$ or $T_2[H_2(x)]$, and so storage is possible iff every connected component in the graph has at most one simple cycle (i.e., has a number of edges that does not exceed the number of vertices in this connected component). Furthermore, the number of operations that result from "insert" is reflected in the size of these components.

De-amortization and use of queue -- see (see Arbiteman, Naor, and Segev.

Back to list of Oded's choices.