## Beating CountSketch for Heavy Hitters in Insertion Streams

by Vladimir Braverman, Stephen Chestnut
Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David Woodruff

#### Oded's comments

Here is another work that
I have alreday mentioned in my report of
The Sublinear Algorithms Workshop (JHU, Jan. 2016).

This work presents another setting in which improved randomized algorithms
are obtained by avoiding a straightforward union bound. Instead of reducing
the error probability so to afford such a union bound, this work amplifies
the gap (between the sought object and the rest).
More specifically, the stream is broken adaptively to parts,
which correspond to increasingly easier instances,
and so the error proabbility on later parts is smaller.

#### The original abstract

The task of finding heavy hitters is one of the best known and well studied
problems in the area of data streams. In a sense, the strongest guarantee
available is the L2 guarantee, which requires finding all items that occur
at least eps*||f|| times in the stream, where the i-th coordinate of the
vector f is the number of occurrences of i in the stream. The first
algorithm to achieve the L2 guarantee was the CountSketch (Charikar, Chen,
and Farach-Colton ICALP'02), which, for constant eps, requires O(log n)
words of memory and O(log n) update time. It is known to be space-optimal
if the stream includes deletions.

In this talk I will discuss recent improvements that allow us to find L2
heavy hitters in O(1) memory and O(1) update time in insertion only
streams. The improvements rely on a deeper understanding of the AMS sketch
(Alon, Matias, and Szegedy STOC'96) and similar sketches and draw on the
theory of Gaussian processes.

Based on work of Vladimir Braverman, Stephen Chestnut
Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David Woodruff
(in arxiv:1511.00661 and arxiv:1603.00759).

Back to
list of Oded's choices.