Beating CountSketch for Heavy Hitters in Insertion Streams

by Vladimir Braverman, Stephen Chestnut Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David Woodruff

Oded's comments

Here is another work that I have alreday mentioned in my report of The Sublinear Algorithms Workshop (JHU, Jan. 2016).

This work presents another setting in which improved randomized algorithms are obtained by avoiding a straightforward union bound. Instead of reducing the error probability so to afford such a union bound, this work amplifies the gap (between the sought object and the rest). More specifically, the stream is broken adaptively to parts, which correspond to increasingly easier instances, and so the error proabbility on later parts is smaller.

The original abstract

The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. In a sense, the strongest guarantee available is the L2 guarantee, which requires finding all items that occur at least eps*||f|| times in the stream, where the i-th coordinate of the vector f is the number of occurrences of i in the stream. The first algorithm to achieve the L2 guarantee was the CountSketch (Charikar, Chen, and Farach-Colton ICALP'02), which, for constant eps, requires O(log n) words of memory and O(log n) update time. It is known to be space-optimal if the stream includes deletions.

In this talk I will discuss recent improvements that allow us to find L2 heavy hitters in O(1) memory and O(1) update time in insertion only streams. The improvements rely on a deeper understanding of the AMS sketch (Alon, Matias, and Szegedy STOC'96) and similar sketches and draw on the theory of Gaussian processes.

Based on work of Vladimir Braverman, Stephen Chestnut Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David Woodruff (in arxiv:1511.00661 and arxiv:1603.00759).


Back to list of Oded's choices.