# The Sublinear Algorithms Workshop, JHU, January 2016

## Oded's notes

The Sublinear Algorithms Workshop, which took place on January 7-9, 2016,
at Johns Hopkins University, was one of the most enjoyable and inspiring
workshops I have ever attended. Organized by Vladimir (Vova) Braverman,
Piotr Indyk, Robert (Robi) Krauthgamer, and Sofya Raskhodnikova, it seems
that its success was rooted in the decision to open the workshop to whoever
is interested in its contents. This led to an exceptionally high number of
attendees (for a workshop), dominated by a large number of
eager-to-learn graduate students, creating an atmosphere of pure interest
and great engagement.
A word about the location: The JHU campus is beautiful, with a collection
of buildings that look great both from the outside and the inside. For
example, the public rooms at Gilman are amazing, and the auditorium in
which the workshop took place is also very nice. Also, a word of thanks
for the careful local organization, orchestrated by Vova and carried out by
him and his student, Nikita Ivkin.

The following collection of comments refers to some of the talks I
attended. As usual for me, I did not attend all the talks, due to the need
to refresh myself from time to time. My comments are quite laconic, and one
may want to look at the abstracts available from the workshop webpage.

#### Christian Sohler: Testing Cluster Structure of Graphs

I am very supportive of the idea of viewing clusters as "well connected"
components of a graph, where **being well connected** is quantified by a
gap between the internal conductance of the component (as an induced
subgraph) and the conductance of the cut defined by the component. The
corresponding testing task generalizes that of testing expansion, and
consists of distinguishing the case that the graph can be k-partitioned in
a way that satisfies a predetermined pair of conductance thresholds (i.e.,
a lower bound on internal conductance and an upper bound on external
conductance) from the case that the graph is far from having a k-partition
that satisfies a (possibly relaxed) pair of such thresholds.
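To make the two quantities concrete, here is a toy computation (my own illustration, not taken from the talk) of the conductance of the cut defined by a candidate cluster and of the internal conductance of the induced subgraph, the latter by brute force:

```python
from itertools import combinations

def volume(adj, S):
    # sum of the degrees of the vertices in S
    return sum(len(adj[v]) for v in S)

def conductance(adj, S):
    # conductance of the cut (S, V \ S): cut size over the smaller volume
    T = set(adj) - S
    cut = sum(1 for u in S for v in adj[u] if v in T)
    return cut / min(volume(adj, S), volume(adj, T))

def internal_conductance(adj, C):
    # conductance of the subgraph induced on C, minimized over all cuts
    # of C by brute force (exponential in |C|; fine for toy instances)
    sub = {v: adj[v] & C for v in C}
    return min(conductance(sub, set(S))
               for k in range(1, len(C))
               for S in combinations(C, k)
               if min(volume(sub, set(S)), volume(sub, C - set(S))) > 0)

# two triangles joined by a single edge: {0, 1, 2} is a good cluster
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

Here `conductance(adj, {0, 1, 2})` is 1/7 while `internal_conductance(adj, {0, 1, 2})` is 1, exhibiting the kind of gap that the definition asks for.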
#### Artur Czumaj: Testing Directed Graphs

When studying testing properties of directed graphs, in a model analogous
to the bounded-degree model of undirected graphs, one should make two
(orthogonal) distinctions. The first distinction is between properties of
the directed graph, which are sensitive to the orientation of edges, and
properties of the underlying undirected graph (i.e., the undirected graph
obtained from the directed graph when ignoring the orientation of edges).
The second distinction is between a model in which the tester can make
queries regarding both out-going and in-coming edges and a model in which
one can only make queries about outgoing edges.
This work studies the gap between these two models, showing that if a
property can be tested with a constant number of queries in the
bi-directional query model, then it can be tested in a sublinear number of
queries in the uni-directional query model. The transformation does not
preserve one-sided error, and this is inherent, since there are
properties that have a constant-query one-sided error tester in the first
model but no sublinear-query one-sided error tester in the second model.

#### Ilias Diakonikolas: Testing Structured Distributions

I like this work very much: It presents a very appealing framework for
deriving testers for properties of distributions. It suggests to "flatten"
the given distributions so that their L2-norm becomes small, while
preserving their distance to the property in question, and then to apply a
basic tester that has optimal complexity for distributions of small
L2-norm. Using this transformation, one can reduce various property testing
problems to the corresponding case (of small L2-norm), offering a unified
and simple way of establishing many known results. I plan to use this
approach when teaching the subject of testing properties of
distributions.
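As a sanity check on the idea (my own toy rendering of it; the actual framework determines the splits from samples rather than from an explicit description), one can flatten two explicitly given distributions with a common splitting pattern and watch the L2-norm drop while the L1-distance is preserved:

```python
import math

def flatten(p, q, m):
    # Split element i into k_i = 1 + floor(m * p_i) equal-mass pieces,
    # using the SAME splits for both distributions.  Each piece of the
    # flattened p has mass at most 1/m, so ||p_flat||_2 <= 1/sqrt(m),
    # while sum_i k_i * |p_i/k_i - q_i/k_i| = ||p - q||_1 is unchanged.
    ks = [1 + math.floor(m * pi) for pi in p]
    pf = [pi / k for pi, k in zip(p, ks) for _ in range(k)]
    qf = [qi / k for qi, k in zip(q, ks) for _ in range(k)]
    return pf, qf

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def l2(p):
    return sum(x * x for x in p) ** 0.5
```

For instance, with `p = [0.4, 0.3, 0.2, 0.1]`, `q = [0.1, 0.2, 0.3, 0.4]` and `m = 10`, the L1-distance stays at 0.8 while the L2-norm of the flattened p falls below $1/\sqrt{10}$.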
#### Oded Goldreich: Testing Dynamic Environments

I tried to promote this new direction of research by focusing on the model
and on what makes it different from standard testing problems. The tested
object is a sequence of $d$-dimensional arrays, and the
property in question is whether this sequence represents the evolution of a
($d$-dimensional) cellular automaton according to a predetermined rule. The
key aspect that makes this testing task different from testing a property
of a $(d+1)$-dimensional array is that the tester (as an observer) cannot
"go back in time".
#### Stephen Chestnut: The space complexity of streaming sums

This work refers to the space complexity of computing the sum of a
$g$-function of the frequencies (in the stream). While it is known that
$g(x)=x^c$ admits a polylog-space algorithm if and only if $c\leq 2$, this
work seeks and almost achieves a characterization of all $g$'s for which
such algorithms exist. One of the necessary conditions requires a growth
rate that is at most (nearly) quadratic, and other conditions mandate a
"reasonable" behavior of $g$.
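For concreteness, the statistic in question is the following (a naive linear-space computation, just to pin down the quantity; the point of the work is which $g$ admit a polylog-space approximation):

```python
from collections import Counter

def g_sum(stream, g):
    # sum of g over the frequencies of the distinct items in the stream;
    # g(x) = x**2 gives the second frequency moment F2
    freqs = Counter(stream)
    return sum(g(f) for f in freqs.values())

stream = ["a", "b", "a", "c", "a", "b"]  # frequencies: a:3, b:2, c:1
```

Here `g_sum(stream, lambda x: x**2)` is $3^2+2^2+1^2=14$, an instance of the $c\leq 2$ regime, whereas, e.g., $g(x)=x^3$ falls outside it.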
#### Ronitt Rubinfeld: Local Algorithms for Sparse Spanning Graphs

The framework of local computation algorithms (LCAs) generalizes several
types of tasks. It refers to algorithms that, given oracle access to one
object, captured by a function $f$, provide oracle access to a related
object, denoted $g$, such that each query to $g$ can be answered by making
few queries to $f$. The function $g$ need not be uniquely determined by
$f$, but it should reside in a set of admissible solutions that is
determined by $f$. That is, given oracle access to $f$, the algorithm
should answer according to some $g$ that is in a predetermined set of valid
solutions for $f$. In other words, for a predetermined relation $R$, given
$f$ one should answer according to some $g$ such that $(f,g)\in R$. The
algorithm is allowed to be randomized, but for every possible sequence of
coin tosses $r$, all possible queries must be answered according to the
same $g=g_r$. Since this is a search problem, with possibly more than one
valid solution per input, error reduction is not necessarily
possible, and so it is required to specify the error probability of the
algorithm (in case errors are allowed at all).
Property testing is captured as a very degenerate case in which $g$ is a
single bit. But this framework is aimed at capturing cases in which $g$ is
of size comparable to the size of $f$. One such case is that of "local
reconstruction", where for a fixed property $\Pi$, given $f$ that is close
to $\Pi$, one is required to answer according to some $f'\in\Pi$ that is
close to $f$. The current work focuses on finding a relatively sparse
spanning subgraph of a given graph. That is, the set of valid solutions for
a connected $n$-vertex graph $G$ is the set of all connected subgraphs of
$G$ that contain all $n$ vertices of $G$ but only $(1+\epsilon)n$ edges.
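To illustrate the framework (with a classic toy example in the style of Nguyen and Onak, not the algorithm of this work): given oracle access to a bounded-degree graph, one can provide oracle access to some maximal matching $g$, fixed by a deterministic pseudo-random ranking of the edges, where an edge is in the matching iff no adjacent edge of lower rank is.

```python
import random

def make_matching_lca(adj, seed=0):
    # Oracle access to one fixed maximal matching of the graph `adj`:
    # the greedy matching w.r.t. a deterministic pseudo-random edge
    # ranking.  A query about edge e recurses only on adjacent edges of
    # strictly lower rank, so for bounded-degree graphs each query
    # inspects few edges, and all queries are mutually consistent.
    def rank(e):
        return random.Random(hash((seed,) + e)).random()

    def adjacent(e):
        for x in e:
            for w in adj[x]:
                f = (min(x, w), max(x, w))
                if f != e:
                    yield f

    def in_matching(u, v):
        e = (min(u, v), max(u, v))
        return all(not (rank(f) < rank(e) and in_matching(*f))
                   for f in adjacent(e))

    return in_matching
```

On the path 0-1-2-3, the answers to the three edge queries always form one of the two maximal matchings of the path, regardless of the order in which the queries are made.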

#### Noga Ron-Zewi: High-rate locally-correctable and locally-testable codes with sub-polynomial query complexity

Research in 1990-2010 aimed at codes (of constant relative distance) that
support testers (and local decoders) of constant query complexity and
focused on maximizing their rate (which is sub-constant). It is most
fitting for the current workshop to present a work that focuses on
minimizing the query complexity of testing (and locally decoding) codes of
constant rate. Indeed, the query complexity has been known to be sublinear,
but the breakthrough is in getting below a constant power of the length.
For testing, they obtain quasi-polylogarithmic (i.e., $(\log n)^{\log\log n}$)
query complexity, and for local decoding the bound is $\exp(\sqrt{\log n})$.
#### Grigory Yaroslavtsev: L-p testing

While property testing focuses on the Hamming distance, viewed here as an
"L0-norm", a more general distance measure is defined as
$\frac{\|f-g\|_p}{\|1\|_p}$, and the case of $p=1$ is indeed the most
appealing one. Using relations among norms, one immediately obtains various
relations among the complexity of testing with respect to different norms.
The main result is that, for the L1-norm, $\epsilon$-testing monotonicity
on the line is possible in $O(1/\epsilon)$ queries, whereas under the
Hamming distance a logarithmic (in the length of the line) complexity is
required.
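In concrete terms (my own illustration), for functions on a domain of size $n$ given as value lists, the normalized distances look as follows; note that for $[0,1]$-valued functions the L1-distance never exceeds the Hamming distance, which is why L1-testing can only be easier:

```python
def lp_dist(f, g, p):
    # ||f - g||_p / ||1||_p over a domain of size n, i.e. the p-th root
    # of the average of |f(x) - g(x)|**p
    n = len(f)
    return (sum(abs(a - b) ** p for a, b in zip(f, g)) / n) ** (1 / p)

def hamming_dist(f, g):
    # the "L0" case: fraction of points of disagreement
    return sum(a != b for a, b in zip(f, g)) / len(f)

f = [0.0, 0.0, 1.0, 1.0]   # monotone on the line
g = [0.0, 0.5, 1.0, 1.0]
```

Here `lp_dist(f, g, 1)` is 0.125 while `hamming_dist(f, g)` is 0.25.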
#### Dana Ron: Approximately counting triangles in sublinear time

When only allowing neighbor (and degree) queries, this task requires
linearly many queries (in the number of vertices). Seeking sub-linear
algorithms, this work also allows adjacency queries (as in the model of
testing properties of general graphs). Then, the query complexity is
essentially $O((n/t^{1/3})+\min(m,m^{3/2}/t))$, where $t$ denotes the
number of triangles. For $t=\min(n^a,m^b)$ with $a,b\geq1$, the expression
simplifies to $O(n^{1-(a/3)}+m^{1.5-b})$, which for $a=b=1$ equals
$O(n^{2/3}+m^{1/2})$. A non-trivial warm-up refers to the case in which one
seeks to improve the approximation factor from some large constant to a
smaller constant, when given access also to a device that samples edges
uniformly at random and to an oracle that returns the number of triangles
in which the queried edge participates.
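To see why adjacency queries help, here is the classical wedge-sampling estimator (a standard illustration, not the algorithm of this work): sample a random two-edge path and spend one adjacency query to test whether it closes into a triangle.

```python
import random
from math import comb

def estimate_triangles(adj, samples, rng=random):
    # Each triangle closes exactly 3 wedges (paths u - v - w), so
    # #triangles = Pr[random wedge is closed] * #wedges / 3.
    centers = [v for v in adj if len(adj[v]) >= 2]
    weights = [comb(len(adj[v]), 2) for v in centers]
    total_wedges = sum(weights)
    closed = 0
    for _ in range(samples):
        v = rng.choices(centers, weights=weights)[0]
        u, w = rng.sample(sorted(adj[v]), 2)
        closed += w in adj[u]  # one adjacency query
    return closed / samples * total_wedges / 3
```

On the complete graph on 4 vertices every wedge is closed, so the estimate is exactly $12/3 = 4$, the true number of triangles; on a triangle-free graph it is exactly 0.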
#### David Woodruff: Beating CountSketch for Heavy Hitters in Insertion Streams

This work presents another setting in which improved randomized algorithms
are obtained by avoiding a straightforward union bound. Instead of reducing
the error probability so as to afford such a union bound, this work reduces
the gap (between the sought object and the rest) for which the problem can
be solved.
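For reference, a minimal CountSketch, the baseline that the talk improves upon (a toy rendering using Python's built-in hash, not a production sketch): each item adds a random sign to one bucket per row, and its frequency is estimated by the median, over the rows, of the signed bucket value.

```python
from statistics import median

class CountSketch:
    # d rows of b counters; row r hashes item i to bucket h_r(i) with a
    # sign s_r(i) in {-1, +1}.  Each row's signed counter is an unbiased
    # estimate of i's frequency; the median over rows concentrates it.
    def __init__(self, d, b, seed=0):
        self.b = b
        self.salts = [(seed, r) for r in range(d)]
        self.rows = [[0] * b for _ in range(d)]

    def _loc(self, item, r):
        h = hash((self.salts[r], 0, item)) % self.b
        s = 1 if hash((self.salts[r], 1, item)) % 2 else -1
        return h, s

    def add(self, item, count=1):
        for r in range(len(self.rows)):
            h, s = self._loc(item, r)
            self.rows[r][h] += s * count

    def estimate(self, item):
        ests = []
        for r in range(len(self.rows)):
            h, s = self._loc(item, r)
            ests.append(s * self.rows[r][h])
        return median(ests)
```

With a stream containing one item of frequency 100 and a few dozen items of frequency 1, the heavy item's estimate stays close to 100, since each row's error is only the signed sum of the light items that collide with it.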
#### Alexandr Andoni: Sketching and Embedding are Equivalent

Sketching is viewed as a communication problem in which each party sends a
"sketch" of its input to a referee, who then distinguishes the case that
the original inputs are close from the case that they are far, where the
"distortion" is the gap between close and far. This formulation suggests
that the problem can be solved by embedding the original inputs in a low
dimensional space, provided that the embedding has the adequate
distortion. This work asks whether this sufficient condition is necessary,
and obtains a positive answer using embedding into the $L_p$-normed space
for any constant $p<1$. Obtaining such a result for embedding into the
$L_1$-norm would resolve a known open problem in geometry. Nevertheless,
the current result allows one to derive lower bounds on sketching by
relying on impossibility results regarding embedding (e.g., for sketching
that preserves the Earth-Mover Distance).

## The original program

See a word file.
