# The Sublinear Algorithms Workshop, JHU, January 2016

## Oded's notes

The Sublinear Algorithms Workshop, which took place on January 7-9, 2016, at Johns Hopkins University, was one of the most enjoyable and inspiring workshops I have ever attended. It was organized by Vladimir (Vova) Braverman, Piotr Indyk, Robert (Robi) Krauthgamer, and Sofya Raskhodnikova, and its success seems rooted in the decision to open the workshop to whoever is interested in its contents. This led to an exceptionally high number of attendees (in comparison to a typical workshop), dominated by a large number of eager-to-learn graduate students, creating an atmosphere of pure interest and great engagement.

A word about the location: The JHU campus is beautiful. It has a beautiful collection of buildings, which look great both from outside and inside. For example, the public rooms at Gilman are amazing, and the auditorium in which the workshop took place is also very nice. Also a word of thanks to the careful local organization orchestrated by Vova and played out by him and his student, Nikita Ivkin.

The following collection of comments refers to some of the talks I have attended. As usual, for me, I did not attend all talks, due to the need to refresh myself from time to time. My comments are quite laconic, and one may want to look at the abstracts available from the workshop webpage.

#### Christian Sohler: Testing Cluster Structure of Graphs

I am very supportive of the idea of viewing clusters as "well connected" components of a graph, where well connectivity is viewed as a gap between the internal conductance of the component (as an induced subgraph) and the conductance of the cut defined by the component. The corresponding testing task generalizes the one of testing expansion, and consists of distinguishing the case that the graph can be k-partitioned in a way that satisfies a predetermined pair of conductance thresholds (i.e., lower bound on internal conductance and upper bound on external conductance) and the case that the graph is far from having a k-partition that satisfies a (possibly relaxed) pair of such thresholds.
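To make the two thresholds concrete, here is a minimal Python sketch of the conductance of a cut. It assumes one common definition (number of cut edges divided by the smaller of the two volumes); the talk may normalize differently, and the names `conductance`, `induced`, and `adj` are illustrative only.

```python
def conductance(adj, part):
    """Conductance of the cut (part, rest) in an undirected graph.

    adj maps each vertex to the set of its neighbors. Assumed definition:
    number of cut edges divided by the smaller of the two volumes.
    """
    part = set(part)
    cut = sum(1 for u in part for v in adj[u] if v not in part)
    vol_in = sum(len(adj[u]) for u in part)
    vol_out = sum(len(adj[u]) for u in adj if u not in part)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


def induced(adj, part):
    """Subgraph induced by part, in the same adjacency-set format."""
    part = set(part)
    return {u: adj[u] & part for u in part}


# Two triangles joined by a single edge: each triangle is a natural cluster.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
external = conductance(adj, {0, 1, 2})                 # cut defined by the cluster
internal = conductance(induced(adj, {0, 1, 2}), {0})   # one cut inside the cluster
```

In this toy graph the external conductance of a triangle is small while cuts inside it have large conductance, which is the kind of gap the definition asks for. The internal conductance proper is the minimum over all cuts of the induced subgraph, which this sketch does not enumerate.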

#### Artur Czumaj: Testing Directed Graphs

When studying testing properties of directed graphs, in a model analogous to the bounded-degree model of undirected graphs, one should make two (orthogonal) distinctions. The first distinction is between properties of the directed graph, which are sensitive to the orientation of edges, and properties of the underlying undirected graph (i.e., the undirected graph obtained from the directed graph when ignoring the orientation of edges). The second distinction is between a model in which the tester can make queries regarding both out-going and in-coming edges and a model in which one can only make queries about outgoing edges.

This work studies the gap between these two models, showing that if a property can be tested with a constant number of queries in the bi-directional query model, then it can be tested with a sublinear number of queries in the uni-directional query model. The transformation does not preserve one-sided error, and this is inherent, since there are properties that have a constant-query one-sided error tester in the first model but no sublinear-query one-sided error tester in the second model.

#### Ilias Diakonikolas: Testing Structured Distributions

I like this work very much: It presents a very appealing framework for deriving testers for properties of distributions. It suggests to "flatten" the given distributions such that their L2-norm is very small, while preserving their distance to the property in question, and then to apply a basic tester that has optimal complexity for distributions of small L2-norm. Using this transformation, one can reduce various property testing problems to the corresponding case (of small L2-norm), offering a unified and simple way of establishing many known results. I plan to use this approach when teaching the subject of testing properties of distributions.

#### Oded Goldreich: Testing Dynamic Environments

I tried to promote this new direction of research by focusing on the model and on what makes it different from standard testing problems. The tested object is an evolution of a sequence of $d$-dimensional arrays and the property in question is whether this sequence represents the evolution of a ($d$-dimensional) cellular automaton according to a predetermined rule. The key aspect that makes this testing task different from testing a property of a $(d+1)$-dimensional array is that the tester (as an observer) cannot "go back in time".
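As an illustration of the property itself (not of a tester), the following Python sketch checks, by reading the entire input, whether a sequence of configurations follows a fixed local rule. The choices made here are assumptions for the example only: $d=1$, binary cells, a cyclic boundary, and a rule that sets each cell to the XOR of its two neighbors.

```python
def step(config, rule):
    """Apply a local rule once to a 1-dimensional cyclic configuration."""
    n = len(config)
    return [rule(config[(i - 1) % n], config[i], config[(i + 1) % n])
            for i in range(n)]


def follows_rule(evolution, rule):
    """Check every consecutive pair of configurations (reads the whole input)."""
    return all(step(prev, rule) == nxt
               for prev, nxt in zip(evolution, evolution[1:]))


def xor_rule(left, center, right):
    # Illustrative rule: the new state is the XOR of the two neighbors.
    return left ^ right


start = [0, 0, 1, 0, 0]
good = [start]
for _ in range(3):
    good.append(step(good[-1], xor_rule))

bad = [row[:] for row in good]
bad[2][0] ^= 1  # corrupt a single cell at time 2

ok_good = follows_rule(good, xor_rule)
ok_bad = follows_rule(bad, xor_rule)
```

A tester in the model above would instead probe only a few cells, and would have to do so at non-decreasing times, which is exactly the restriction that this exhaustive check ignores.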

#### Stephen Chestnut: The space complexity of streaming sums

This work refers to the space complexity of computing the sum of a $g$-function of the frequencies (in the stream). While it is known that $g(x)=x^c$ admits a polylog space algorithm if and only if $c\leq 2$, this work seeks and almost achieves a characterization of all $g$'s for which such algorithms exist. One of the necessary conditions requires a growth rate that is at most (nearly) quadratic, and other conditions mandate a "reasonable" behavior of $g$.

#### Ronitt Rubinfeld: Local Algorithms for Sparse Spanning Graphs

The framework of local computational algorithms (LCAs) generalizes several types of tasks. It refers to algorithms that, given oracle access to one object, captured by a function $f$, provide oracle access to a related object, denoted $g$, such that each query to $g$ can be answered by making few queries to $f$. The function $g$ need not be uniquely determined by $f$, but it should reside in a set of admissible solutions that is determined by $f$. That is, given oracle access to $f$, the algorithm should answer according to any $g$ that is in a predetermined set of valid solutions for $f$. In other words, for a predetermined relation $R$, given $f$ one should answer according to any $g$ such that $(f,g)\in R$. The algorithm is allowed to be randomized, but for every possible sequence of coin tosses $r$, all possible queries must be answered according to the same $g=g_r$. Since this is a search problem, with possibly more than one valid solution for every input, error reduction is not necessarily possible, and so it is required to specify the error probability of the reduction (in case such is allowed).

Property testing is captured as a very degenerate case in which $g$ is a single bit. But this framework is aimed at capturing cases in which $g$ is of size comparable to the size of $f$. One such case is that of "local reconstruction", where for a fixed property $\Pi$, given $f$ that is close to $\Pi$, one is required to answer according to $f'\in\Pi$ that is close to $f$. The current work focuses on finding a relatively sparse spanning subgraph of a given graph. That is, the set of valid solutions for the connected $n$-vertex graph $G$ is the set of all connected subgraphs of $G$ that contain all $n$ vertices of $G$ but only $(1+\epsilon)n$ edges.

#### Noga Ron-Zewi: High-rate locally-correctable and locally-testable codes with sub-polynomial query complexity

Research in 1990-2010 aimed at codes (of constant relative distance) that support testers (and local decoders) of constant-query complexity and focused on maximizing their rate (which is sub-constant). It is most fitting for the current workshop to present a work that focuses on minimizing the query complexity of testing (and locally decoding) codes of constant rate. Indeed, the query complexity has been known to be sublinear, but the breakthrough is in getting below a constant power of the length. For testing, they obtain quasi-polylogarithmic (i.e., $(\log n)^{\log\log n}$) query complexity, and for local decoding the bound is $\exp(\sqrt{\log n})$.

#### Grigory Yaroslavtsev: L-p testing

While property testing focuses on the Hamming distance, viewed here as an "L0-norm", a more general distance measure is defined as $\frac{\|f-g\|_p}{\|1\|_p}$, and the case of $p=1$ is indeed the most appealing one. Using relations among norms, one immediately obtains various relations among the complexities of testing with respect to different norms. The main result is that, for the L1-norm, $\epsilon$-testing monotonicity on the line is possible in $O(1/\epsilon)$ queries, whereas under the Hamming distance a logarithmic (in the length of the line) complexity is required.
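The normalized distance can be computed directly for functions over a finite domain. The sketch below is only an illustration: it assumes function values in $[0,1]$, represents a function as a list of its values, and treats $p=0$ as the relative Hamming distance; the name `lp_distance` is hypothetical.

```python
def lp_distance(f, g, p):
    """Normalized L_p distance ||f - g||_p / ||1||_p over a finite domain.

    f and g are equal-length lists of values in [0, 1] (an assumption of
    this sketch). p = 0 is treated as the relative Hamming distance.
    """
    n = len(f)
    diffs = [abs(a - b) for a, b in zip(f, g)]
    if p == 0:
        return sum(1 for d in diffs if d > 0) / n
    # ||f - g||_p / ||1||_p = (sum of d^p / n)^(1/p), since ||1||_p = n^(1/p).
    return (sum(d ** p for d in diffs) / n) ** (1 / p)


f = [0.50, 0.51, 0.50, 0.51]
g = [0.50, 0.50, 0.50, 0.50]
hamming = lp_distance(f, g, 0)  # half of the points differ
l1 = lp_distance(f, g, 1)       # but the average deviation is tiny
```

The example shows why the two notions can diverge: `f` differs from `g` on half of the domain, yet it is very close to `g` in the L1 sense.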

#### Dana Ron: Approximately counting triangles in sublinear time

When only allowing neighbor (and degree) queries, this task requires linearly many (in the number of vertices) queries. Seeking sub-linear algorithms, this work also allows adjacency queries (as in the model of testing properties of general graphs). Then, the query complexity is essentially $O((n/t^{1/3})+\min(m,m^{3/2}/t))$, where $t$ denotes the number of triangles. When $t \geq \max(n^a,m^b)$, for $a,b\geq1$, the expression simplifies to $O(n^{1-(a/3)} + m^{1.5-b}) = O(n^{2/3}+m^{1/2})$. A non-trivial warm-up refers to the case in which one seeks to improve the approximation factor from some large constant to a smaller constant, when given access also to a device that samples edges uniformly at random and to an oracle that returns the number of triangles in which the queried edge participates.
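To get a feel for the bound $(n/t^{1/3})+\min(m,m^{3/2}/t)$, one can simply evaluate it for a few values of the triangle count $t$. The sketch below ignores constants and any lower-order factors hidden by "essentially"; the function name and the sample values of $n$ and $m$ are made up for illustration.

```python
def triangle_query_bound(n, m, t):
    """Evaluate n / t^(1/3) + min(m, m^(3/2) / t), ignoring constants."""
    return n / t ** (1 / 3) + min(m, m ** 1.5 / t)


# Hypothetical graph sizes: a million vertices and ten million edges.
n, m = 10**6, 10**7
bounds = {t: triangle_query_bound(n, m, t) for t in (10**6, 10**9, 10**12)}
```

The more triangles the graph has, the smaller the bound becomes, and once $t$ exceeds both $n$ and $m$ the value drops below $n^{2/3}+m^{1/2}$.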

#### David Woodruff: Beating CountSketch for Heavy Hitters in Insertion Streams

This work presents another setting in which improved randomized algorithms are obtained by avoiding a straightforward union bound. Instead of reducing the error probability so as to afford such a union bound, this work reduces the gap (between the sought object and the rest) for which the problem can be solved.

#### Alexandr Andoni: Sketching and Embedding are Equivalent

Sketching is viewed as a communication problem in which each party sends a "sketch" of its input to a referee, who then distinguishes the case that the original inputs are close from the case that they are far, where the "distortion" is the gap between close and far. This formulation suggests that the problem can be solved by embedding the original inputs in a low dimensional space, provided that the embedding has adequate distortion. This work asks whether this sufficient condition is necessary, and obtains a positive answer using embedding into the $L_p$-normed space for any constant $p<1$. Obtaining such a result for embedding into the $L_1$-norm would resolve a known open problem in geometry. Nevertheless, the current result allows one to derive lower bounds on sketching by relying on impossibility results regarding embedding (e.g., for sketching that preserves the Earth-Mover-Distance).

## The original program

See a word file.
