Highlights of the Bertinoro workshop on Sublinear Algorithms (see the workshop's webpage)

The talks that I wish to highlight most are:
  1. Optimal O(1)-time approximations of bounded-degree CSPs by Yuichi Yoshida
  2. Learning and testing k-modal distributions by Rocco Servedio (et al)
These two talks, as well as 20 additional ones (15 regular talks and 5 survey talks), are briefly reviewed below. For the abstracts of all talks, click HERE.

My top choices

Yuichi Yoshida: Optimal O(1)-time approximations of bounded-degree CSPs

This is a great result. It states that, for every CSP, there exists a constant $c$ such that every bounded-degree instance (of this CSP) can be approximated to within a factor of $c$ in $O(1)$ time, whereas approximation to within any constant factor $c'>c$ requires at least $\sqrt{n}$ queries. Furthermore, the constant $c$ equals the integrality gap of the generic LP relaxation of this CSP. (Cf. the optimality of polynomial-time approximations (under the UGC) at the integrality gap of the generic SDP relaxation.)
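For concreteness, here is one standard form of such a generic ("basic") LP relaxation, in my own notation (not taken from the talk): for each constraint $C$, the variable $\lambda_{C,\sigma}$ ranges over all local assignments $\sigma$ to the variables of $C$, and $\mu_v(a)$ is the marginal weight that variable $v$ takes value $a$; $\mathrm{SAT}(C)$ denotes the satisfying local assignments of $C$.

$$\begin{array}{lll} \max & \sum_{C} \sum_{\sigma\in\mathrm{SAT}(C)} \lambda_{C,\sigma} & \\ \mathrm{s.t.} & \sum_{\sigma:\ \sigma(v)=a} \lambda_{C,\sigma} = \mu_v(a) & \mbox{for every $C$, every $v$ in $C$, and every value $a$} \\ & \sum_{a} \mu_v(a) = 1 & \mbox{for every variable $v$} \\ & \lambda,\mu \ge 0 & \end{array}$$

The integrality gap is then the supremum, over instances, of the ratio between the LP optimum and the true optimum.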

Rocco Servedio: Learning and testing k-modal distributions

I found the algorithmic design very inspiring. When one talks of using testing as a preliminary step to learning, one usually envisions using the tester to select an algorithm out of several candidates, each tailored to a different concept class. In this work the idea is to decompose the target object, which belongs to a complex concept class (which seems hard to learn), into several objects that each belong to a simpler concept class. While such a partition exists, it is not known to the learner. Thus, the learner generates several possible partitions and tests them in an attempt to find a good one (i.e., a partition in which each part belongs to the said simpler concept class). Once a good partition is found, each part is learned (via the simpler learner corresponding to the basic class), and the full object is reconstructed.
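To illustrate the design, here is a minimal sketch in my own notation (not the paper's actual algorithm); `test_monotone` and `learn_monotone` are assumed subroutines for the simpler concept class (say, monotone distributions over an interval), and `candidate_partitions` enumerates interval partitions of the domain.

```python
def learn_via_partition(samples, test_monotone, learn_monotone, candidate_partitions):
    """Decompose-test-learn: find a partition whose parts all pass the
    simple-class tester, then learn each part with the simple learner."""
    for partition in candidate_partitions:        # lists of (lo, hi) intervals
        parts = [[x for x in samples if lo <= x < hi] for (lo, hi) in partition]
        if all(test_monotone(part) for part in parts):   # a "good" partition
            # learn each piece via the simple learner and stitch them together
            return [(interval, learn_monotone(part))
                    for interval, part in zip(partition, parts)]
    return None  # no good partition among the candidates
```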

Additional choices

Krzysztof Onak: A Near-Optimal Approximation of the Minimum Vertex Cover Size

It is known that approximating the size of the minimum vertex cover (mVC) requires time that is linear in the average degree, and the current work provides an algorithm that essentially achieves this running time. An interesting feature of the algorithm is that it manages to conduct a large number of recursive calls to a non-trivial procedure while incurring a relatively small overhead (i.e., the amortized cost is logarithmic in the number of invocations).

Sofya Raskhodnikova: Testing and Reconstruction of Lipschitz Functions

The starting point is an appealing application of testing and reconstruction of Lipschitz functions to privacy-preserving databases, where the key observation is that the archetypical technique for obtaining (differential) privacy is adding noise at a level that masks the effect of any individual entry on the query. This calls for testing whether the query indeed has such a limited effect, and for filtering the query so as to enforce such a limited effect. This motivates the current study, which provides partial results (regarding certain domains, but not the most natural one).
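For context, a minimal sketch of the archetypical technique referred to above (the standard Laplace mechanism); the role of a Lipschitz tester/filter would be to verify or enforce the sensitivity bound that the mechanism takes for granted.

```python
import random

def laplace_mechanism(query, database, sensitivity, epsilon):
    """Release query(database) plus Laplace noise of scale sensitivity/epsilon,
    which masks the effect of any single entry -- assuming `sensitivity`
    really bounds that effect (the bound a Lipschitz filter would enforce)."""
    rate = epsilon / sensitivity
    # the difference of two i.i.d. exponentials is Laplace-distributed
    noise = random.expovariate(rate) - random.expovariate(rate)
    return query(database) + noise
```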

C. Seshadhri: Estimating the Longest Increasing Subsequence in Polylogarithmic Time

The aim is to find an estimate that is accurate to within an additive error of $\e n$, where $n$ is the input length, and to do so in polylogarithmic time (with an arbitrary dependence on $\e>0$). This is done via "sublinear-time dynamic programming"; that is, running DP on small instances that emerge from adaptive sampling.
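For reference, the quantity being estimated is computable exactly by the classic DP (in its $O(n\log n)$ patience-sorting formulation); the talk's algorithm instead approximates it in polylogarithmic time.

```python
import bisect

def lis_length(seq):
    """Exact LIS length: tails[j] holds the smallest possible tail of an
    increasing subsequence of length j+1 seen so far."""
    tails = []
    for x in seq:
        j = bisect.bisect_left(tails, x)
        if j == len(tails):
            tails.append(x)       # extend the longest subsequence found so far
        else:
            tails[j] = x          # improve the tail of a shorter subsequence
    return len(tails)
```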

Tali Kaufman: Locally Testable Codes and Expanders

I was intrigued by Tali's observation regarding the somewhat contradictory relation between LTCs and expanders. On the one hand, it is known [BHR] that strong expanders (i.e., odd-neighbor expanders) yield codes that are hard to test (i.e., are not LTCs). On the other hand, strong LTCs yield graphs that are weak expanders (i.e., small-set expanders). (Caveat: the graphs in the two cases are only related; in the first case the graph corresponds to the parity-check matrix of an LDPC code, whereas in the second case it describes positions that are probed on the same random tape of the tester. Still...)

Alexandr Andoni: Sublinear Algorithms via Precision Sampling

This work makes explicit a method that was used in prior works, and may be called adaptive-precision sampling. The idea is that obtaining an estimate of a quantity defined over a domain that is partitioned into many sub-domains does not require the same level of precision in estimating this quantity over each sub-domain: it suffices to obtain a precision that is finer by a factor of $2^k$ on a $2^{-k}$ fraction of the sub-domains, where these sub-domains are selected at random and the process is repeated for (logarithmically) many values of $k$. The authors cite some streaming algorithms as origins of this method, but it has also appeared in property testing (see e.g.), and in cryptography (e.g., the analysis of hardcore predicates; cf. Clm 2.5.4.1 in Foundations of Crypto., Vol. 1).
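A minimal sketch of the resulting precision schedule (an illustration of the cost structure only; recovering the global estimate from such unevenly precise local estimates is exactly what the method's analysis establishes):

```python
import math
import random

def precision_schedule(num_subdomains, max_level=30):
    """Assign sub-domain i a level k_i with Pr[k_i = k] = 2^{-(k+1)}, so a
    ~2^{-k} fraction of the sub-domains is estimated to precision finer by
    a factor of 2^k.  Since the cost of a 2^k-finer estimate scales like
    2^k, the expected per-subdomain cost is sum_k 2^{-(k+1)} * 2^k, i.e.,
    only logarithmic in the number of levels."""
    levels = [min(int(-math.log2(1.0 - random.random())), max_level)
              for _ in range(num_subdomains)]
    total_cost = sum(2 ** k for k in levels)
    return levels, total_cost
```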

Asaf Shapira: Testing Odd-Cycle-Freeness in Boolean Functions

Here a cycle is a sequence of points in the function domain that sum to zero and all evaluate to one. (This is a special case of a sequence of points that reside on a prescribed sub-space and evaluate to a certain pattern.) The current problem is reduced to testing bipartiteness.
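To make the definition concrete, here is a brute-force (exponential-time) checker, for intuition only; points are represented as bit-tuples over $GF(2)^n$.

```python
from itertools import combinations

def has_odd_cycle(f, points, max_len=5):
    """Search for an odd-length set of distinct points, all evaluating to 1
    under f, whose coordinate-wise XOR is zero (i.e., they sum to zero)."""
    ones = [p for p in points if f(p) == 1]
    for length in range(3, max_len + 1, 2):          # odd lengths only
        for combo in combinations(ones, length):
            if all(sum(bits) % 2 == 0 for bits in zip(*combo)):
                return True
    return False
```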

Pierre Fraigniaud: Local Distributed Decision Algorithms

The question here is what decisions can be made by (synchronous) distributed algorithms (in the network model) that use a constant number of rounds. Specifically, the input string $x$ is distributed among the vertices of the graph $G$ (i.e., it may be viewed as $x:V(G)\to\{0,1\}^*$), and the requirement is that if $x$ is a YES-instance then all vertices accept, whereas if $x$ is a NO-instance then at least one vertex rejects. The work considers deterministic and randomized algorithms as well as deterministic and randomized evaluations of auxiliary certificates (i.e., vertex $v$ obtains both $x(v)$ and $w(v)$, where $w:V(G)\to\{0,1\}^*$ is a certificate for $x$). In the pure model (i.e., without certificates) randomization helps iff the acceptance and rejection probabilities (for YES- and NO-instances, respectively) satisfy a simple inequality. In the "certificated" model randomization is extremely powerful: it allows deciding membership in any set!

Ning Xie: Local Computation Algorithms

This work presents a general framework that generalizes problems such as local decodability and local reconstruction. For a function $F$ from strings to sets of strings, we seek a randomized algorithm that, on explicit input $i$ and oracle access to $x$, answers with the $i$th bit of a fixed string in $F(x)$; that is, the string $y\in F(x)$ is determined based on the internal coin tosses of the algorithm, but is oblivious of $i$ (and so the answers obtained for all possible values of $i$ are consistent with a single string in $F(x)$). The problem of local reconstruction of a function $x:[N]\to\{0,1\}^*$ w.r.t. the property $\Pi$ and proximity parameter $\e$ can be cast by letting $F(x)$ equal the set of all $y\in\Pi$ that are $\e$-close to $x$.
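A toy instance of this interface (my own illustration, not from the work): take $F(x)$ to be the set of all cyclic rotations of $x$. The rotation amount is fixed by the algorithm's coins, so the answers to all queries $i$ are consistent with a single $y\in F(x)$, chosen obliviously of $i$.

```python
import random

class LocalRotation:
    """Local computation of some y in F(x) = {cyclic rotations of x}."""
    def __init__(self, oracle_x, n, seed=None):
        self.oracle_x = oracle_x    # oracle access: oracle_x(j) = j-th bit of x
        self.n = n
        # y is fully determined by the coins, before any query arrives
        self.shift = random.Random(seed).randrange(n)

    def query(self, i):
        # i-th bit of y, at the cost of a single probe to x
        return self.oracle_x((i + self.shift) % self.n)
```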

Christian Sohler: Every Property of Hyperfinite Graphs is Testable

This work refers to the bounded-degree graph model, and to testing within a number of queries that is independent of the size of the graph. A graph is called $(k,\e)$-hyperfinite if it is $\e$-close to a graph in which all connected components are of size at most $k$. The notion of a $k$-disc (i.e., the subgraph viewed by a BFS that is truncated at depth $k$) serves as a pivot of this work. Once one implements a "partition oracle" (vis-a-vis the aforementioned connected components), testing reduces to evaluating the frequency of the various possible $k$-discs.
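A minimal sketch of the frequency-evaluation step (the `k_disc` description below uses the sorted distance profile as a crude stand-in for a proper canonical form of the rooted subgraph):

```python
import random
from collections import Counter, deque

def k_disc(graph, v, k):
    """BFS truncated at depth k around v (graph: dict of adjacency lists)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue                 # do not expand beyond depth k
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return tuple(sorted(dist.values()))   # crude canonical description

def disc_frequencies(graph, k, samples=100):
    """Estimate k-disc frequencies from the discs of random vertices."""
    vertices = list(graph)
    counts = Counter(k_disc(graph, random.choice(vertices), k)
                     for _ in range(samples))
    return {disc: c / samples for disc, c in counts.items()}
```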

Oded Goldreich: Finding Cycles and Trees

(Indeed, here I violate my promise not to choose my own works, but I guess I should be excused given the context of this report.)
This work advocates the study of sub-linear time algorithms for finding (small) substructures in graphs that are far from lacking such substructures; e.g., finding a cycle in a graph that is far from being cycle-free. This problem is related to one-sided error testing, especially when the tested property is cast as consisting of graphs that lack some (natural) substructures. The work begs for further study, e.g., of arbitrary minor-freeness and of extending the study to the general graph model (whereas almost all the current results refer to the bounded-degree graph model).

Oded Lachish: Testing acceptability by read-once Boolean formulae

This "massively parameterized" property is defined in term of a fixed formula, and consists of testing whether a given assignment satisfies this formula. The results buikd on the "independence" of the parts of any read-once formula.

Gilad Tsur: On Approximating the Number of Relevant Variables in a Function

This work considers a "double relaxation" of the task of determining the number of relevant variables of a function; specifically, one should distinguish between functions that depend on $k$ variables and functions that are $\e$-far from depending on $k'$ variables, where $k'=(1+\gamma)\cdot k$ for a constant $\gamma>0$. The bottom line is that this double relaxation is not significantly easier than the standard property testing problem (where $k'=k$).

Andrew McGregor: Verifying the correct operation of priority queues

Given a stream of operations to/on a data structure (e.g., inserts and extracts to a priority queue), the task is to check whether these operations were performed correctly (e.g., whether each extract returned the minimum element currently in the queue). The first observation is that fingerprinting can be used to check whether the multisets of inserted and extracted elements are equal. The algorithm works by treating short-range and long-range violations differently, where the threshold is the square root of the length of the stream. The short range is dealt with by brute force, whereas the long range is dealt with by clever record keeping. Interestingly, the resulting square-root complexity is optimal.
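The first observation can be made concrete via a standard polynomial fingerprint (a sketch; in the streaming setting the two products are maintained incrementally, one factor per insert/extract):

```python
import random

def multisets_equal(inserted, extracted, prime=(1 << 61) - 1):
    """The multisets are equal iff prod(r - a) agrees for both as a
    polynomial in r; checking at a random r errs with probability at
    most (stream length)/prime, by the Schwartz-Zippel lemma."""
    r = random.randrange(prime)
    fp_in = fp_out = 1
    for a in inserted:
        fp_in = fp_in * (r - a) % prime
    for a in extracted:
        fp_out = fp_out * (r - a) % prime
    return fp_in == fp_out
```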

Robi Krauthgamer: Polylogarithmic Approximation for Edit Distance

While the edit distance can be computed in quadratic time (by dynamic programming), the current work obtains a polylogarithmic approximation in nearly-linear time. The pivot is reducing the edit distance to a tree distance, which allows for faster evaluation via sampling. Specifically, this is obtained by referring to an asymmetric model in which one string is fixed and oracle access is provided to the other string, while keeping track of the number of queries made. Thus, the low query complexity achieved in the latter model, combined with a linear-time implementation of the queries, yields the desired result in the standard model.
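For reference, the quadratic-time DP baseline mentioned above:

```python
def edit_distance(s, t):
    """dp[i][j] = edit distance between the prefixes s[:i] and t[:j]."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                     # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j                     # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete s[i-1]
                           dp[i][j - 1] + 1,        # insert t[j-1]
                           dp[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    return dp[m][n]
```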

Artur Czumaj: Planar Graphs - Random Walks and Bipartiteness Testing

This work presents techniques for dealing with the general graph model (i.e., extending results from the bounded-degree model to the general graph model).

Ronitt Rubinfeld: Testing Properties of Collections of Distributions

Two models are considered. In the sampling model, one gets pairs $(i,x)$, where $i$ is selected according to some distribution (i.e., uniform, fixed but known, or even unknown) and $x$ is selected according to the $i$th distribution. In the query model, in response to the query $i$, one gets a sample $x$ selected according to the $i$th distribution. The question of testing the equality of distributions in the sampling model (for an unknown distribution) coincides with testing the independence of two parts of a distribution, but many other questions can be asked (and some were treated in this work).

Surveys

I wish to commend the organizers for including in the program a large number of surveys. Most of them are summarized below.

Dana Ron: Approximating graph parameters

Four types of parameters (of arbitrary graphs) were discussed and the ideas underlying their approximations were surveyed.
  1. The average degree, and the number of small substructures (e.g., stars). Here the idea is to estimate the quantity from all sides of the object (where, in the case of the average degree, the object is an edge); a naive baseline is sketched after this list.
  2. The weight of an MST (in weighted graphs). Here the techniques build on connectivity testing in bounded-degree graphs.
  3. The size of a minimum vertex cover and of a maximal matching. Here the initial idea is a relation to ultra-fast distributed computing, but subsequent work relies on implementing certain partition oracles.
  4. The distance to various graph properties. In the dense graph model a general result is known, but in the bounded-degree model the currently known results are sporadic.
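Regarding the first item, the naive baseline just averages sampled degrees (a sketch only; the surveyed algorithms improve on it, e.g., by also sampling edges "from both sides" so as to handle the few high-degree vertices that uniform vertex samples tend to miss):

```python
import random

def naive_average_degree(degree_oracle, n, samples=1000):
    """Unbiased but high-variance estimate of the average degree, obtained
    by averaging the degrees of uniformly sampled vertices."""
    return sum(degree_oracle(random.randrange(n)) for _ in range(samples)) / samples
```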

David Woodruff: Evaluating various norms in streaming

Given a stream of updates to a large number of values, the task is to compute various norms of the resulting vector (of final aggregated values). This is relatively easy in the case of the Euclidean norm (by relying on the fact that this norm squared equals the expected value of the square of the inner product of the vector with a random $\pm1$ vector). The estimation of other norms relies on the relations between norms and on clustering of contributions according to their magnitude, which in turn can be carried out by a generic algorithm.
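A minimal sketch of this $\pm1$ inner-product idea (real streaming algorithms replace the explicit sign vectors by 4-wise independent hash functions, so as to use small space):

```python
import random

def estimate_squared_norm(stream_updates, n, trials=100):
    """Maintain z_t = <v, s_t> for random sign vectors s_t under updates
    (i, delta) to the vector v; E[z_t^2] = ||v||_2^2, so averaging the
    squares over independent trials estimates the squared norm."""
    signs = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(trials)]
    z = [0] * trials
    for i, delta in stream_updates:       # a single pass over the stream
        for t in range(trials):
            z[t] += signs[t][i] * delta
    return sum(zt * zt for zt in z) / trials
```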

Madhu Sudan: Affine Invariances in Property Testing

This survey presents a unified perspective on testing algebraic properties. The pivot is the notion of affine invariance and the postulate that algebraic properties of functions are preserved under affine transformations of the function's domain (similarly to the way that graph properties are preserved under a relabelling of the vertices).

Nir Ailon: Johnson-Lindenstrauss Transform(s)

The survey started with a distinction between the distributional version (which states that a random linear transformation from $\R^n$ to $\R^k$ preserves a single distance w.p. $1-\exp(-k)$) and the metric embedding version (which preserves the distances among $N$ given points). It was shown that fast JL transformations are related to partial derandomizations of the distributional version; e.g., rather than using a uniformly distributed linear transformation, one may use a composition of a few transformations, including a sparse random transformation (from $\R^n$ to $\R^k$) and a random diagonal transformation (of $\R^n$).
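For reference, a minimal sketch of the plain (dense Gaussian) transformation, against which the fast variants are to be compared:

```python
import numpy as np

def jl_embed(points, k, seed=None):
    """Project N points in R^n down to R^k via a dense Gaussian matrix;
    pairwise distances are preserved up to a factor of 1 +/- eps w.h.p.
    once k = O(log(N) / eps^2)."""
    rng = np.random.default_rng(seed)
    n = points.shape[1]
    A = rng.normal(size=(k, n)) / np.sqrt(k)  # scaling gives E||Ax||^2 = ||x||^2
    return points @ A.T                        # one row per embedded point
```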

Roger Wattenhofer: Distributed Algorithms

The question addressed is obtaining an MIS and an approximate mVC by (synchronous) distributed algorithms (in the network model) that use a small number of rounds. As indicated in Dana's survey, this problem is closely related to obtaining fast sub-linear algorithms for these problems. The current survey started with a simple(r) analysis of Luby's randomized MIS algorithm, and a review of Cole-Vishkin "deterministic coin tossing". Lower and upper bounds on the performance of deterministic and randomized algorithms on arbitrary graphs were also presented.
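For reference, a centralized simulation of (the random-priority variant of) Luby's algorithm; each iteration of the loop corresponds to a constant number of distributed rounds, and $O(\log n)$ iterations suffice w.h.p.

```python
import random

def luby_mis(graph):
    """graph: dict mapping each vertex to its list of neighbors."""
    alive = set(graph)
    mis = set()
    while alive:
        r = {v: random.random() for v in alive}   # each vertex draws a value
        # local minima join the MIS; they are pairwise non-adjacent
        winners = {v for v in alive
                   if all(r[v] < r[u] for u in graph[v] if u in alive)}
        mis |= winners
        # MIS members and their neighbors drop out
        alive -= winners | {u for v in winners for u in graph[v]}
    return mis
```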

