Highlights of the Bertinoro workshop on Sublinear Algorithms
(see the workshop's webpage)
The talks that I wish to highlight most are:
- Optimal O(1)-time approximations of bounded-degree CSPs
by Yuichi Yoshida
- Learning and testing k-modal distributions
by Rocco Servedio (et al)
These two talks as well as 20 additional ones
(15 regular talks and 5 survey talks)
are briefly reviewed below.
For the abstracts of all talks click
HERE
My top choices
Yuichi Yoshida:
Optimal O(1)-time approximations of bounded-degree CSPs
This is a great result. It states that,
for every CSP, there exists a constant $c$
such that every bounded-degree instance (of this CSP)
can be approximated to within a factor of $c$ in O(1)-time,
but approximation to within any constant factor $c'>c$
requires at least $\sqrt{n}$ queries.
Furthermore, the constant $c$ equals the integrality gap
of the generic LP relaxation of this CSP.
(Cf., the optimality of polynomial-time approximations (under UGC)
at the integrality gap of the generic SDP relaxation.)
Rocco Servedio: Learning and testing k-modal distributions
I found the algorithmic design very inspiring.
When one talks of using testing as a preliminary step to learning,
one usually envisions using the tester to select an algorithm
out of several algorithms, each tailored for a different concept class.
In this work the idea is to decompose the target input (object),
which belongs to a complex concept class (which seems hard to learn)
into several objects that should each belong to a simpler concept class.
While such a partition exists, it is not known to the learner.
Thus, the learner generates several possible partitions
and tests them in an attempt to find a good partition
(i.e., a partition in which each part belongs to the said
simpler concept class). Once a good partition is found,
each part is learned
(via the simpler learner corresponding to the basic class),
and the full object is reconstructed.
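As a toy illustration of this decompose-test-learn paradigm (my own drastic simplification, not the actual k-modal distribution algorithm): a unimodal sequence is partitioned into an increasing part and a decreasing part; candidate cut points are tried until both parts pass the simple-class (monotonicity) test, and the full object is reconstructed from the learned parts.

```python
# Toy sketch of decompose-test-learn (my simplification, not the paper's
# algorithm): try candidate partitions until each part belongs to the
# simple concept class (monotone sequences), then learn each part.

def is_monotone(seq, increasing=True):
    """Test whether a part belongs to the simple class (monotone)."""
    pairs = list(zip(seq, seq[1:]))
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)

def learn_unimodal(seq):
    """Generate candidate partitions and test each for goodness."""
    for cut in range(len(seq) + 1):
        inc, dec = seq[:cut], seq[cut:]
        if is_monotone(inc, True) and is_monotone(dec, False):
            return inc, dec  # each part is "learned" by the simple learner
    return None

inc, dec = learn_unimodal([1, 3, 7, 6, 2])
reconstructed = inc + dec  # the full object, reassembled from its parts
```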
Additional choices
Krzysztof Onak:
A Near-Optimal Approximation of the Minimum Vertex Cover Size
It is known that approximating the value of the mVC requires time that
is linear in the average degree, and the current work provides an
algorithm that essentially achieves this running time.
An interesting feature of the algorithm is that it manages to
conduct a large number of recursive calls to a non-trivial procedure
while incurring a relatively small overhead (i.e., the amortized cost
is logarithmic in the number of invocations).
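The flavor of the recursive calls can be conveyed by a much simpler local matching oracle (in the spirit of earlier work by Nguyen and Onak, not the talk's algorithm; the graph and seed are arbitrary choices of mine): edges get random ranks, and an edge is in the greedy matching iff no adjacent edge of smaller rank is in the matching. Since any maximal matching M satisfies |M| <= mVC <= 2|M|, such an oracle yields a vertex-cover estimate.

```python
import random

# Sketch of a local matching oracle (far simpler than the talk's algorithm):
# an edge is in the greedy matching iff no adjacent edge of smaller random
# rank is in the matching.  Note the recursive calls to the same procedure.

random.seed(0)

def make_oracle(edges):
    rank = {e: random.random() for e in edges}
    adj = {}
    for (u, v) in edges:
        adj.setdefault(u, []).append((u, v))
        adj.setdefault(v, []).append((u, v))
    memo = {}

    def in_matching(e):
        if e not in memo:
            u, v = e
            # recurse only on adjacent edges of strictly smaller rank,
            # so the recursion always terminates
            memo[e] = not any(f != e and rank[f] < rank[e] and in_matching(f)
                              for f in adj[u] + adj[v])
        return memo[e]

    return in_matching

edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # a path on 5 vertices
oracle = make_oracle(edges)
matching = [e for e in edges if oracle(e)]
```

The matching produced is maximal regardless of the random ranks, which is what certifies the factor-2 relation to the minimum vertex cover.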
Sofya Raskhodnikova:
Testing and Reconstruction of Lipschitz Functions
The starting point is an appealing application of testing and
reconstruction of Lipschitz functions to privacy preserving databases,
where the key observation is that the archetypical technique for
obtaining (differential) privacy is adding noise at a level that masks
the effect of any individual entry on the query. This calls for
testing whether the query indeed has such a limited effect,
and for filtering the query so as to obtain such a limited effect.
This motivates the current study, which provides partial results
(regarding certain domains but not the most natural one).
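A naive sketch of testing on the simplest (line) domain, which is my own illustration and not the paper's algorithm: sample random adjacent pairs and check that the function value changes by at most 1, which is exactly the limited per-entry effect that the privacy noise must mask.

```python
import random

# Naive sketch (mine, not the paper's tester) of testing the 1-Lipschitz
# property on the line domain [n]: sample adjacent pairs and check that
# the value changes by at most 1.

def lipschitz_test(f, n, samples=100, seed=1):
    rng = random.Random(seed)
    for _ in range(samples):
        i = rng.randrange(n - 1)
        if abs(f(i + 1) - f(i)) > 1:
            return False  # found a witness of a violation
    return True  # no violation found (f may still be far from Lipschitz)

ok = lipschitz_test(lambda i: i // 2, 1000)   # 1-Lipschitz: always accepted
bad = lipschitz_test(lambda i: 2 * i, 1000)   # slope 2: every pair violates
```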
C. Seshadhri:
Estimating the Longest Increasing Subsequence in Polylogarithmic Time
The aim is to find an estimate that is accurate to within an additive
error of $\e n$, where $n$ is the input length,
and to do so in polylogarithmic time (with arbitrary dependence on $\e>0$).
This is done via "sublinear-time dynamic programming";
that is, running DP on small instances that emerge based
on adaptive sampling.
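For reference, the quantity being estimated can be computed exactly in $O(n\log n)$ time by the classic patience-sorting dynamic program; the talk's contribution is getting an additive $\e n$ approximation exponentially faster, which this exact baseline of course does not do.

```python
import bisect

# Exact LIS length in O(n log n) via patience sorting -- shown only as a
# baseline for the quantity that the talk approximates to within an
# additive \e*n in polylogarithmic time.

def lis_length(seq):
    # tails[k] = smallest possible tail of an increasing subsequence
    # of length k+1 seen so far
    tails = []
    for x in seq:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

length = lis_length([3, 1, 4, 1, 5, 9, 2, 6])  # e.g., 1, 4, 5, 9
```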
Tali Kaufman: Locally Testable Codes and Expanders
I was intrigued by Tali's observation regarding the somewhat
contradictory relation between LTC and expanders.
On the one hand, it is known [BHR] that strong expanders
(i.e., odd-neighbor expanders) yield codes that are hard to test
(i.e., are not LTCs). On the other hand, strong LTC
yield graphs that are weak expanders (i.e., small set expanders).
(Caveat: the graphs in the two cases are only related;
in the 1st case it corresponds to the parity-check matrix of an LDPC code,
whereas in the 2nd case it describes positions that are probed on
the same random-tape of the tester. Still...)
Alexandr Andoni:
Sublinear Algorithms via Precision Sampling
This work makes explicit a method that was used in prior works,
and may be called adaptive-precision sampling.
The idea is that obtaining an estimate of a quantity
defined over a domain that is partitioned into many sub-domains
does not require obtaining the same level of precision in estimating
this quantity over each sub-domain. It suffices to obtain a precision
that is worse by a factor of $2^k$ on a $2^{-k}$ fraction of the sub-domains,
where these sub-domains are selected at random and the process is
repeated for (logarithmically) many values of $k$.
The authors cite some streaming algorithms as origins of this method,
but it has appeared also in property testing
(see e.g.),
and in cryptography
(e.g., the analysis of hardcore predicates; cf. Clm 2.5.4.1
in Foundations of Crypto., Vol. 1).
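A toy simulation of the phenomenon (my own crude illustration, far weaker than the paper's precision-sampling lemma; all constants are arbitrary): each sub-domain gets a random coarseness level $k$, with level $k$ occurring with probability about $2^{-k-1}$, and is measured with additive error up to $2^k$ times the base precision. The mean-zero errors largely cancel, and the aggregate estimate stays accurate even though most individual measurements are poor.

```python
import random

# Toy illustration of coarse-precision aggregation (mine, not the paper's
# lemma): most sub-domains are measured with coarse precision, yet the sum
# is estimated well because the independent mean-zero errors cancel.

rng = random.Random(42)
n, base_precision = 10_000, 0.01
values = [rng.uniform(0.0, 1.0) for _ in range(n)]
true_sum = sum(values)

estimate = 0.0
for a in values:
    k = 0
    while rng.random() < 0.5 and k < 10:  # geometric level: P(k) ~ 2^{-k-1}
        k += 1
    # measurement with precision worse by a factor of 2^k
    noise = rng.uniform(-1.0, 1.0) * (2 ** k) * base_precision
    estimate += a + noise

relative_error = abs(estimate - true_sum) / true_sum
```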
Asaf Shapira:
Testing Odd-Cycle-Freeness in Boolean Functions
Here a cycle is a sequence of points in the function domain
that sum to zero and all evaluate to one.
(This is a special case of a sequence of points that reside
on a prescribed sub-space and evaluate to a certain pattern.)
The current problem is reduced to testing bipartiteness.
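The shortest special case can be spelled out concretely (the exhaustive search below, over a tiny domain, is my stand-in for the sampling-based tester, and it checks only length-3 cycles whereas the property concerns odd cycles of every length): a triple $x, y, x\oplus y$ sums to zero over GF(2), so $f$ has such a cycle iff $f(x)=f(y)=f(x\oplus y)=1$ for some pair $x,y$.

```python
# Sketch for the shortest odd cycles only (length 3; my simplification):
# three points x, y, x^y sum to zero over GF(2), so we search for a pair
# on which f evaluates to 1 at all three points.

def has_odd_3_cycle(f, n):
    points = range(2 ** n)  # bitmask encoding of {0,1}^n
    return any(f(x) and f(y) and f(x ^ y) for x in points for y in points)

n = 4
cyc_all_one = has_odd_3_cycle(lambda x: 1, n)
# The indicator of odd parity is odd-cycle-free: an odd number of
# odd-parity points cannot sum to zero.
cyc_parity = has_odd_3_cycle(lambda x: bin(x).count("1") % 2, n)
```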
Pierre Fraigniaud: Local Distributed Decision Algorithms
The question here is what decisions can be made by
(synchronous) distributed algorithms (in the network model)
that use a constant number of rounds. Specifically,
the input string $x$ is distributed among the vertices
of the graph $G$ (i.e., it may be viewed as $x:V(G)\to\{0,1\}^*$),
and the requirement is that if $x$ is a YES-instance then all vertices
accept, whereas if $x$ is a NO-instance then at least one vertex rejects.
The work considers deterministic and randomized algorithms
as well as deterministic and randomized evaluations of
auxiliary certificates (i.e., vertex $v$ obtains both $x(v)$
and $w(v)$, where $w:V(G)\to\{0,1\}^*$ is a certificate for $x$).
In the pure model (i.e., without certificates) randomization helps
iff the acceptance and rejection probabilities
(re YES and NO instances, resp) satisfy a simple inequality.
In the "certificated" model randomization is extremely powerful:
it allows deciding membership in any set!
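The acceptance rule can be illustrated by a toy one-round decision task (my own standard example, not from the talk): deciding whether the input is a proper coloring. Each vertex inspects only its own input and its neighbors' inputs, and the global verdict is "accept iff all vertices accept"; on a NO-instance some vertex sees a monochromatic edge and rejects.

```python
# Toy local decision (my example): every vertex checks its radius-1 view,
# all vertices must accept a YES-instance, and at least one vertex must
# reject a NO-instance.

def local_decision(graph, x):
    # graph: adjacency dict; x: vertex -> color (the local input x(v))
    verdicts = {v: all(x[v] != x[u] for u in graph[v]) for v in graph}
    return all(verdicts.values())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
accept = local_decision(triangle, {0: "r", 1: "g", 2: "b"})  # proper coloring
reject = local_decision(triangle, {0: "r", 1: "r", 2: "b"})  # vertex 0 rejects
```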
Ning Xie: Local Computation Algorithms
This work presents a general framework that generalizes problems
such as local decodability and local reconstruction.
For a function $F$ from strings to sets of strings,
we seek a randomized algorithm that on explicit input $i$
and oracle access to $x$ answers with the $i$th bit of a fixed
string in $F(x)$; that is, the string $y\in F(x)$ is determined
based on the internal coin tosses of the algorithm, but is oblivious
of $i$ (and so the answers obtained for all possible values of $i$
are consistent with a single string in $F(x)$).
The problem of local reconstruction of a function $x:[N]\to\{0,1\}^*$
w.r.t. the property $\Pi$ and proximity parameter $\e$ can be cast by
letting $F(x)$ equal the set of all $y\in\Pi$ that are $\e$-close to $x$.
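A standard instantiation of the framework (my own simplified sketch) takes $F(x)$ to be the set of maximal independent sets of a graph. The algorithm fixes its coin tosses once, as a shared seed that determines a random priority per vertex and hence a single MIS; every query is then answered consistently with that one set: $v$ is in the MIS iff no neighbor of smaller priority is.

```python
import random

# Sketch of a local computation algorithm for F(x) = the maximal
# independent sets of a graph (my simplification).  The shared seed fixes
# the algorithm's coins, so answers to all queries are consistent with a
# single y in F(x).

def make_lca(graph, seed=7):
    rng = random.Random(seed)
    priority = {v: rng.random() for v in sorted(graph)}
    memo = {}

    def in_mis(v):  # answers the "v-th bit" of the selected y in F(x)
        if v not in memo:
            # recursion descends in priority, so it terminates
            memo[v] = not any(in_mis(u) for u in graph[v]
                              if priority[u] < priority[v])
        return memo[v]

    return in_mis

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a path on 4 vertices
lca = make_lca(graph)
mis = {v for v in graph if lca(v)}
```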
Christian Sohler:
Every Property of Hyperfinite Graphs is Testable
This work refers to the bounded-degree graph model, and to testing
within a number of queries that is independent of the size of the
graph. A graph is called $(k,\e)$-hyperfinite if it is $\e$-close
to a graph in which all connected components are of size at most $k$.
The notion of a $k$-disc (i.e., the subgraph viewed by a BFS that is
truncated at depth $k$) serves as a pivot of this work.
Once one implements a "partition oracle"
(vis-a-vis the aforementioned connected components),
testing reduces to evaluating the frequency of
the various possible $k$-discs.
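The $k$-disc statistic can be sketched as follows, where the layer-size signature below is my own coarse proxy for the actual isomorphism type of the truncated-BFS subgraph:

```python
# Sketch of the k-disc statistic: run a BFS truncated at depth k around a
# sampled vertex.  The layer-size tuple below is a coarse proxy (mine) for
# the isomorphism type of the k-disc; a tester estimates the frequencies
# of the various k-discs from a sample of vertices.

def k_disc_signature(graph, root, k):
    """Sizes of the BFS layers up to depth k around root."""
    seen, frontier, layers = {root}, [root], [1]
    for _ in range(k):
        nxt = [u for v in frontier for u in graph[v] if u not in seen]
        nxt = list(dict.fromkeys(nxt))  # dedupe, keep order
        seen.update(nxt)
        layers.append(len(nxt))
        frontier = nxt
    return tuple(layers)

# On a cycle, every vertex sees the same k-disc.
n = 20
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
signatures = {k_disc_signature(cycle, v, 2) for v in cycle}
```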
Oded Goldreich: Finding Cycles and Trees
(Indeed, here I violate my promise not to choose my own works,
but I guess I should be excused given the context of this report.)
This work advocates the study of sub-linear time algorithms
for finding (small) substructures in graphs
that are far from lacking such substructures;
e.g., finding a cycle in a graph that is far from being cycle-free.
This problem is related to one-sided error testing,
especially when the tested property is cast as consisting
of graphs that lack some (natural) substructures.
The work begs for further study, e.g., of arbitrary minor-freeness
and of extending the study to the general graph model
(whereas almost all the current results refer to
the bounded-degree graph model).
Oded Lachish: Testing acceptability by RO Boolean Formulae
This "massively parameterized" property is defined in terms of a fixed
formula, and consists of testing whether a given assignment satisfies
this formula. The results build on the "independence" of the parts
of any read-once formula.
Gilad Tsur:
On Approximating the Number of Relevant Variables in a Function
This work considers a "double relaxation" of the task of determining
the number of relevant variables of a function; specifically, one
should distinguish between functions that depend on $k$ variables
and functions that are $\e$-far from depending on $k'$ variables,
where $k'=(1+\gamma)\cdot k$ for a constant $\gamma>0$.
The bottom line is that this double relaxation is not
significantly easier than the standard property testing problem
(where $k'=k$).
Andrew McGregor:
Verifying the correct operation of priority queues
Given a stream of operations to/on a data structure
(e.g., inserts and extracts to a priority queue),
the task is to check whether these operations
were performed correctly (e.g., whether each extract
resulted in the minimum element currently in the queue).
The first observation is that fingerprinting can be
used to check whether the multisets of inserted
and deleted elements are equal.
The algorithm works by treating short-range violations
differently from long-range ones, where the threshold
is the square root of the length of the stream.
The short range is dealt with by brute force,
whereas the long range is dealt with by clever record keeping.
Interestingly, the square-root complexity that results is optimal.
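The fingerprinting observation is standard and easy to make concrete (the prime and evaluation point below are arbitrary choices of mine): hash a multiset $\{a_1,\dots,a_m\}$ to $\prod_i (r-a_i) \bmod p$. Equal multisets always collide, and two different multisets of size $m$ collide with probability at most $m/p$ over a random $r$.

```python
# Sketch of the multiset fingerprint used to check that the inserted and
# extracted elements form the same multiset (standard polynomial
# fingerprinting; the prime and evaluation point are my arbitrary choices).

P = 2_147_483_647  # a Mersenne prime

def fingerprint(multiset, r):
    acc = 1
    for a in multiset:
        acc = acc * (r - a) % P
    return acc

r = 123_456
inserted = fingerprint([5, 1, 3, 1], r)
extracted = fingerprint([1, 1, 3, 5], r)  # same multiset, different order
other = fingerprint([1, 1, 3, 6], r)      # differs in one element
```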
Robi Krauthgamer:
Polylogarithmic Approximation for Edit Distance
While the edit distance can be computed in quadratic time
(by Dynamic Programming), the current work obtains a polylogarithmic
approximation in nearly-linear time.
The pivot is reducing the edit distance to a tree distance,
which allows for faster evaluation via sampling.
Specifically, this is obtained by referring to an asymmetric model
in which one string is fixed and oracle access is provided to the
other string, while keeping track of the number of queries made.
Thus, the low query complexity achieved in the latter model,
combined with a linear-time implementation of queries (in the standard model),
yields the desired result (in the standard model).
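For context, the quadratic-time baseline that the work improves upon is the textbook dynamic program (counting insertions, deletions, and substitutions), shown here in its one-row form:

```python
# The quadratic-time baseline: the textbook dynamic program for edit
# distance, kept to a single row of the DP table.

def edit_distance(s, t):
    m, n = len(s), len(t)
    dp = list(range(n + 1))  # distances from "" to prefixes of t
    for i in range(1, m + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # delete s[i-1]
                        dp[j - 1] + 1,    # insert t[j-1]
                        prev_diag + (s[i - 1] != t[j - 1]))  # substitute
            prev_diag = cur
    return dp[n]

d = edit_distance("kitten", "sitting")  # the classic example: distance 3
```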
Artur Czumaj:
Planar Graphs - Random Walks and Bipartiteness Testing
This work presents techniques for dealing with the general graph model
(i.e., extending results from the bounded-degree model to the general
graph model).
Ronitt Rubinfeld:
Testing Properties of Collections of Distributions
Two models are considered.
In the sampling model, one gets pairs $(i,x)$,
where $i$ is selected according to some distribution
(i.e., uniform, fixed but known, or even unknown)
and $x$ is selected according to the $i$th distribution.
In the query model, in response to the query $i$,
one gets a sample $x$ selected according to the $i$th distribution.
The question of testing the equality of distributions in the sampling
model (for an unknown distribution) coincides with testing the
independence of two parts of a distribution, but many other questions
can be asked (and some were treated in this work).
Surveys
I wish to commend the organizers for including in the program
a large number of surveys. Most of them are summarized below.
Dana Ron: Approximating graph parameters
Four types of parameters (of arbitrary graphs) were discussed
and the ideas underlying their approximations were surveyed.
- Average degrees, and the number of substructures (e.g., stars).
Here the idea is to estimate the number from all sides of the object
(where in the case of average degree the object is an edge).
- The weight of a MST (in weighted graphs).
Here the techniques build on connectivity testing
in bounded-degree graphs.
- The size of min-VC and maximal matching.
Here the initial idea is a relation to ultra-fast distributed
computing, but subsequent work relies on implementing certain
partition oracles.
- The distance to various graph properties.
In the dense graph model a general result is known,
but in the bounded-degree model the currently known results
are sporadic.
David Woodruff: Evaluating various norms in streaming
Given a stream of updates to a large number of values,
the task is to compute various norms of the resulting vector
(of final aggregated values). This is relatively easy in the case of
the Euclidean norm (by relying on the fact that this norm squared equals
the expected value of the square of the inner-product of the vector
with a random $\pm1$-vector). The estimation of other norms
relies on the relation between norms and on clustering of
contributions according to their magnitude, which in turn
can be carried out by a generic algorithm.
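The observation behind the Euclidean-norm sketch can be verified exactly in small dimension (the vector below is an arbitrary example of mine): the cross terms $s_i s_j$ ($i\neq j$) average to zero over the sign vectors, so the expectation of the squared inner product equals the squared norm.

```python
from itertools import product

# The squared Euclidean norm of v equals the expectation, over a random
# +/-1 vector s, of <v, s>^2.  In small dimension we can average over ALL
# sign vectors and get the identity exactly (the cross terms cancel).

v = [3.0, -1.0, 4.0, 1.0, -5.0]
n = len(v)

signs = list(product([-1, 1], repeat=n))
avg_sq_inner = sum(sum(vi * si for vi, si in zip(v, s)) ** 2
                   for s in signs) / len(signs)
norm_sq = sum(vi * vi for vi in v)
```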
Madhu Sudan: Affine Invariances in Property Testing
This survey presents a unified perspective on testing algebraic properties.
The pivot is the notion of affine invariances
and the postulate that algebraic properties of functions
are preserved under affine transformations of the function's domain
(similarly to the way that graph properties are preserved
under a relabelling of the vertices).
Nir Ailon: Johnson-Lindenstrauss Transform(s)
The survey started with a distinction between the distributed version
(which states that a random linear transformation from $\R^n$ to $\R^k$
preserves a single distance w.p. $1-\exp(-k)$)
and the metric embedding version
(which preserves the distances among $N$ given points).
It was shown that fast JL transformations are related to
partial derandomization of the distributed version;
e.g., rather than using a uniformly distributed linear transformation,
one may use a transformation obtained by composing a few
transformations including a sparse random transformation
(from $\R^n$ to $\R^k$) and a random diagonal transformation (of $\R^n$).
Roger Wattenhofer: Distributed Algorithms
The question addressed is obtaining MIS and approximate mVC
by (synchronous) distributed algorithms (in the network model)
that use a small number of rounds. As indicated in Dana's survey,
this problem is closely related to obtaining fast sub-linear
algorithms for these problems. The current survey started with
a simple(r) analysis of Luby's randomized MIS algorithm,
and a review of Cole-Vishkin "deterministic coin tossing".
Lower and upper bounds on the performance of deterministic and
randomized algorithms on arbitrary graphs were also presented.
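Luby's round structure is simple enough to simulate directly (the graph and seed below are my arbitrary choices): in each round every undecided vertex draws a random value, the local minima join the MIS, and they and their neighbors drop out; $O(\log n)$ rounds suffice with high probability.

```python
import random

# Simulation of Luby's randomized MIS algorithm: per round, local minima
# among the still-undecided vertices join the MIS, and their closed
# neighborhoods are removed.  (Graph and seed are my arbitrary choices.)

def luby_mis(graph, seed=3):
    rng = random.Random(seed)
    alive = set(graph)
    mis = set()
    while alive:  # the global minimum always wins, so alive shrinks
        val = {v: rng.random() for v in alive}
        winners = {v for v in alive
                   if all(val[v] < val[u] for u in graph[v] if u in alive)}
        mis |= winners
        alive -= winners | {u for v in winners for u in graph[v]}
    return mis

n = 12
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
mis = luby_mis(ring)
```

Whatever the coin tosses, the output is independent (adjacent winners are impossible) and maximal (the loop only ends when every vertex is in the set or adjacent to it).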