Workshop on Sublinear Algorithms
Bertinoro International Center for informatics
May 23 - 27, 2011
*** ABSTRACTS OF TALKS (SORTED BY SPEAKERS' NAMES) ***
======================================================
Nir Ailon (Technion)
Johnson-Lindenstrauss Transform(s)
The Johnson-Lindenstrauss technique has been for many years used
as an algorithmic black-box for trading off dimensionality (and
its curse) with approximation. More recently there have been
attempts to open this black box and improve it as an algorithm in
its own right. Much work has been devoted to reducing both time
and randomness. In this tutorial I will survey work on the time
complexity aspects, and connect it with other important fields
such as compressed sensing.
Alexandr Andoni (Microsoft Research Silicon Valley)
Sublinear Algorithms via Precision Sampling
Suppose we want to estimate a sum of bounded reals a1+a2+…+an with
access to only some "limited information" about ai's. A classical
setting is where we estimate the entire sum by knowing only a
random subset of ai's. Naturally, there is a trade-off between
the size of the subset and the resulting approximation.
Motivated by applications where this trade-off is not good
enough, we introduce Precision Sampling, which is an estimation
technique that uses a more general kind of "limited information"
about ai's. Instead of obtaining a subset of ai's as above, here
we obtain a rough estimate for each of the n quantities, up to
various prescribed degrees of "precision" (approximation). The
trade-off then becomes between the precision of the estimates of
ai's and the resulting approximation to the total sum. We show
one can obtain a trade-off that is qualitatively better in the
precision sampling setting than in the aforementioned (vanilla
sampling) setting.
Our resulting tool leads to new sublinear algorithms. In one
instance, it gives a simplified algorithm for a class of
streaming problems. In another instance, the tool plays an
instrumental role in a query-efficient algorithm for estimating
edit distance.
Joint work with Robi Krauthgamer and Krzysztof Onak.
Arnab Bhattacharyya (MIT)
A Unified Framework for Testing Linear-Invariant Properties
In a sequence of recent papers, Sudan and coauthors have
investigated the relation between testability of properties of
Boolean functions and the invariance of the properties with
respect to transformations of the domain. Linear-invariance is
arguably the most common such symmetry for natural properties of
Boolean functions on the hypercube. Hence, it is an important
goal to find necessary and sufficient conditions for testability
of linear-invariant properties. We obtain the following results:
1. We show that every linear-invariant property that can be
characterized by forbidding induced solutions to a (possibly infinite)
set of linear equations can be tested with one-sided error.
2. We show that every linear-invariant property that can be
tested with one-sided error can be characterized by forbidding
induced solutions to a (possibly infinite) set of systems of
linear equations.
We conjecture that our result from item (1) can be extended to
cover systems of linear equations. We further show that the
validity of this conjecture would have the following implications:
1. It would imply that every linear-invariant property that is
closed under restrictions to linear subspaces is testable with
one-sided error. Such a result would unify several previous
results on testing Boolean functions, such as the results on
testing low-degree polynomials and results on testing Fourier
dimensionality.
2. It would imply that a linear-invariant property P is testable
with one-sided error *if and only if* P is closed under
restrictions to linear subspaces, thus resolving Sudan's problem.
Joint work with Elena Grigorescu and Asaf Shapira.
Vladimir Braverman (UCLA)
Zero-One Frequency Laws
Given a function G, we want to compute a statistic of the form Si
in [n] G(mi), where mi are entries of the frequency vector
defined by a data stream.
In their fundamental paper, Alon, Matias and Szegedy asked for
which functions G it is possible to approximate Si in [n] G(mi)
in a single pass over the data and using poly-logarithmic memory.
We give a precise characterization (in fact, a zero-one law) for
all monotonically increasing functions on frequencies that are
zero at the origin. Also, I will introduce the method of
recursive sketching that can be used, e.g., for approximating
frequency moments.
Joint work with Rafail Ostrovsky.
Amit Chakrabarti (Dartmouth)
Gap-Hamming-Distance: the Journey to an Optimal Lower Bound
In the Gap-Hamming-Distance problem (GHD), Alice and Bob receive
n-bit strings and must decide whether they are "close" (Hamming
distance \leq n/2-n1/2) or "far" (distance \geq n/2+n1/2). How
many bits must they communicate to solve this problem, with error
at most 1/3? Since being proposed in the early 2000s by Indyk and
Woodruff, for its applications to space lower bounds for data
stream algorithms, the problem has been extensively studied.
In this talk, I shall describe the journey from the early results
of Indyk and Woodruff, to stronger intermediate results obtained
through a deeper understanding of the problem, and finally to a
recent optimal lower bound due to myself and Oded Regev. We now
know that any GHD protocol requires Omega(n) communication, which
completely settles the problem.
Two simplifications of our proof have appeared within the last
two months, and I shall touch upon these as well.
Graham Cormode (AT&T Research)
Mergeable Summaries
In dealing with massive distributed data, exact computations may
be possible but can still be very expensive to perform. Instead,
it is often desirable to look to more lightweight computations
that give (guaranteed) approximations. A central approach is the
idea of the mergeable summary: a compact data structure that
summarizes a data set, which can be merged with others to
summarize the union of the data without significant growth in
size. This framework enables parallel computation.
Samples and sketches are two examples of well-known, effective
mergeable summaries. I'll present recent results on new, compact
mergeable summaries for various tasks, such as approximating
frequent items, order statistics, and range counts.
Joint work with Pankaj Agarwal, Zengfeng Huang, Jeff Phillips,
Zhewei Wei and Ke Yi.
Artur Czumaj (University of Warwick)
Planar Graphs: Random Walks and Bipartiteness Testing
We study the testability of hereditary properties in arbitrary
planar graphs. Our main result is a constant time testing of
bipartiteness in arbitrary planar graphs. The previous bound for
this class of graphs was O(n1/2), and the constant-time
testability was only known for planar graphs with bounded degree.
Previously used transformations of unbounded-degree sparse graphs
into bounded-degree sparse graphs cannot be used to reduce the
problem to the testability of bounded-degree planar graphs. Our
approach extends to arbitrary minor-free graphs.
Our algorithm is based on random walks. The challenge here is to
analyze random walks for a class of graphs that has good
separators, i.e., bad expansion. Standard techniques that use a
fast convergence to a uniform distribution do not work in this
case. Informally, our analysis technique self-reduces the problem
of finding an odd length cycle in a multigraph G induced by a
collection of cycles to another multigraph G’ induced by a set of
shorter odd-length cycles, in such a way that when a random walks
finds a cycle in G’ with probability p, then it does so with
probability f(p) in G. This reduction is applied until the cycles
collapse to selfloops that can be easily detected.
Joint work with Morteza Monemizadeh, Krzysztof Onak, and
Christian Sohler.
Pierre Fraigniaud (LIAFA, Paris)
Local Distributed Decision
Inspired by sequential complexity theory, we focus on a
complexity theory for distributed decision problems. A central
theme in distributed network algorithms concerns understanding
and coping with the issue of locality. In this context, solving a
decision problem requires the processors to independently inspect
their local neighborhoods and then collectively decide whether a
given global input instance belongs to some specified language.
We consider the standard LOCAL model of computation and define
LD(t) (for local decision) as the class of decision problems that
can be solved in t communication rounds. We first study the
question of whether randomization helps in local distributed
computing, and to what extent. Specifically, we define the
corresponding randomized class BPLD(t,p,q), containing all
languages for which there exists a randomized algorithm that runs
in t rounds, accepts correct instances with probability at least
p and rejects incorrect ones with probability at least q. We show
that p^2+q=1 is a threshold for the containment of LD(t) in
BPLD(t,p,q). More precisely, we show that there exists a language
that does not belong to LD(t) for any t=o(n) but does belong to
BPLD(0,p,q) for any p,q in (0,1] such that p^2+q \leq1. On the
other hand, we show that, restricted to hereditary languages,
BPLD(t,p,q) = LD(O(t)), for any function t and any p, q in (0,1]
such that p^2+q>1. In addition, we investigate the impact of
non-determinism on local decision, and establish some structural
results inspired by classical computational complexity theory.
Specifically, we show that non-determinism does help, but that
this help is limited, as there exist languages that cannot be
decided non-deterministically. Perhaps surprisingly, it turns out
that it is the combination of randomization with non-determinism
that enables to decide all languages in constant time. Finally,
we introduce the notion of local reduction, and establish some
completeness results.
Joint work with Amos Korman and David Peleg.
Oded Goldreich (Weizmann)
Finding Cycles and Trees in Sublinear Time
We present sublinear-time (randomized) algorithms for finding
simple cycles of length at least $k \geq 3$ and tree-minors in
bounded-degree graphs. The complexity of these algorithms is
related to the distance of the graph from being Ck-minor free
(resp., free from having the corresponding tree-minor). In
particular, if the graph is far (i.e., Omega(1)-far) from being
cycle-free, then the algorithm finds a cycle of polylogarithmic
length in time $\tilde{O}(\sqrt{N})$, where N denotes the number
of vertices. This time complexity is optimal up to
polylogarithmic factors.
The foregoing results are the outcome of our study of the
complexity of one-sided error testers in the bounded-degree
graphs model. For example, we show that cycle-freeness of
N-vertex graphs can be tested with one-sided error within time
complexity $\tilde{O}(poly(1/e)\sqrt{N})$. This matches the known
Omega(\sqrt{N}) query lower bound, and contrasts with the fact that
any minor-free property admits a two-sided error tester of query
complexity that only depends on the proximity parameter e. For
any constant $k \geq 3$, we extend this result to testing whether
the input graph has a simple cycle of length at least k. On the
other hand, for any fixed tree T, we show that T-minor freeness
has a one-sided error tester of query complexity that only
depends on the proximity parameter e.
Our algorithm for finding cycles in bounded-degree graphs extends
to general graphs, where distances are measured with respect to
the actual number of edges. Such an extension is not possible
with respect to finding tree-minors in $o(\sqrt{N})$ complexity.
This is joint work with Artur Czumaj, Dana Ron, C. Seshadhri,
Asaf Shapira, and Christian Sohler.
Nir Halman (Hebrew University of Jerusalem and MIT)
FPTASs for Stochastic Optimization Problems
We present a framework for obtaining Fully Polynomial Time
Approximation Schemes (FPTASs) for stochastic optimization
problems that run in sublinear time. The functions in our model
are either convex or monotone and are over discrete domains.
The framework is based on approximating univariate functions by
piecewise linear functions and monitoring the propagation of
errors.
This talk is based on several joint works with (subsets of) Diego
Klabjan (Northwestern), Chung-Lun Li (The Hong Kong Polytechnic
University), Jim Orlin (MIT) and David Simchi-Levi (MIT).
Piotr Indyk (MIT)
On the Power of Adaptivity in Sparse Recovery
(-)
Tali Kaufman (Bar-Ilan University and Weizmann Institute)
Locally Testable Codes and Expanders
Expanders and Error correcting codes are two celebrated notions,
coupled together, e.g. in the seminal work on Expander Codes by
Sipser and Spielman that yields some of the best codes. In recent
years there is a growing interest in codes admitting algorithms
that run in *constant* time. E.g. codes for which it is possible
to distinguish in constant time between codewords and vectors far
from the code. Such codes are known as locally testable codes
(LTCs) and they are intimately related to the PCP Theorem.
It is known that Expander Codes based on *best* expanders can NOT
yield LTCs, so it seems that expansion is somewhat contradicting
to the notion of LTCs. We show that, nevertheless, the graph
associated with an LTC *must* be essentially an expander. Namely,
every LTC can be decomposed into a constant number of "basic"
codes whose associated graph is an expander.
Based on a joint work with Irit Dinur.
Robert Krauthgamer (Weizmann Institute)
Polylogarithmic Approximation for Edit Distance and the
Asymmetric Query Complexity
We present a near-linear time algorithm that approximates the
edit distance between two strings within a significantly better
factor than previously known. This result emerges in our
investigation of edit distance from a new perspective, namely a
model of asymmetric queries, for which we present near-tight bounds.
Another consequence of this new model is the first rigorous
separation between edit distance and Ulam distance, by tracing
the hardness of edit distance to phenomena that were not used by
previous analyses.
Joint work with Alexandr Andoni and Krzysztof Onak.
Oded Lachish (Birkbeck University of London)
Testing a Language Accepted by a Fixed Boolean Formula
For a given formula p we define SAT(p) to be the set of all
assignments that satisfy p. We study the query complexity of
testing and estimating SAT(p), in a model where the testers have
absolute knowledge of p. Our main results are.
· Testing
o If SAT(p) is binary and read-once, then the query complexity is
at most quasipolynomial in 1/e
o If SAT(p) is binary, monotone and read-k-times then the query
complexity is at most quasipolynomial in 1/e and k
· Estimating
o If SAT(p) is binary, monotone, read-once, then the query
complexity is at most quasipolynomial in 1/e
Michael W. Mahoney (Stanford University)
Fast Approximation of Matrix Coherence
The concept of matrix coherence measures the extent to which the
singular vectors of a matrix are correlated with the standard
basis, and as such it is of interest in the recently-popular
problem of matrix completion. A more refined concept is that of
statistical leverage. Statistical leverage is usually quantified
by the diagonal elements of the projection matrix onto the best
rank-k approximation to a matrix. Thus, it is useful in
large-scale diagnostic data analysis for identifying outliers in
large data matrices and data graphs. Moreover, it is the key
structural nonuniformity that must be dealt with (i.e., either
rapidly approximated or rapidly uniformized at the preprocessing
step) in developing fast random sampling and random projection
algorithms for matrix problems such as least-squares regression
and low-rank matrix approximation.
As a corollary of our main result, we obtain an algorithm to
approximate the coherence of an arbitrary matrix in time
qualitatively faster than the naive algorithm. Our main result is
a randomized algorithm that takes as input an arbitrary m x n
matrix A and a rank parameter k; it returns as out output a
relative-error approximation to every diagonal element of the
projection matrix onto the best rank-k approximation to A; and it
runs in time qualitatively faster than the time to compute a
basis for that space.
In particular, given an m x n matrix A, with m>>n, the algorithm
returns relative-error approximations to all the diagonal
elements of the projection matrix onto the left singular subspace
in roughly O(m n log n) time, as opposed to O(mn2) time required
by the naive algorithm. In addition, numerical implementations
run faster than traditional deterministic algorithms for matrices
as small as hundreds by thousands.
Our analysis boils down to computing a relative-error
approximation to an underconstrained least-squares approximation,
or relatedly it can be viewed as an application of
Johnson-Lindenstrauss ideas.
Andrew McGregor (University of Massachusetts, Amherst)
Data Streams, Dyck Languages, and Detecting Dubious Data Structures
In this talk, we present algorithms and lower bounds for
recognizing various languages in the data stream model. In
particular, we resolve an open problem of Magniez, Mathieu and
Nayak [STOC, 2010] concerning the multi-pass complexity of
recognizing Dyck languages. This results in a natural separation
between the standard multi-pass model and the multi-pass model
that permits reverse passes. We also present the first passive
memory checkers that verify the interaction transcripts of
priority queues, stacks, and double-ended queues. We obtain tight
upper and lower bounds for these problems, thereby addressing an
important sub-class of the memory checking framework of Blum et
al. [Algorithmica, 1994]. A key contribution of our work is a new
bound on the information complexity of AUGMENTED-INDEX.
Includes joint work with Amit Chakrabarti, Graham Cormode, and
Ranganath Kondapally.
Jelani Nelson (MIT)
Sparse Johnson-Lindenstrauss Transforms
The Johnson-Lindenstrauss (JL) lemma (1984) states that any n
points in d-dimensional Euclidean space can be embedded into
k=O(log n/e2) dimensions so that all pairwise distances are
preserved up to 1+/-e. Furthermore, this embedding can be
achieved via a linear mapping. The JL lemma is a useful tool for
speeding up solutions to several high-dimensional problems:
closest pair, nearest neighbor, diameter, minimum spanning tree,
etc. It also speeds up some clustering and string processing
algorithms, reduces the amount of storage required to store a
dataset, and can be used to reduce memory required for numerical
linear algebra problems such as linear regression and low rank
approximation.
The original proofs of the JL lemma let the linear mapping be
specified by a random dense k x d matrix (e.g. i.i.d. Gaussian
entries). Thus, performing an embedding requires dense
matrix-vector multiplication. There has been much recent work on
developing distributions that allow for embedding vectors
quickly, begun by the work of [Ailon, Chazelle, STOC 2006].
Unfortunately, these works cannot take
advantage of sparsity of the vector to be embedded, and they take
Omega(d) time to embed a vector even with only one non-zero entry.
This feature is particularly unfortunate in streaming
applications, where updates can be seen as 1-sparse vectors.
One way to speed up embeddings for sparse vectors is to develop
distributions over linear mappings whose corresponding matrices
themselves are sparse. In this talk, we give two JL
distributions with the sparsest known matrices. In fact, these
are the first distributions where every matrix in their support
has only o(1) of its entries non-zero for all settings of e and n
(specifically, only an O(e)-fraction of entries in each column
are non-zero).
This is based on joint work with Daniel Kane (Harvard University).
Krzysztof Onak (CMU)
A Near-Optimal Sublinear-Time Algorithm for Approximating the
Minimum Vertex Cover Size
I will present a near-optimal sublinear-time algorithm for
approximating the size of a minimum vertex cover. Let VC denote
this quantity for the input graph. Our algorithm computes an
estimate alpha such that VC \leq a \leq 2VC + en, where e is a
parameter to the algorithm. The query complexity and running time
of the algorithm are \tilde{O}(d’ poly(1/e)), where d’ denotes the
average vertex degree in the graph.
Joint work with Dana Ron, Michal Rosen, and Ronitt Rubinfeld.
Ely Porat (Bar-Ilan University)
Multi-pattern Search in the Streaming Model
(-)
Sofya Raskhodnikova (Pennsylvania State University)
Testing and Reconstruction of Lipschitz Functions
A function f : D -> R has Lipschitz constant c if dR(f(x), f(y))
<= c dD(x, y) for all x, y in D, where dR and dD denote the
distance functions on the range and domain of f, respectively. We
say a function is Lipschitz if it has Lipschitz constant 1. (Note
that rescaling by a factor of 1/c converts a function with a
Lipschitz constant c into a Lipschitz function.) In other words,
Lipschitz functions are not very sensitive to small changes in
the input. We initiate the study of testing and local
reconstruction of the Lipschitz property of functions.
We consider functions over domains {0,1}d, {1,...,n} and
{1,...,n}d, equipped with L1 distance. We design efficient
testers of the Lipschitz property for functions of the form
f:{0,1}d -> d Z, where d is in (0,1] and dZ is the set of
multiples of d, and also of the form f: {1,...,n} -> R, where R
is (discretely) metrically convex. In the first case, the tester
runs in time O(d min{d,r}/de), where r is the diameter of the
image of f; in the second, in time O((\log n)/e). We give
corresponding lower bounds of Omega(d) and Omega(log n) on the query
complexity (in the second case, only for nonadaptive 1-sided
error testers). Our lower bound for functions over {0,1}d is
tight for the case of the {0,1,2} range and constant e.
We also present a local filter of the Lipschitz property for
functions of the form f:{1,...,n}d -> R with lookup complexity
O((log n+1)d). For functions of the form {0,1}d, we show that
every nonadaptive local filter has lookup complexity exponential
in d. The testers that we developed have applications to programs
analysis. The reconstructors have applications to data privacy.
Joint work with Madhav Jha.
Ben Recht (University of Wisconsin)
From Compressed Sensing to Matrix Completion and Beyond
Matrix completion—where one seeks to recover a low rank matrix from
an incomplete set of observations— is a recurring problem in
collaborative filtering, dimensionality reduction, and systems
and control theory. While the general problem of finding the
lowest rank matrix satisfying a set of equality constraints is
NP-hard, a wave of recent results have provided very general
settings under which one can perfectly recover all of the missing
entries of a low-rank matrix by solving convex optimization
problems. In this talk, I will review this emerging body of
work, discussing both theoretical foundations and practical
applications and drawing connections with fast Monte Carlo
algorithms for matrix approximation pioneered by the TCS
community. I will conclude the talk with some recent work on
further extending the catalog of objects and structures that can
be recovered from partial information using convex optimization.
Justin Romberg (Georgia Tech)
Compressive Sensing in Signal Processing
We will overview the fundamental results and recent theoretical
trends in compressive sensing, and review some recent hardware
designs which operate within the CS framework. The theoretical
portion of the talk will include reconstruction guarantees for
structured random matrices (e.g. sampling using random
convolution or random modulation) and recent progress on sampling
ensembles of correlated signals using low-rank recovery
algorithms. The applications will include single-pixel imaging,
radar imaging, acoustic source localization, and accelerated
forward modeling for seismic imaging.
Dana Ron (Tel Aviv University)
Sublinear Algorithms for Approximating Graph Parameters
This talk is a survey of sublinear algorithms for approximating
various graph parameters. In particular, the parameters surveyed are:
(1) The average degree the number of small stars.
(2) The weight of a minimum spanning tree.
(3) The size of a minimum vertex cover and a maximum matching.
(4) The number of edges that should be added/removed in order to
obtain various properties.
Ronitt Rubinfeld (Tel Aviv University and MIT)
Testing Properties of Collections of Distributions
We propose a framework for studying property testing of
collections of distributions, where the number of distributions
in the collection is a parameter of the problem. Previous work on
property testing of distributions considered single distributions
or pairs of distributions. We suggest two models that differ in
the way the algorithm is given access to samples from the
distributions. In one model the algorithm may ask for a sample
from any distribution of its choice, and in the other the choice
of the distribution is random.
Our main focus is on the basic problem of distinguishing between
the case that all the distributions in the collection are the
same (or very similar), and the case that it is necessary to
modify the distributions in the collection in a non-negligible
manner so as to obtain this property. We give almost tight upper
and lower bounds for this testing problem, as well as study an
extension to a clusterability property. One of our lower bounds
directly implies a lower bound on testing independence of a joint
distribution, a result which was left open by previous work.
Joint work with Reut Levi and Dana Ron.
Rocco Servedio (Columbia University)
Learning and Testing k-Modal Distributions
A k-modal probability distribution over the domain {1,…,N} is one
whose histogram has at most k "peaks" and "valleys". Such
distributions are a natural generalization of the well-studied
class of monotone increasing (or monotone decreasing) probability
distributions.
We study the problem of learning an unknown k-modal distribution
from samples. We also study a related problem in property
testing: given access to samples from an unknown k-modal
distribution p, determine whether p is identical to q or p is
"far" from q. Here q is a k-modal distribution which may be given
explicitly or may be available via sample access. These problems
were previously studied in the case k=0 (monotone distributions)
and k=1 (unimodal distributions).
We give algorithms for these problems that are provably close to
the best possible in terms of sample and time complexity. An
interesting feature of our approach is that our learning
algorithms use ingredients from property testing and vice versa.
Joint work with Costis Daskalakis and Ilias Diakonikolas.
C. Seshadhri (Sandia National Laboratories)
Estimating the Longest Increasing Subsequence in Polylogarithmic Time
Finding the length of the longest increasing subsequence (LIS) is
a classic algorithmic problem. A small example should explain the
problem. In the array 1 9 4 10 6 7, the LIS has length 4 and is 1 4 6 7.
Let n denote the size of the input array. Simple textbook
solutions achieve an O(n log n) running time, using dynamic
programming. What can a sublinear time algorithm say about the
LIS? For any constant d > 0, we construct a polylogarithmic time
randomized algorithm that estimates the length of the LIS within
an additive error of dn. Previously, the best known
polylogarithmic time algorithms could only achieve an additive
n/2 approximation.
Why is this problem challenging? The LIS length is the output of
a dynamic program, which means unless we solve many (read linear)
subproblems, we cannot get the exact LIS length. We are looking
to get extremely accurate (read dn) approximations in
polylogarithmic time. The algorithm we construct attempts to
follow the progress of a (top-down) dynamic program. In
polylogarithmic time, it zeroes in on the "important"
sub-problems to solve. In essence, it is a sublinear-time divide
and conquer algorithm. This requires some new sublinear algorithm
ideas, which we hope will have applications for approximating
other dynamic programs.
Joint work with Michael Saks.
Asaf Shapira (Georgia Tech)
Testing Odd-Cycle-Freeness in Boolean Functions
Call a boolean function f Odd-Cycle-Free if for every odd integer
k, and for every k points x1,…,xk in the support of f the sum x1+…+xk
does not vanish. We show that the property of being
odd-cycle-free is testable with poly(1/e) queries. We prove this
by analyzing the eigenvalues of the Cayley graph associated with
a Boolean function.
We also prove that there is a canonical tester for
odd-cycle-freeness making poly(1/e) queries, meaning that the
testing algorithm operates by picking a random linear subspace of
dimension O(log(1/e)) and then checking if the restriction of the
function to the subspace is odd-cycle-free or not. The test is
analyzed by studying the effect of random subspace restriction on
the Fourier coefficients of a function. Our work implies that
using a canonical tester, instead of an arbitrary tester, to test
odd-cycle-freeness results in no more than a polynomial blowup in
the query complexity. The question of whether a canonical tester
with polynomial blowup exists for all linear-invariant properties
remains an open problem.
Joint work with A. Bhattacharyya, E. Grigorescuy and P. Raghavendra.
Christian Sohler (TU Dortmund)
Every Property of Hyperfinite Graphs is Testable
A property testing algorithm for a property P in the bounded
degree graph model [GR97] is an algorithm that, given access to
the adjacency list representation of a graph G=(V,E) with maximum
degree at most d, accepts G with probability at least 2/3 if G
has property P, and rejects G with probability at least 2/3, if
it differs on more than edn edges from every d-degree bounded
graph with property P. A property is testable, if for every e, d,
and n, there is a property testing algorithm Ae,n,d that makes at
most q(e,d) queries to an input graph of n vertices, that is, a
non-uniform algorithm that makes a number of queries that is
independent of the graph size.
A k-disc around a vertex v of a graph G=(V,E) the subgraph
induced by all vertices of distance at most k from v. We show
that the structure of a planar graph on large enough number of
vertices, n, and with constant maximum degree d, is determined,
up to the modification (insertion or deletion) of at most edn
edges, by the frequency of k-discs for certain k=k(e,d), that is
independent of the size of the graph. We can replace planar
graphs by any hyperfinite class of graphs, which includes, for
example, every graph class that does not contain a set of
forbidden minors.
We use this result to obtain new results and improve upon
existing results in the area of property testing. In particular,
we prove that
· graph isomorphism is testable for every class of
hyperfinite graphs,
· every graph property is testable for every class of
hyperfinite graphs,
· every property of hyperfinite graphs is testable in the
bounded degree graph model,
· a large class of graph parameters is approximable for
hyperfinite graphs.
Our results also give a partial explanation of the success of
motifs in the analysis of complex networks.
Joint work with Ilan Newman.
Madhu Sudan (Microsoft Research New England)
Invariance in Property Testing
In this talk we report progress on a line of work that attempts
to abstract a large class of results in testing of ``algebraic
properties'' by their invariance. Specifically, we consider
properties of functions mapping a large (but finite) field K to a
small field F that are invariant under (the roughly K^2) affine
transformations of the domain. In this talk we will give some
motivation for studying this abstraction, as well as some of the
results.
Based on works of/with Eli Ben-Sasson, Elena Grigorescu, Tali
Kaufman, Shachar Lovett, Ghid Maatouk, Amir Shpilka.
Gilad Tsur (Tel Aviv University)
On Approximating the Number of Relevant Variables in a Function
In this work we consider the problem of approximating the number
of relevant variables in a function given query access to the
function. Since obtaining a multiplicative factor approximation
is hard in general, we consider several relaxations of the
problem. In particular, we consider relaxations in which we have
a promise that the function belongs to a certain family of
functions (e.g., linear functions), and we consider a relaxation
of the property testing variant of the problem. In the latter
relaxation the task is to distinguish between the case that the
number of relevant variables is at most k, and the case in which
it is far from any function in which the number of relevant
variable is more than (1+ g)k for a parameter g.
We give both upper bounds and almost matching lower bounds for
the relaxations we study.
This is joint work with Dana Ron.
Paul Valiant (Berkeley)
Estimating the Unseen: Sublinear Statistics
Much of statistics can be described as: given samples from an
unknown distribution, estimate some property of the distribution.
Many natural estimators from statistics require a number of
samples that is linear in a natural parameter of the problem, in
particular, often, a bound on the support size of the
distribution. In this talk I will survey SUBlinear approaches to
these problems, starting with the Good-Turing estimator for the
fraction of unseen probability mass and Fisher's estimate of the
unseen number of species, both from the 1940s, and covering up
through the latest results on estimating entropy, independence,
and closeness of distributions. Both algorithmic results and
lower bounds will be discussed, with a focus on common threads
and unifying intuition.
Roger Wattenhofer (ETH Zürich)
Distributed Algorithms
In my tutorial I will present some of the cornerstone results of
30 years of research in distributed (network) algorithms, using
the maximal independent set (MIS) as a running example. After
explaining the MIS problem, and discussing a naïve deterministic
distributed algorithm to introduce the model, I present a new
proof that the classic randomized distributed algorithm computes
an MIS in O(log n) time, where n is the number of nodes. Then I
sketch an Omega(log*n) lower bound for the list graph. In addition, I
will mention a few other results, e.g. how to compute a MIS in
asymptotically optimal O(log*n) time in bounded growth graphs.
Finally I will present the asymptotically optimal logarithmic
lower bound by Kuhn et al. for minimum vertex cover (MVC), and
mention reductions to other problems including MIS. Throughout
the talk I will mention several open problems, and connections to
other areas.
David Woodruff (IBM Almaden)
Valutazione delle Norme con le Applicazioni
(-)
Ning Xie (MIT)
Local Computation Algorithms
For input x, let F(x) denote the set of outputs that are the “legal”
answers for a computational problem F. Suppose x and members of
F(x) are so large that there is not time to read them in their
entirety. We propose a model of local computation algorithms
which for a given input x, support queries by a user to values of
specified locations yi in a legal output y in F(x). When more
than one legal output y exists for a given x, the local
computation algorithm should output in a way that is consistent
with at least one such y. Local computation algorithms are
intended to distill the common features of several concepts that
have appeared in various algorithmic subfields, including local
distributed computation, local algorithms, locally decodable
codes, and local reconstruction.
We develop a technique, based on known constructions of small
sample spaces of k-wise independent random variables and Beck's
analysis in his algorithmic approach to the Lovasz Local Lemma,
which under certain conditions can be applied to construct local
computation algorithms that run in polylogarithmic time and
space. We apply this technique to maximal independent set
computations, scheduling radio network broadcasts, hypergraph
coloring and satisfying k-SAT formulas.
Based on joint works with Noga Alon, Ronitt Rubinfeld, Gil Tamir
and Shai Vardi.
Yuichi Yoshida (Kyoto University)
Optimal Constant-Time Approximation Algorithms and
(Unconditional) Inapproximability Results for Every Bounded-Degree CSP
Raghavendra (STOC 2008) gave an elegant and surprising result: if
Khot’s Unique Games Conjecture (STOC 2002) is true, then for every
constraint satisfaction problem (CSP), the best approximation
ratio is attained by a certain simple semidefinite programming
and a rounding scheme for it.
In this talk, we show that similar results hold for constant-
time approximation algorithms in the bounded-degree model.
Specifically, we present the following: (i) For every CSP, we
construct an oracle that serves an access, in constant time, to a
nearly optimal solution to a basic LP relaxation of the CSP. (ii)
Using the oracle, we give a constant-time rounding scheme that
achieves an approximation ratio coincident with the integrality
gap of the basic LP. (iii) Finally, we give a generic conversion
from integrality gaps of basic LPs to hardness results. All of
those results are unconditional. Therefore, for every
bounded-degree CSP, we give the best constant-time approximation
algorithm among all.
A CSP instance is called e-far from satisfiability if we must
remove at least an e-fraction of constraints to make it
satisfiable. A CSP is called testable if there is a constant-
time algorithm that distinguishes satisfiable instances from
e-far instances with probability at least 2/3. Using the results
above, we also derive, under a technical assumption, an
equivalent condition under which a CSP is testable in the
bounded-degree model.