Program of the ITCS mini-workshop on Property Testing (2010)
Organizers: Oded Goldreich and Amy Yuexuan Wang.
Dates: January 8-10, 2010.
Property Testing is the study of super-fast (randomized) algorithms
for approximate decision making. These algorithms are given
direct access to items of a huge data set, and determine
whether this data set has some predetermined (global) property
or is far from having this property. Remarkably, this approximate
decision is made by accessing a small portion of the data set.
Property Testing has been a subject of intensive research
in the last couple of decades, with hundreds of studies
conducted in it and in closely related areas.
Indeed, Property Testing is closely related to Probabilistically
Checkable Proofs (PCPs), and is related to Coding Theory,
Combinatorics, Statistics, Computational Learning Theory,
Computational Geometry, and more.
Program (annotated with brief comments by Oded Goldreich)
Abstracts of all talks are available in the
official
brochure of the mini-workshop.
(In addition,
extended abstracts of most papers are available below.)
SESSION 1 (FRIDAY, Jan 8, 9:00-10:15) [chair: Oded Goldreich]
Kevin was ill, and so Rocco gave both talks.
I am fascinated by the implicit learning paradigm,
which pivots on emulating learning algorithms
for $k$-variable functions when given $n$-bit long samples
of a function with $k$ influential variables. The emulation proceeds by
randomly partitioning the variables into $O(k^2)$ sets,
identifying $k$ sets that are likely to each contain
a single influential variable, and converting the $n$-bit
long samples to corresponding $k$-bit samples. Specifically,
each $n$-bit sample is converted by determining, for each
influential set, whether the subset of variables that the sample
sets to 1 or the subset it sets to 0 is influential, and setting
the corresponding bit accordingly. Once the learning algorithm
outputs a hypothesis for the $k$-bit function, it is tested using
adequate queries (which are again emulated correspondingly).
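A toy sketch of the sample-conversion step, in Python; the function names and the influence check (re-randomizing a subset's bits, using query access to $f$) are my own illustration of the idea, not the paper's actual procedure:

```python
import random

def is_influential(f, n, subset, rng, trials=200):
    # Declare a subset of variables influential if re-randomizing its
    # bits changes f's value on some random input (uses queries to f).
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = list(x)
        for i in subset:
            y[i] = rng.randint(0, 1)
        if f(x) != f(y):
            return True
    return False

def convert_sample(f, n, x, influential_blocks, rng):
    # Convert an n-bit sample x into a k-bit sample: for each block
    # (believed to contain a single influential variable), the subset
    # of the block that x sets to 1 is influential iff the block's
    # influential variable is set to 1 in x, which determines the bit.
    bits = []
    for block in influential_blocks:
        ones = [i for i in block if x[i] == 1]
        bits.append(1 if is_influential(f, n, ones, rng) else 0)
    return bits
```

Here the random partition into $O(k^2)$ blocks and the identification of the $k$ influential blocks are assumed to have been done already; also, each conversion above spends queries to $f$, so this only illustrates the logic of the reduction, not its query efficiency.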
SESSION 2 (FRIDAY, Jan 8, 10:45-12:00) [chair: Avrim Blum]
See a review of Eric's work on testing juntas
in my choices (Nr. 19).
In addition, Eric advocated a study of the following class
of parameterized problems. For a fixed function $g$, which
is known to the tester, the task is testing whether a
given oracle/function $f$ is isomorphic to $g$ (i.e., whether $f$
equals $g$ under a relabeling of the indices of the variables).
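For concreteness, a brute-force (exponential-time) check of this isomorphism relation in Python; the tester's challenge is, of course, to decide this with few queries rather than by exhaustive search:

```python
from itertools import permutations, product

def is_isomorphic(f, g, n):
    # f is isomorphic to g iff, for some permutation pi of the n
    # variable indices, f(x) = g(x relabeled by pi) on all 2^n inputs.
    for pi in permutations(range(n)):
        if all(f(x) == g(tuple(x[pi[i]] for i in range(n)))
               for x in product((0, 1), repeat=n)):
            return True
    return False
```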
Michael presented an overview of results that assert the existence
of properties with arbitrary (reasonable) query complexity bounds.
SESSION 3 (FRIDAY, Jan 8, 14:00-15:30) [chair: Bernard Chazelle]
Ilan surveyed works that refer to "massively parameterized"
testing problems. The parameters are huge structures (e.g., graphs),
and the problem refers to a substructure of such a structure
(e.g., an assignment to the graph's edges).
In the case of graphs, the assignment may be viewed
as an orientation of the fixed graph or a subgraph of it. Sofya
discussed the notion of transitive-closure spanners
(i.e., spanners of the transitive closure graph)
that are implicit in some monotonicity testers.
PANEL 1 (FRIDAY, Jan 8, 16:00-17:00)
On the connection of Property Testing to Computational Learning Theory
and Computational Geometry.
Panelists: Avrim Blum, Bernard Chazelle, and Rocco Servedio.
Bernard advocated an attempt to provide a sublinear-time
analysis of dynamic systems, which may consist of selecting a few
objects and tracing their activity over time. Avrim emphasised
the high cost of queries (rather than random examples) in typical
learning applications, and I suggested considering a two-parameter
complexity measure that separates the number of random examples from
the number of actual queries.
Rocco suggested trying to relate property testing to agnostic learning
(rather than to standard learning), and highlighted the open problem
of tolerant testing of halfplanes.
SESSION 4 (SATURDAY, Jan 9, 9:00-10:15) [chair: Shafi Goldwasser]
Madhu mused on what makes problems easily testable
and suggested that closure under a rich permutation group plays a
major role. This seems supported by the numerous results regarding
algebraic functions (which are all invariant under suitable affine
transformations) and graph properties (which, by definition, are closed
under isomorphism). He surveyed a unified approach that refers to the
former, where the affine transformations refer to the identification
of the domain with a vector space. Victor demonstrated that the
approach extends also to non-linear properties, and Noga commented
that an alternative demonstration may be provided by a property
that is a union of two disjoint linear spaces.
SESSION 5 (SATURDAY, Jan 9, 10:45-12:00) [chair: Michael Saks]
Asaf's study refers to the parameterized problem
of whether a given subset of $[n]$ is free from containing
any solution to a fixed set of linear equations.
This was studied before with respect to a single equation
(i.e., $x+y=z$), and Asaf's work treats any linear system.
Noga's study refers to approximating quantities such as
the independence number of bounded-degree graphs.
He shows that constant-time algorithms can almost match the
best approximation bounds previously known to be achievable
in polynomial time, and that better bounds cannot be achieved
in constant time.
(This is related to Krzysztof's presentation in Session 6,
which follows prior work of Dana Ron et al.)
Ronitt's work demonstrates that techniques employed
in the context of distributed algorithms can be applied
to the context of dynamic graph algorithms.
This augments prior connections between constant-round distributed algorithms
and constant-time approximation algorithms discovered by Parnas and Ron.
Ronitt asked whether a direct relation can be shown between
dynamic graph algorithms and constant-time approximation algorithms.
Krzysztof advocated applying external memory cost measures
to property testing algorithms. He showed such a result for
the problem of element distinctness, presenting an algorithm
that samples $\sqrt{nB/\eps}$ random $B$-bit blocks.
SESSION 6 (SATURDAY, Jan 9, 14:00-15:30) [chair: Noga Alon]
- Krzysztof Onak: Sublinear Graph Approximation Algorithms
- Michael Saks: Local Monotonicity Reconstruction
- Ronitt Rubinfeld: Testing properties of distributions: a survey
See a review of Krzysztof's work
in my choices (Nr. 8).
Michael discussed the local reconstruction problem
applied to objects for which there are exponentially
many adequate solutions. The problem in this case
is to determine some fixed solution as a function
of the corrupted oracle and the random bits (used in the reconstruction),
where in some applications it is desirable to use few random bits.
Ronitt surveyed the study of testing properties of distributions,
starting with the problem of testing identity to some known
distribution (i.e., a parameterized problem), which is
solvable using $\sqrt{n}$ samples when $n$ is the domain's size
(and the estimate is via collision probabilities).
Testing that two given distributions are identical
(i.e., both distributions are "given" by samples)
requires 3-way collisions and so $n^{2/3}$ samples.
In contrast, approximating the distance to the uniform
distribution requires an "almost linear" number of samples.
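A minimal sketch of the collision-based estimate in Python (the threshold and sample sizes below are illustrative choices of mine, not those of the actual analyses):

```python
import random
from itertools import combinations

def collision_probability(samples):
    # Fraction of sample pairs that collide; its expectation is
    # sum_i p_i^2, which equals 1/n exactly for the uniform
    # distribution over a domain of size n, and is strictly
    # larger for every other distribution.
    m = len(samples)
    coll = sum(1 for a, b in combinations(samples, 2) if a == b)
    return coll / (m * (m - 1) / 2)

def looks_uniform(samples, n, slack=1.5):
    # Accept iff the empirical collision probability is close to 1/n.
    return collision_probability(samples) <= slack / n
```

For example, a couple of thousand samples from the uniform distribution over a domain of size 100 yield a collision probability near 1/100 and are accepted, whereas samples from a distribution concentrated on few values are rejected.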
PANEL 2 (SATURDAY, Jan 9, 16:00-17:00)
On the connection of Property Testing to Coding Theory,
Combinatorics, and Statistics.
Panelists: Madhu Sudan, Noga Alon, and Ronitt Rubinfeld.
Noga focused on the "global vs local" nature of property testing that
is closely related to a main theme in combinatorics initiated by Erdős
in the 1950's. He noted that a tester of $k$-colorability is implicit
in work of the 1980's, but the query bound obtained there is a tower
of exponents in the proximity parameter. Answering a question,
Noga kind of suggested the challenge of testing triangle-freeness
in $2^{O(1/\eps)^2}$ queries. (Recall that the currently best known
bound is a tower of exponents in $poly(1/\eps)$.)
Ronitt reported on the awakening of interest of statisticians
in the effect of the size of the domain (of values).
(Traditionally, Statistics is concerned with the effect of
the size of the universe (sample space), whereas we are concerned
with the effect of the number of values obtained in this space.)
Still, there are significant cultural differences, since in Statistics
one often makes (implicit) assumptions about the distribution
and/or assumes that the distribution itself is selected at random
among some possibilities (hence "likelihood" is relevant).
Ronitt also pointed out possible applications to areas
that use Statistics, such as natural language processing
and databases.
Madhu pointed out that sublinear time algorithms arise
naturally in the context of coding theory.
Shafi advocated the study of property testing of algebraic (and number
theoretic) problems beyond linearity and low-degree properties.
A concrete example may be testing that a polynomial
(given by its evaluations) is an irreducible polynomial.
Can a tester be significantly more efficient than polynomial interpolation?
SESSION 7 (SUNDAY, Jan 10, 9:00-10:15) [chair: Ronitt Rubinfeld]
See a review of Prahladh's work
in my choices (Nr. 13).
Victor surveyed the state of the art regarding the amortized
query complexity of dictatorship testing (with perfect completeness),
which currently stands at a soundness error of $O(q\cdot2^{-q})$
per $q$ non-adaptive queries
(see ECCC TR09-074).
Michael discussed a general framework for testing graph properties,
where distances are normalized by the actual number of edges
and various types of queries are considered. In this work,
"group queries" (akin to "group testing") are shown to
be strictly stronger than the combination of the standard vertex-pair
and neighbor queries.
SESSION 8 (SUNDAY, Jan 10, 10:45-12:00) [chair: Michael Krivelevich]
Shubhangi advocated viewing locally testable linear codes as
a special case of tolerantly testing Hadamard codewords
under some adequate distributions (specifically, the distribution
that is uniform on the corresponding positions of the Hadamard code).
The tolerance condition prevents rejecting a codeword based on
positions that are assigned zero probability. She presented two
results, relating to two notions of tolerant testing (or rather
distance approximation): the more general result yields a weak
notion of approximation and is obtained under the so-called
uniformly correlated condition, whereas a strong notion of
approximation requires a stronger correlation condition.
A distribution $\mu$ is called $k$-uniformly correlated
if there is a joint distribution of $k$-tuples such that each
element (in the $k$-tuple) is distributed as $\mu$
but their sum is uniformly distributed.
The stronger condition requires that this holds when these $k$
elements are independently drawn from $\mu$, which is equivalent to
requiring that the code be sparse and have small Fourier coefficients.
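In symbols (my own transcription of the definition just stated, identifying the domain with $\mathbb{F}_2^n$, for a distribution $\mu$ over $\mathbb{F}_2^n$):

```latex
% mu is k-uniformly correlated if some coupling of k copies of mu
% has a uniformly distributed sum:
\exists\,(X_1,\dots,X_k)\ \text{with}\ X_i\sim\mu\ \text{for each }i,
\qquad X_1+X_2+\cdots+X_k\sim U(\mathbb{F}_2^n).
```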
Swastik presented an improved analysis of the AKKLR low-degree test
(for degree-$d$ multilinear polynomials over GF(2)),
showing that the basic test rejects far-away functions with
constant probability (rather than with probability $\Omega(2^{-d})$).
SESSION 9 (SUNDAY, Jan 10, 14:00-15:30) [chair: Madhu Sudan]
Eli surveyed several results that assert that various parameters of LTCs
imply an exponentially vanishing rate. One such result refers to LTCs
that are tested by a small set of possible constraints
(which is only somewhat larger than the dimension of the dual code).
Arthur advocated the study of testing continuous distributions,
focusing on the case that they are actually discrete
(i.e., assume a finite number of possible values).
Krzysztof contrasted the query complexity of testing the
edit distance of a given string (from a fixed string)
to the corresponding problem for Ulam distance.
Sofya presented an application of reconstruction procedures
(as presented by Michael Saks in Session 6) to data privacy;
specifically, if data privacy is maintained for functions of a certain class,
then we may want to modify the given function into a function of that class.
I tried to call attention to a
work with Dana Ron,
highlighting several perspectives and questions that arise.
PANEL 3 (SUNDAY, Jan 10, 16:00-17:00)
- Prahladh Harsha on decoding in the low-error regime
- Ron Rivest on possible applications of property testing
to the security evaluation of hashing functions.
Prahladh mused on whether decoding in the low-error regime
may find additional applications in property testing.
Ron asked whether property testing techniques can be
employed in the evaluation of the quality of various
compression functions.
Additional material
See the
official
brochure (in PDF) and
Oded Goldreich's homepage.