Guidelines for a reading group on PROPERTY TESTING
(by Oded Goldreich)
Quick links: Guidelines for
a semester-long reading group
and for a half-semester reading group.
Orientation
This page contains guidelines for
a reading group on property testing based on my
introductory book on the subject.
The following guidelines are quite laconic
(in comparison to those provided for the groups on
introduction to complexity theory
and on foundations of cryptography).
I actually provide two sets of guidelines:
One for a semester-long reading group
and one for a half-semester reading group.
All section numbers refer to sections in the
aforementioned book.
Preface
[The following paragraphs are taken from the book's preface.]
Property testing is concerned with the design of super-fast algorithms
for the ``structural analysis'' of huge amounts of data,
where by structural analysis we mean an analysis aimed
at unveiling global features of the data.
Examples include determining whether the data as a whole
has some property or estimating some global parameter of it.
The focus is on properties and parameters that go beyond
a simple statistics of the type that refers to the frequency
of occurrence of various local patterns.
The algorithms are given direct access to items of a huge data set,
and determine whether this data set has some predetermined (global) property
or is far from having this property. Remarkably, this decision is made
by accessing a small portion of the data set.
In other words, property testing is concerned with the design
of super-fast algorithms for approximate decision making,
where the decision refers to properties or parameters of huge objects.
In particular, we seek algorithms that only inspect relatively small portions
of the huge object. Such algorithms must be randomized
and can only provide approximate answers.
Indeed, two salient aspects of property testing are that
(1) it studies algorithms that can only read parts of the input,
and (2) it focuses on algorithms that solve ``approximate decision'' problems.
Both aspects are quite puzzling: What can one do without
even reading the input? What does approximate decision mean?
The answer is that these two aspects are indeed linked:
Approximate decision means distinguishing objects that
have some predetermined property (i.e., reside in some predetermined set)
from objects that are ``far'' from having the property
(i.e., are far from any object having the property),
where the notion of distance employed here is the relative number
of different symbols in the descriptions of the objects.
Such approximate decisions may be valuable in settings in which
an exact decision is infeasible or very expensive
or just considerably more expensive than obtaining an approximate decision.
The point is that, in many cases, approximate decision can be achieved
by super-fast randomized algorithms. One well-known example is
the common practice of estimating various statistics by sampling,
which can be cast as a a small collection of approximate decision problems
(with respect to some threshold values).
Research in property testing aims to extend this useful practice
to properties that cannot be cast as statistics of values that
are associated with a large population.
Examples in which this goal was achieved include testing properties
of functions such as being a low degree polynomial, being monotone,
depending on a specified number of attributes, testing properties of graphs
such as being bipartite and being triangle-free, and testing properties
of geometric objects and visual images such as being well-clustered
and being a convex body.
Viewing the input as a function is natural in the context of algorithms that
do not read their entire input. Such algorithms must probe the input
at locations of their choice, and such probing can be thought of as querying
a function that represents the input.
The key point here is that the number of probes (or queries) is smaller
than the size of the input, and so decisions are taken after seeing
only a small part of the input. However, the inspected positions
are not fixed but are rather chosen at random by the algorithm,
possibly based on answers obtained to prior queries.
Thus, in general, these algorithms may ``explore'' the input,
rather than merely obtain its value at a uniformly selected sample
of locations. Such exploration is most appealing when the tested input
is a graph, which may be represented by a function
(e.g., by its adjacency predicate),
but the notion of exploration applies also in other cases.
Research in property testing may be both algorithmic and complexity theoretic.
This is reflected both in its goals, which may be the design of better
algorithms or the presentation of lower bound on their complexity,
and its tools and techniques. Such research is related to several areas
of computer science and mathematics including Combinatorics, Statistics,
Computational Learning Theory, Computational Geometry, and Coding theory.
Historically, property testing was closely associated with the study
of Probabilistically Checkable Proofs (PCPs), and some connections
do exist between the two, but property testing is not confined to PCPs
(and/or to the study of ``locally testable codes'').
THE CURRENT BOOK aims to provide an introduction to Property Testing,
by presenting some of the main themes, results, and techniques that
characterize the area and are used in it. As usual in such cases,
the choice reflects a judgement of what is most adequate for presentation
in the context of such an introductory course, and this selection does not
reflect lack of appreciation of the omitted material but rather an opinion
that it fit less well the intended purpose of the text.
Additional choices regarding the organization of the material
and the amount of inter-dependencies among its parts
are discussed in the rest of the original preface
(see HERE).
In general, only Chap 1 (or rather Sec 1.2-1.3) is essential to
any of the subsequent chapters, with the exception of few
dependencies depicted in Figure 1 (on p. xvi) of the book.
A comment about the actual contents of property testing
One characteristic of property testing is
that its positive results tend to be established
by simple algorithms that are backed by a complex analysis.
The simplicity of these algorithms,
which sometimes appear as starightfoward,
make some reader expect a simple analysis
and get frustrated by the text that frustrtaes their expectations.
This is a fundamental mistake.
The fact that the analysis is complex and sometimes does not
even establish the expected bound (see, e.g., Sections 2.3 and 8.4.2)
means that the initial expectation was plkainly wrong.
Guidelines for a semester-long reading group (i.e., 12 meetings)
These guidelines provide a rather paced and comprehensive
introduction to property testing.
The suggest reading much of the material in the first 11 chapters
of the introductory book,
while leaving some additional parts of optional reading
and skipping other parts.
In addition, Chapters 12 and 13 are also left for optional reading.
Both types of optional readings are beyond what is recommended
and calculated as fitting the format of 12 meetings (of two hours each).
A general comment: When reading the text, do not rush through
conceptual discussions but rather reflect on them.
The same holds for high-level overviews of proofs.
When failing to understand some part of a proof,
make sure that you understand the high-level picture
(i.e., the notions that underlie the claim being proved,
the high-level strategy of the proof, etc).
Do not get stuck with low-level details,
unless you are sure that they are essential to your understanding
of the main idea(s) of the proof.
The possibility of errors in the text should not be dismissed;
see list of corrections and additions.
Meeting 1: Introduction, motivation, and mind-frame (Sec 1.1-1.2)
Sections 1.1-1.2 provide a high-level introduction to the area.
They contain few technical details, and a lot of conceptual discussions.
Do NOT make the mistake of rushing through conceptual text,
but rather read it slowly and reflect on it.
As will become clear already in the proof of Prop 1.1,
this area requires familiarity and comfort with basic
elements of probability theory (reviewed in Apdx A).
If you had any trouble with this proof, then start
by verifying that you are in full command of Apdx A.4.
Meeting 2: Main notions and general results (1.3.1, 1.3.3, 1.3.5)
Sec. 1.3.1 presents the main definition used throughout the book (Def 1.6),
and discusses the focus on query complexity and various related issues.
Sec. 1.3.3 outlines an important special case (see Def 1.7 and Thm 1.9).
Sec 1.3.5 discusses the relation between property testing and learning.
(Sec. 1.3.2, which outlines a few ramification, and Sec. 1.3.4,
which discusses the "algebra of property testing" are optional.)
Meeting 3: Testing Linearity (Chap 2)
Chap 2 presets the most famous example of a property tester:
One that tests whether an input function represents a homomorphism
between two (known) groups.
Meeting 4: Low Degree Tests (Sec 3.1 and 3.4)
Chap 3 presets another famous example of a property tester:
One that, for given $m,F$ and $d$,
tests whether an $m$-variant input function represents
a $m$-variant polynomial of (total) degree at most $d$ over $F$.
Read Sec 3.1 and 3.4, while skipping the background in Sec 3.2-3.3
(but reading only Cor 3.3, which is used in Sec 3.4).
(Optional: Read the background in Sec 3.2-3.3;
actually, also Sec 3.4.2 is optional.)
Meeting 5: Testing monotonicity (4.2.1 and 4.3.1)
A multi-variant function is called monotone
if it is monotone on each of its variables
(i.e., increasing the value of a variable
cannot decrease the value of the function).
Read Sec 4.2.1 (i.e., the "edge test")
and Sec 4.3.1 (i.e., the "line test"),
which consider two special cases
(i.e., Boolean functions define on the Boolean cube
and univariate functions).
(Optional: An overview of the generalization
that refers to functions defined over a hyper-grid
is presented in Sec 4.4.0.)
Meeting 6: Testing Dictatorship and Juntas (5.2.1, 5.2.3, and 5.3)
A Boolean multi-variate function is call a $k$-junta
if it depends on at most $k$ of its variables.
The case of $k=0$ correspond to constant functions,
whereas the case of $k=1$ corresponds to either a constant
or a dictatorship or an anti-dictatorship
(depending on how the value depends on the single variable).
Read Sec 5.2, which deals with testing dictatorships,
while skipping Sec 5.2.2 (i.e., testing monomial).
Sec 5.2.1 presents and analyzes a dictatorship test,
whereas Sec 5.2.3 highlights the technique of self-correction,
which is used in the analysis.
Then, read Sec 5.3 that refers to testing Juntas.
Note that this reading assignment is larger than the previous ones,
so don't even consider reading the tedious and non-insightful
proofs of Facts 1-2 (p. 107) and Fact 3 (p. 110).
Meeting 7: Testing by Implicit Sampling (Sec 6.2)
This interesting section relies on junta testing
(i.e., the prior meeting). Read Sec 6.2.
Meeting 8: Lower Bounds Techniques (7.2 and 7.3)
My own opinion is that, in the context of property testing,
lower bounds are merely a justification for failure to obtain
better results in specific cases. That is, the very fact that
things are hard (i.e., do not allow for sub-linear query complexity)
is the norm, and the interesting cases are the exceptional cases
in which the norm does not hold.
Read Section 7.2, which presents the most popular technique
(i.e., indistinguishability of distributions of objects),
and Section 7.3., which presents an interesting (and surprising)
connection to communication complexity.
No prior familiarity with communication complexity
is required; whatever we need is explained in Sec 7.3.1.
Meeting 9: Testing Graph Properties in the Dense Graph Model (8.1-8.3)
Read Section 8.1, which provides brief overview to the three models
that will be studied in the next three meeting, where the current
meeting is devoted to the first model (known as the "dense graph model").
The actual reading assignment consists of Sections 8.2-8.3:
Sec 8.2 presents this model as well as basic facts about it,
and Sec 8.3 presents testers for a natural class of properties
(which includes $k$-coloring, for any constant $k$, and "max-CUT").
Note that Sec 8.2.4 provides a digest of a method that
was used in Sec 8.2.3 (see Alg 8.5.2 and its analysis).
(Optional: Sec 8.4 presents a celebrated connection
between testing graph properties in the dense graph model
and Szemeredi's Regularity Lemma, which is reviewed in Sec 8.4.1.
No prior familiarity with the Regularity lemma is required.)
Meeting 10: Testing Graph Properties in the Bounded-Degree Graph Model
(9.1, 9.2.3, 9.2.4, 9.3.1, and 9.4.1)
The current meeting is devoted to the second model
(known as the "bounded-degree graph model")
that was briefly reviewed in Sec 8.1.
The actual reading assignment consists of
Section 9.1, which presents the model,
Sections 9.2.3 and 9.2.4,
which deal with testing connectivity and $t$-connectivity,
and Sections 9.3.1 and 9.4.1,
which deal with testing Bipartiteness (i.e., 2-colorability).
Note that Sections 9.4.1, 9.3.1 and 9.4.1 provide only
overviews and/or partial results.
(Optional: Sec 9.5 presents the notion of
a partition oracle and its usage for testing minor-free
graph properties (in the bounded-degree graph model).)
Meeting 11: Testing Graph Properties in the General Graph Model
(10.1, 10.2.2, and 10.3.2.1)
The current meeting is devoted to the third model
(known as the "general graph model")
that was briefly reviewed in Sec 8.1.
Arguably, this is the most interesting model,
alas relatively little is known about it.
The actual reading assignment consists of
Section 10.1, which presents the model,
and Sections 10.2.2 and 10.3.2.1,
which presents a few useful ideas.
Meeting 12: Testing Properties of Distributions (11.1 and 11.2)
This meeting is devoted to a model that is fundamentally different from
the models considered in all prior meetings.
The object we test is a distribution (over $n$ items)
rather than a function (over a set of $n$ elements),
the property is a property of such distributions,
and the tester obtains samples of the distribution
(rather than making queries to an oracle).
The model is presented in Section 11.1,
and the natural problem of testing equality to a known distribution
is studied in Sec 11.2.
Guidelines for a half-semester reading group (i.e., 5 meetings)
These guidelines provide a much faster and more partial
introduction to property testing (than the one provided by
the semester-long reading group).
Meeting 1: The mind-frame and main notions (Sec 1.2, 1.3.1 and 1.3.3)
Section 1.2 outlines the mind-frame of the areas
and Sections 1.3.1 and 1.3.3 present the main notions.
They contain a lot of conceptual discussions.
Do NOT make the mistake of rushing through conceptual text,
but rather read it slowly and reflect on it.
As will become clear already in the proof of Prop 1.1,
this area requires familiarity and comfort with basic
elements of probability theory (reviewed in Apdx A).
If you had any trouble with this proof, then start
by verifying that you are in full command of Apdx A.4.
Meeting 2: Testing Linearity (Chap 2)
Chap 2 presets the most famous example of a property tester:
One that tests whether an input function represents a homomorphism
between two (known) groups.
Meeting 3: Testing Dictatorship (5.2.1 and 5.2.3)
A multi-variant Boolean function is called a dictatorship
if its value equals the value of one of its variables.
Read Sec 5.2, which deals with testing dictatorships,
while skipping Sec 5.2.2 (i.e., testing monomial).
Meeting 4: Testing Graph Properties in the Dense Graph Model
(8.2 and 8.3)
This meeting is devoted to the "dense graph model".
Sec 8.2 presents this model as well as basic facts about it,
and Sec 8.3 presents testers for a natural class of properties
(which includes $k$-coloring, for any constant $k$, and "max-CUT").
Note that Sec 8.2.4 provides a digest of a method that
was used in Sec 8.2.3 (see Alg 8.5.2 and its analysis).
(Optional: Section 8.1 which provides a wider perspective;
that is, a brief overview to the three models of testing
graph properties, where only the first two models will be
studied in the current and next meeting.)
Meeting 5: Testing Graph Properties in the Bounded-Degree Graph Model
(9.1, 9.2.3, 9.2.4, 9.3.1, and 9.4.1)
This meeting is devoted to the "bounded-degree graph model".
Section 9.1 presents the model,
Sections 9.2.3 and 9.2.4
deal with testing connectivity and $t$-connectivity,
and Sections 9.3.1 and 9.4.1
deal with testing Bipartiteness (i.e., 2-colorability).
Note that Sections 9.4.1, 9.3.1 and 9.4.1 provide only
overviews and/or partial results.
Back to the two-volume book's webpage.