Guidelines for a reading group on PROPERTY TESTING

(by Oded Goldreich)


Quick links: Guidelines for a semester-long reading group and for a half-semester reading group.

Orientation

This page contains guidelines for a reading group on property testing based on my introductory book on the subject. The following guidelines are quite laconic (in comparison to those provided for the groups on introduction to complexity theory and on foundations of cryptography).

I actually provide two sets of guidelines: One for a semester-long reading group and one for a half-semester reading group. All section numbers refer to sections in the aforementioned book.

Preface

[The following paragraphs are taken from the book's preface.]

Property testing is concerned with the design of super-fast algorithms for the ``structural analysis'' of huge amounts of data, where by structural analysis we mean an analysis aimed at unveiling global features of the data. Examples include determining whether the data as a whole has some property or estimating some global parameter of it. The focus is on properties and parameters that go beyond a simple statistics of the type that refers to the frequency of occurrence of various local patterns. The algorithms are given direct access to items of a huge data set, and determine whether this data set has some predetermined (global) property or is far from having this property. Remarkably, this decision is made by accessing a small portion of the data set.

In other words, property testing is concerned with the design of super-fast algorithms for approximate decision making, where the decision refers to properties or parameters of huge objects. In particular, we seek algorithms that only inspect relatively small portions of the huge object. Such algorithms must be randomized and can only provide approximate answers. Indeed, two salient aspects of property testing are that (1) it studies algorithms that can only read parts of the input, and (2) it focuses on algorithms that solve ``approximate decision'' problems. Both aspects are quite puzzling: What can one do without even reading the input? What does approximate decision mean?

The answer is that these two aspects are indeed linked: Approximate decision means distinguishing objects that have some predetermined property (i.e., reside in some predetermined set) from objects that are ``far'' from having the property (i.e., are far from any object having the property), where the notion of distance employed here is the relative number of different symbols in the descriptions of the objects. Such approximate decisions may be valuable in settings in which an exact decision is infeasible or very expensive or just considerably more expensive than obtaining an approximate decision.

The point is that, in many cases, approximate decision can be achieved by super-fast randomized algorithms. One well-known example is the common practice of estimating various statistics by sampling, which can be cast as a a small collection of approximate decision problems (with respect to some threshold values). Research in property testing aims to extend this useful practice to properties that cannot be cast as statistics of values that are associated with a large population. Examples in which this goal was achieved include testing properties of functions such as being a low degree polynomial, being monotone, depending on a specified number of attributes, testing properties of graphs such as being bipartite and being triangle-free, and testing properties of geometric objects and visual images such as being well-clustered and being a convex body.

Viewing the input as a function is natural in the context of algorithms that do not read their entire input. Such algorithms must probe the input at locations of their choice, and such probing can be thought of as querying a function that represents the input. The key point here is that the number of probes (or queries) is smaller than the size of the input, and so decisions are taken after seeing only a small part of the input. However, the inspected positions are not fixed but are rather chosen at random by the algorithm, possibly based on answers obtained to prior queries. Thus, in general, these algorithms may ``explore'' the input, rather than merely obtain its value at a uniformly selected sample of locations. Such exploration is most appealing when the tested input is a graph, which may be represented by a function (e.g., by its adjacency predicate), but the notion of exploration applies also in other cases.

Research in property testing may be both algorithmic and complexity theoretic. This is reflected both in its goals, which may be the design of better algorithms or the presentation of lower bound on their complexity, and its tools and techniques. Such research is related to several areas of computer science and mathematics including Combinatorics, Statistics, Computational Learning Theory, Computational Geometry, and Coding theory. Historically, property testing was closely associated with the study of Probabilistically Checkable Proofs (PCPs), and some connections do exist between the two, but property testing is not confined to PCPs (and/or to the study of ``locally testable codes'').

THE CURRENT BOOK aims to provide an introduction to Property Testing, by presenting some of the main themes, results, and techniques that characterize the area and are used in it. As usual in such cases, the choice reflects a judgement of what is most adequate for presentation in the context of such an introductory course, and this selection does not reflect lack of appreciation of the omitted material but rather an opinion that it fit less well the intended purpose of the text. Additional choices regarding the organization of the material and the amount of inter-dependencies among its parts are discussed in the rest of the original preface (see HERE). In general, only Chap 1 (or rather Sec 1.2-1.3) is essential to any of the subsequent chapters, with the exception of few dependencies depicted in Figure 1 (on p. xvi) of the book.

A comment about the actual contents of property testing

One characteristic of property testing is that its positive results tend to be established by simple algorithms that are backed by a complex analysis. The simplicity of these algorithms, which sometimes appear as starightfoward, make some reader expect a simple analysis and get frustrated by the text that frustrtaes their expectations. This is a fundamental mistake. The fact that the analysis is complex and sometimes does not even establish the expected bound (see, e.g., Sections 2.3 and 8.4.2) means that the initial expectation was plkainly wrong.

Guidelines for a semester-long reading group (i.e., 12 meetings)

These guidelines provide a rather paced and comprehensive introduction to property testing. The suggest reading much of the material in the first 11 chapters of the introductory book, while leaving some additional parts of optional reading and skipping other parts. In addition, Chapters 12 and 13 are also left for optional reading. Both types of optional readings are beyond what is recommended and calculated as fitting the format of 12 meetings (of two hours each).

A general comment: When reading the text, do not rush through conceptual discussions but rather reflect on them. The same holds for high-level overviews of proofs. When failing to understand some part of a proof, make sure that you understand the high-level picture (i.e., the notions that underlie the claim being proved, the high-level strategy of the proof, etc). Do not get stuck with low-level details, unless you are sure that they are essential to your understanding of the main idea(s) of the proof. The possibility of errors in the text should not be dismissed; see list of corrections and additions.

Meeting 1: Introduction, motivation, and mind-frame (Sec 1.1-1.2)

Sections 1.1-1.2 provide a high-level introduction to the area. They contain few technical details, and a lot of conceptual discussions. Do NOT make the mistake of rushing through conceptual text, but rather read it slowly and reflect on it. As will become clear already in the proof of Prop 1.1, this area requires familiarity and comfort with basic elements of probability theory (reviewed in Apdx A). If you had any trouble with this proof, then start by verifying that you are in full command of Apdx A.4.

Meeting 2: Main notions and general results (1.3.1, 1.3.3, 1.3.5)

Sec. 1.3.1 presents the main definition used throughout the book (Def 1.6), and discusses the focus on query complexity and various related issues. Sec. 1.3.3 outlines an important special case (see Def 1.7 and Thm 1.9). Sec 1.3.5 discusses the relation between property testing and learning.

(Sec. 1.3.2, which outlines a few ramification, and Sec. 1.3.4, which discusses the "algebra of property testing" are optional.)

Meeting 3: Testing Linearity (Chap 2)

Chap 2 presets the most famous example of a property tester: One that tests whether an input function represents a homomorphism between two (known) groups.

Meeting 4: Low Degree Tests (Sec 3.1 and 3.4)

Chap 3 presets another famous example of a property tester: One that, for given $m,F$ and $d$, tests whether an $m$-variant input function represents a $m$-variant polynomial of (total) degree at most $d$ over $F$. Read Sec 3.1 and 3.4, while skipping the background in Sec 3.2-3.3 (but reading only Cor 3.3, which is used in Sec 3.4).

(Optional: Read the background in Sec 3.2-3.3; actually, also Sec 3.4.2 is optional.)

Meeting 5: Testing monotonicity (4.2.1 and 4.3.1)

A multi-variant function is called monotone if it is monotone on each of its variables (i.e., increasing the value of a variable cannot decrease the value of the function). Read Sec 4.2.1 (i.e., the "edge test") and Sec 4.3.1 (i.e., the "line test"), which consider two special cases (i.e., Boolean functions define on the Boolean cube and univariate functions).

(Optional: An overview of the generalization that refers to functions defined over a hyper-grid is presented in Sec 4.4.0.)

Meeting 6: Testing Dictatorship and Juntas (5.2.1, 5.2.3, and 5.3)

A Boolean multi-variate function is call a $k$-junta if it depends on at most $k$ of its variables. The case of $k=0$ correspond to constant functions, whereas the case of $k=1$ corresponds to either a constant or a dictatorship or an anti-dictatorship (depending on how the value depends on the single variable).

Read Sec 5.2, which deals with testing dictatorships, while skipping Sec 5.2.2 (i.e., testing monomial). Sec 5.2.1 presents and analyzes a dictatorship test, whereas Sec 5.2.3 highlights the technique of self-correction, which is used in the analysis. Then, read Sec 5.3 that refers to testing Juntas.

Note that this reading assignment is larger than the previous ones, so don't even consider reading the tedious and non-insightful proofs of Facts 1-2 (p. 107) and Fact 3 (p. 110).

Meeting 7: Testing by Implicit Sampling (Sec 6.2)

This interesting section relies on junta testing (i.e., the prior meeting). Read Sec 6.2.

Meeting 8: Lower Bounds Techniques (7.2 and 7.3)

My own opinion is that, in the context of property testing, lower bounds are merely a justification for failure to obtain better results in specific cases. That is, the very fact that things are hard (i.e., do not allow for sub-linear query complexity) is the norm, and the interesting cases are the exceptional cases in which the norm does not hold.

Read Section 7.2, which presents the most popular technique (i.e., indistinguishability of distributions of objects), and Section 7.3., which presents an interesting (and surprising) connection to communication complexity. No prior familiarity with communication complexity is required; whatever we need is explained in Sec 7.3.1.

Meeting 9: Testing Graph Properties in the Dense Graph Model (8.1-8.3)

Read Section 8.1, which provides brief overview to the three models that will be studied in the next three meeting, where the current meeting is devoted to the first model (known as the "dense graph model"). The actual reading assignment consists of Sections 8.2-8.3: Sec 8.2 presents this model as well as basic facts about it, and Sec 8.3 presents testers for a natural class of properties (which includes $k$-coloring, for any constant $k$, and "max-CUT"). Note that Sec 8.2.4 provides a digest of a method that was used in Sec 8.2.3 (see Alg 8.5.2 and its analysis).

(Optional: Sec 8.4 presents a celebrated connection between testing graph properties in the dense graph model and Szemeredi's Regularity Lemma, which is reviewed in Sec 8.4.1. No prior familiarity with the Regularity lemma is required.)

Meeting 10: Testing Graph Properties in the Bounded-Degree Graph Model (9.1, 9.2.3, 9.2.4, 9.3.1, and 9.4.1)

The current meeting is devoted to the second model (known as the "bounded-degree graph model") that was briefly reviewed in Sec 8.1. The actual reading assignment consists of Section 9.1, which presents the model, Sections 9.2.3 and 9.2.4, which deal with testing connectivity and $t$-connectivity, and Sections 9.3.1 and 9.4.1, which deal with testing Bipartiteness (i.e., 2-colorability). Note that Sections 9.4.1, 9.3.1 and 9.4.1 provide only overviews and/or partial results.

(Optional: Sec 9.5 presents the notion of a partition oracle and its usage for testing minor-free graph properties (in the bounded-degree graph model).)

Meeting 11: Testing Graph Properties in the General Graph Model (10.1, 10.2.2, and 10.3.2.1)

The current meeting is devoted to the third model (known as the "general graph model") that was briefly reviewed in Sec 8.1. Arguably, this is the most interesting model, alas relatively little is known about it. The actual reading assignment consists of Section 10.1, which presents the model, and Sections 10.2.2 and 10.3.2.1, which presents a few useful ideas.

Meeting 12: Testing Properties of Distributions (11.1 and 11.2)

This meeting is devoted to a model that is fundamentally different from the models considered in all prior meetings. The object we test is a distribution (over $n$ items) rather than a function (over a set of $n$ elements), the property is a property of such distributions, and the tester obtains samples of the distribution (rather than making queries to an oracle). The model is presented in Section 11.1, and the natural problem of testing equality to a known distribution is studied in Sec 11.2.

Guidelines for a half-semester reading group (i.e., 5 meetings)

These guidelines provide a much faster and more partial introduction to property testing (than the one provided by the semester-long reading group).

Meeting 1: The mind-frame and main notions (Sec 1.2, 1.3.1 and 1.3.3)

Section 1.2 outlines the mind-frame of the areas and Sections 1.3.1 and 1.3.3 present the main notions. They contain a lot of conceptual discussions. Do NOT make the mistake of rushing through conceptual text, but rather read it slowly and reflect on it. As will become clear already in the proof of Prop 1.1, this area requires familiarity and comfort with basic elements of probability theory (reviewed in Apdx A). If you had any trouble with this proof, then start by verifying that you are in full command of Apdx A.4.

Meeting 2: Testing Linearity (Chap 2)

Chap 2 presets the most famous example of a property tester: One that tests whether an input function represents a homomorphism between two (known) groups.

Meeting 3: Testing Dictatorship (5.2.1 and 5.2.3)

A multi-variant Boolean function is called a dictatorship if its value equals the value of one of its variables. Read Sec 5.2, which deals with testing dictatorships, while skipping Sec 5.2.2 (i.e., testing monomial).

Meeting 4: Testing Graph Properties in the Dense Graph Model (8.2 and 8.3)

This meeting is devoted to the "dense graph model". Sec 8.2 presents this model as well as basic facts about it, and Sec 8.3 presents testers for a natural class of properties (which includes $k$-coloring, for any constant $k$, and "max-CUT"). Note that Sec 8.2.4 provides a digest of a method that was used in Sec 8.2.3 (see Alg 8.5.2 and its analysis).

(Optional: Section 8.1 which provides a wider perspective; that is, a brief overview to the three models of testing graph properties, where only the first two models will be studied in the current and next meeting.)

Meeting 5: Testing Graph Properties in the Bounded-Degree Graph Model (9.1, 9.2.3, 9.2.4, 9.3.1, and 9.4.1)

This meeting is devoted to the "bounded-degree graph model". Section 9.1 presents the model, Sections 9.2.3 and 9.2.4 deal with testing connectivity and $t$-connectivity, and Sections 9.3.1 and 9.4.1 deal with testing Bipartiteness (i.e., 2-colorability). Note that Sections 9.4.1, 9.3.1 and 9.4.1 provide only overviews and/or partial results.


Back to the two-volume book's webpage.