Introduction to Property Testing

Oded Goldreich

Property Testing is concerned with approximate decisions, where the task is distinguishing between objects having a predetermined property and objects that are ``far'' from having this property. Typically, objects are modeled by functions, and distance between functions is measured as the fraction of the domain on which the functions differ. We consider (randomized) algorithms that may query the function at arguments of their choice, and seek algorithms of very low complexity (i.e., much lower than the size of the function's domain).
ISBN 9781107194052
Published in US in November 2017.

Publisher: Cambridge University Press
(see the publisher's page for this volume).

Material available on line:

Drafts of the book: November 2016, March 2017, and April 2017.
Drafts of individual lecture notes that serve as basis for the book chapters.
Additional (supporting) material:
- Two surveys of Dana Ron.
  - Property Testing: A Learning Theory Perspective, 2008.
  - Algorithmic and Analysis Techniques in Property Testing, 2009.
- Xin Lyu and Orr Paradise provided me with simple and natural proofs for Facts 1 and 2 stated on page 107, which address Footnote 24 (of Chapter 5) and replace Exer 5.12.
- Referred to in footnote 29 of Chapter 8: The paper of Nakar and Ron has appeared in RANDOM2018 (as On the Testability of Graph Partition Properties).
- In relation to Chapters 8-10, see Flexible models for testing graph properties (by O.G., 2018)
- In relation to Chapter 9, see Testing in the bounded-degree graph model with degree bound two (by O.G. and Laliv Tauber).
- In cont to Section 9.4.1, a possible project is obtaining a (small) complexity improvement for the rapidly mixing case. See On Testing Bipartiteness in the Bounded-Degree Graph Model.
- Reference [154], referred to in footnotes 40 and 41 of Chapter 11: On the relation between the relative earth mover distance and the variation distance (an exposition by O.G. and Dana Ron, 2016)
- Referred to in footnote 15 of Chapter 11: On the optimal analysis of the collision probability tester (an exposition of the analysis of Diakonikolas, Gouleakis, Peebles, and Price)
See also list of corrections and additions (to the print version).

See Table of Contents (of the book), and plans for a course and a mini-course.

Open Problems that has been resolved

Problem 5.8 (and related) was resolved in One-Sided Error Testing of Monomials and Affine Subspaces (by O.G. and Dana Ron, 2020).
Addressing an open problem stated at the end of Sec 8.5.1: See Hierarchy Theorems for Testing Properties in Size-Oblivious Query Complexity (by O.G., 2018).
Addressing an open problem stated at the beginning of Sec 8.5.4: See Non-adaptive vs Adaptive Queries in the Dense Graph Testing Model (by O.G. and Avi Wigderson, 2020).
Problem 9.26 was resolved by Akash Kumar, C. Seshadhri, and Andrew Stolman in Random walks and forbidden minors II: A $\poly(d\eps^{-1})$-query tester for minor-closed properties of bounded degree graphs (see my comments).
See also a related work that constructs a partition oracle in $\poly(d/\eps)$-time.
Addressing an open problem that is implicit in (the 3rd titled paragraph of) Section 9.6.3: See GSF-locality is not sufficient for proximity-oblivious testing by Isolde Adler, Noleen Kohler, and Pan Peng (2021).
Resolving the main part of Problem 13.12: See Locally Testable Codes with Constant Rate, Distance, and Locality by Irit Dinur, Shai Evra, Ron Livne, Alex Lubotzky, and Shahar Mozes (2021). (Yes, I view the question regarding the length of LTCs as the main part, since the functional length of PCPs is a less robust notion.)

Tentative preface for the book

Property testing is concerned with the design of super-fast algorithms for the ``structural analysis'' of huge amounts of data, where by structural analysis we mean an analysis aimed at unveiling global features of the data. Examples include determining whether the data as a whole has some property or estimating some global parameter of it. The focus is on properties and parameters that go beyond a simple statistics of the type that refers to the frequency of occurrence of various local patterns. The algorithms are given direct access to items of a huge data set, and determine whether this data set has some predetermined (global) property or is far from having this property. Remarkably, this decision is made by accessing a small portion of the data set.

In other words, property testing is concerned with the design of super-fast algorithms for approximate decision making, where the decision refers to properties or parameters of huge objects. In particular, we seek algorithms that only inspect relatively small portions of the huge object. Such algorithms must be randomized and can only provide approximate answers. Indeed, two salient aspects of property testing are that (1) it studies algorithms that can only read parts of the input, and (2) it focuses on algorithms that solve ``approximate decision'' problems. Both aspects are quite puzzling: What can one do without even reading the input? What does approximate decision mean?

The answer is that these two aspects are indeed linked: Approximate decision means distinguishing objects that have some predetermined property (i.e., reside in some predetermined set) from objects that are ``far'' from having the property (i.e., are far from any object having the property), where the notion of distance employed here is the relative number of different symbols in the descriptions of the objects. Such approximate decisions may be valuable in settings in which an exact decision is infeasible or very expensive or just considerably more expensive than obtaining an approximate decision.

The point is that, in many cases, approximate decision can be achieved by super-fast randomized algorithms. One well-known example is the common practice of estimating various statistics by sampling, which can be cast as a a small collection of approximate decision problems (with respect to some threshold values). Research in property testing aims to extend this useful practice to properties that cannot be cast as statistics of values that are associated with a large population. Examples in which this goal was achieved include testing properties of functions such as being a low degree polynomial, being monotone, depending on a specified number of attributes, testing properties of graphs such as being bipartite and being triangle-free, and testing properties of geometric objects and visual images such as being well-clustered and being a convex body.

Viewing the input as a function is natural in the context of algorithms that do not read their entire input. Such algorithms must probe the input at locations of their choice, and such probing can be thought of as querying a function that represents the input. The key point here is that the number of probes (or queries) is smaller than the size of the input, and so decisions are taken after seeing only a small part of the input. However, the inspected positions are not fixed but are rather chosen at random by the algorithm, possibly based on answers obtained to prior queries. Thus, in general, these algorithms may ``explore'' the input, rather than merely obtain its value at a uniformly selected sample of locations. Such exploration is most appealing when the tested input is a graph, which may be represented by a function (e.g., by its adjacency predicate), but the notion of exploration applies also in other cases.

Research in property testing may be both algorithmic and complexity theoretic. This is reflected both in its goals, which may be the design of better algorithms or the presentation of lower bound on their complexity, and its tools and techniques. Such research is related to several areas of computer science and mathematics including Combinatorics, Statistics, Computational Learning Theory, Computational Geometry, and Coding theory. Historically, property testing was closely associated with the study of Probabilistically Checkable Proofs (PCPs), and some connections do exist between the two, but property testing is not confined to PCPs (and/or to the study of ``locally testable codes'').

This book. The current book aims to provide an introduction to Property Testing, by presenting some of the main themes, results, and techniques that characterize the area and are used in it. As usual in such cases, the choice reflects a judgement of what is most adequate for presentation in the context of such an introductory course, and this selection does not reflect lack of appreciation of the omitted material but rather an opinion that it fit less well the intended purpose of the text.

In addition to the choice of material for this book, we made several choices regarding the organization of the material and the amount of inter-dependencies among its parts.

We chose to organize the material by the type of the objects and properties being tested. By the ``type of object'' we refer to the natural perception of the object; for example, whether it is most naturally perceived as a function or as a graph. Within the world of functions, the types correspond to the structure of the domain on which the function is defined (e.g., a group, a vector space, a Boolean hypercube, or a hyper-grid). The structure of the domain is often reflected by the invariances that are satisfied by the properties that we consider (e.g., affine invariance, closure under graph isomorphism, etc). Hence, our organization may be termed structure-oriented. (Possible alternatives to our organization include an organization by techniques (as in [Ron] or by complexity levels (e.g., whether the complexity is independent on the size of the object, is mildly dependent on it, is barely sub-linear, or somewhere in between).)

We chose to present the material while making as few pointers between chapters as possible. Of course, all chapters depend on the core notions that are introduced in the first chapter, but additional inter-dependencies are rare and never heavily relied upon. Hence, the ordering of the other chapters is not very important, although we preferred a specific one.

We chose to use (one-sided error) proximity-oblivious testers (POTs) whenever possible. This reflects our belief that when a tester (implicitly or explicitly) consists of repeating a POTs for a number of times that depends on the proximity parameter, one should focus on the POT itself and rely on the generic transformation from POTs to standard testers.

For sake of uniformity, $n$ always denotes the size of the object in a natural representation (which is not grossly redudent). Hence, objects are typically viewed as functions $f:[n]\to R_n$. Consequently, Boolean functions are presented as $f:\bitset^\ell\to\{0,1\}$, where $n=2^\ell$ (rather than as having domain $\{0,1\}^n$).

Comment One additional characteristic of property testing is that its positive results tend to be established by simple algorithms that are backed by a complex analysis. The simplicity of these algorithms has met the lack of respect of a few researchers, but this is a fundamental mistake on their side. The simplicity of algorithms is a virtue if one really considers using them, whereas the complexity of their analysis has no cost in terms of their applicability. Hence, simple algorithms that require a complex analysis are actually the greatest achievement that algorithmic research could hope for.

Organization and chapter summaries

All chapters rely on a few core notions that are introduced in Sections 1.3.1 and 1.3.3. Although these parts of Section 1.3 are the very minimum that is required for reading any of the subsequent chapters, we strongly recommend reading the entire first chapter before proceeeding to any other chapter. We stress that inter-dependencies between the other chapters are rare and never heavily relied upon.

Although the ordering of the chapters that follow Chapter 1 is not very important, a choice had to be made. Our choice was to start with simple properties of functions such as group homomorphism, low degree polynomials, monotonicity (with respect to various partial orders), and depending on few variables (i.e., juntas). In all these cases, the correspondence between the object and its representation is transparent: the function is the object. In contrast, when moving to graph properties, the question of representation arises in an acute manner, and three different chapters are devoted to three different representations that correspond to three different testing models. Hence, from the perspective of property testing per se, it seems to make sense to start with functions and then move to graphs.

In accordance with the above, the first cluster of chapters (Chapters 2-6) deals with testing properties of functions, whereas a later cluster (Chapters 8-10) deals with testing properties of graphs. A chapter on lower bound techniques (i.e., Chapter 7) is located in between these two clusters, since lower bounds are hardly mentioned in the first cluster, whereas they appear quite prominently in Chapters 9-10. The reason for this phenomena is that these lower bounds are used in order to justify the significantly higher complexity of some testers that are presented in Chapters 9-10. Indeed, in the context of property testing, we view lower bounds merely as justification for algorithms that may be considered to have a higher than expected complexity; the lower bounds assert that this impression is actually false, and that one cannot do significantly better.

Chapters 11-13 form a third cluster of material that is very related to the previous chapters and yet has a different flavor. Chapter 11 deals with testing properties of distributions, Chapter 12 explores a few related topics (some of which were mentioned in Section 1.3.2), and Chapter 13 reviews locally testable codes and proofs. We stress that in Chapter 11 the tested objects are fundamentally different from those considered in all the other chapters, whereas in Chapter 13 we consider objects that are artificially designed so to offer super-fast testing.

Chapter 1: The Main Themes (Approximate Decision and Sub-linear Complexity). This chapter introduces and illustrates the basic notions of property testing, emphasizing the themes of approximate decision and sub-linear complexity. The chapter starts with a discussion of the potential benefits of testing, and presents the definitions of (standard) testers and of proximity-oblivious testers (POTs). It discusses the key role of representation, points out the focus on properties that are not fully symmetric, and presents several some general observations regarding POTs, testing, and learning.

Chapter 2: Testing Linearity (Group Homomorphism). This chapter present an analysis of a linearity tester that, on input a description of two groups $G,H$ and oracle access to a function $f:G\to H$, queries the function at three points and satisfies the following conditions:

If $f$ is a homomorphism from $G$ to $H$ then the tester accepts with probability 1.
If $f$ is $\delta$-far from the set of all homomorphisms from $G$ to $H$, then the tester rejects with probability at least $\min(0.5\delta,0.1666)$.

The three queries are $x,y,x+y$, where $x$ and $y$ are selected uniformly at random in $G$.

Chapter 3: Low Degree Tests. For a finite field of prime cardinality $\F$, a degree bound $d<|\F|/2$ and number $m\in\N$, we consider the problem of testing whether a function $f:\F^m\to\F$ is a polynomial of total degree at most $d$. We present and analyze a low-degree tester that, given oracle access to $f:\F^m\to\F$, queries it at $d+2$ points and satisfies the following conditions:

If $f$ is an $m$-variate polynomial of (total) degree $d$, then the tester accepts with probability 1.
If $f$ is $\delta$-far from the set of $m$-variate polynomials of (total) degree $d$, then the tester rejects with probability at least $\min(0.5\delta,\Omega(d^{-2}))$.

The sequence of queries is generated by selecting at random $x$ and $h$ uniformly in $\F^m$, and using $x+ih$ as the $i^\xth$ query.

Chapter 4: Testing Monotonicity. For each $n$, we consider functions from a partially ordered set $D_n$ to a totally ordered set $R_n$. Such a function $f:D_n\to R_n$ is called monotone if for every $x\leq y$ in $D_n$ it holds that $f(x)\leq f(y)$, where $\leq$ denotes the partial order of $D_n$ and the total order in $R_n$. Two special cases of interest are:

Boolean functions on the Boolean Hypercube: In this case, $D_n$ is the $\ell$-dimensional Boolean hypercube (with the natural partial order), where $\ell=\log_2n$, and $R_n=\bitset$. According to this partial order, $x_1\cdots x_\ell \leq y_1\cdots y_\ell$ if and only if $x_i\leq y_i$ for every $i\in[\ell]$.
Real functions on the discrete line: In this case, $D_n=[n]$ and $R_n=\R$, both with the natural total order.

Combining these two extremes, we also consider the case of the hyper-grid domain $D_n=[m]^\ell$, for any $m,\ell\in\N$ such that $m^\ell=n$, and general ranges $R_n$. In all these cases, we obtain property testers of complexity $\poly(\eps^{-1}\log n)$. In addition, we briefly survey relatively recent developments as well as known results regarding testing convexity, submodularity, and the Lipschitz property of functions from $[m]^\ell$ to $\R$.

Chapter 5: Testing Dictatorships, Juntas, and Monomials. We consider testing three basic properties of Boolean functions of the form $f:\bitset^\ell\to\bitset$:

Dictatorship: The case where the value of $f$ depends on a single Boolean variable (i.e., $f(x)=x_i\xor\sigma$ for some $i\in[\ell]$ and $\sigma\in\bitset$).
Junta (of size $k$): The case where the value of $f$ depends on at most $k$ Boolean variables (i.e., $f(x)=f'(x_I)$ for some $k$-subset $I\subset[\ell]$ and $f':\bitset^k\to\bitset$).
Monomial (of size $k$): The case where the value of $f$ is the conjunction of exactly $k$ Boolean variables (i.e., $f(x)=\wedge_{i\in I}(x_i\xor\sigma_i)$ for some $k$-subset $I\subseteq[\ell]$ and $\sigma_1,...,\sigma_\ell\in\bitset$).

We present two different testers for dictatorship, where one generalizes to testing $k$-Juntas and the other generalizes to testing $k$-Monomials.

Chapter 6: Testing by Implicit Sampling. Building on the junta tester, we present a general methodology for constructing testers for properties of Boolean functions (of the form $f:\bitset^\ell\to\bitset$) that can be approximated by small juntas. This methodology yields testers of low query complexity for many natural properties, which contain functions that depend on relatively few relevant variables; specifically, the query complexity is related to the size of the junta and is independent of the length of the input to the function (i.e., $\ell$).

Chapter 7: Lower Bounds Techniques. We present and illustrate three techniques for proving lower bounds on the query complexity of property testers.

Showing a distribution on instances that have the property and a distribution on instances that are far from the property such that an oracle machine of low query complexity cannot distinguish these two distributions.
Showing a reduction from communication complexity. That is, by showing that a communication complexity problem of high complexity can be solved within communication complexity that is related to the query complexity of the property testing task that we are interested in.
Showing a reduction from another testing problem. That is, by showing a ``local'' reduction of a hard testing problem to the testing problem that we are interested in.

We also present simplifications of these techniques for the cases of one-sided error probability testers and non-adaptive testers.

Chapter 8: Testing Graph Properties in the Dense Graph Model. Following a general introduction to testing graph properties, this chapter focuses on the dense graph model, where graphs are represented by their adjacency matrix (predicate). The highlights of this chapter include:

A presentation of a natural class of graph properties that can each be tested within query complexity that is polynomial in the reciprocal of the proximity parameter. This class, called general graph partition problems, contains properties such as $k$-Colorability (for any $k\geq2$) and properties that refer to the density of the max-clique and to the density of the max-cut in a graph.
An exposition of the connection of testing (in this model) to Szeme\'redi's Regularity Lemma. The starting point and pivot of this exposition is the existence of constant-query (one-sided error) proximity-oblivious testers for all subgraph freeness properties.

We conclude this chapter with a taxonomy of known testers, organized according to their query complexity.

Chapter 9: Testing Graph Properties in the Bounded-Degree Graph Model. This chapter is devoted to testing graph properties in the bounded-degree graph model, where graphs are represented by their incidence lists (lumped together in an incidence function). The highlights of this chapter include:

Presenting lower and upper bounds on the complexity of testing Bipartitness; specifically, we present a $\poly(1/\eps)\cdot\tildeO({\sqrt k})$-time tester, and an $\Omega{\sqrt k})$ lower bound on the query complexity of any tester for Bipartitness.
Reviewing a quasi-poly(1/\eps)-time tester for Planarity. The result extends to testing any minor-closed property (i.e., a graph property that is preserved under the omission of edges and vertices and under edge contraction).

We concluded this chapter with a taxonomy of known testers, organized according to their query complexity.

Chapter 10: Testing Graph Properties in the General Graph Model. This chapter is devoted to testing graph properties in the general graph model, where graphs are inspected via incidence and adjacency queries, and distances between graphs are normalized by their actual size (i.e., actual number of edges). The highlights of this chapter include:

Demonstrating the derivation of testers for this model based on testers for the bounded-degree graph model.
Studying the tasks of estimating the average number of edges in a graph and sampling edges uniformly at random.

Chapter 11: Testing Properties of Distributions. This chapter provides an introduction to the study of testing properties of distributions, where the tester obtains samples of an unknown distribution (resp., samples from several unknown distributions) and is required to determine whether the distribution (resp., the tuple of distributions) has some predetermined property. We focus on the problems of testing whether an unknown distribution equals a fixed distribution and of testing equality between two unknown distributions. Our presentation is based on reductions from the general cases to some seemingly easier special cases. In addition, we also provide a brief survey of general results.

Chapter 12: Ramifications and related topics. We briefly review a few ramifications of the notion of property testers as well as related topics. The list includes tolerant testing and distance approximation; testing in presence of additional promises on the input; sample-based testers; other distance measures; local computation algorithms; and non-interactive proofs of proximity (MAPs). The different sections of this chapter can be read independently of one another.

Chapter 13: Locally Testable Codes and Proofs. We survey known results regarding locally testable codes and locally testable proofs (known as PCPs). Local testability refers to approximately testing large objects based on a very small number of probes, each retrieving a single bit in the representation of the object. This yields super-fast approximate-testing of the corresponding property (i.e., be a codeword or a valid proof). In terms of property testing, locally testable codes are error correcting codes such that the property of being a codeword can be tested within low query complexity. As for locally testable proofs (PCPs), these can be viewed as massively parametrized properties that are testable within low query complexity such that the parameterized property is non-empty if and only if the corresponding parameter is in a predetermined set (of ``valid statements''). Our first priority is minimizing the number of probes, and we focus on the case that this number is a constant. In this case (of a constant number of probes), we aim at minimizing the length of the constructs. That is, we seek locally testable codes and proofs of short length.

Appendix A: Probabilistic Preliminaries. This appendix presents background from probability theory, which is used extensively throughout the book. This background and preliminaries include conventions regarding random variables, basic notions and facts, and three useful probabilistic inequalities (i.e., Markov's Inequality, Chebyshev's Inequality, and Chernoff Bound).

Appendix B: A Mini-Compendium of General Results. This appendix restates several general results that were presented in prior chapters, including deriving standard testers from POTs; positive results on the algebra of property testing; reducing testing to learning; the randomness complexity of testers; archetypical application of self-correction; and the effect of local reductions.

Appendix C: An Index of Specific Results. This appendix provides an index to all results regarding specific properties that were presented in this book.

Preface
Chapter Summaries
Notation
Standard Notation; Specific notation used extensively
Acknowledgements

1. The Main Themes: Approximate Decision and Sub-linear Complexity
1.1 Introduction: 1.1.1 Property testing at a glance, 1.1.2 On the potential benefits of property testers, 1.1.3 On the flavor of property testing research, 1.1.4 Organization and some notations
1.2 Approximate decisions: 1.2.1 A detour: approximate search problems, 1.2.2 Property testing: approximate decision problems, 1.2.3 Property testing: sub-linear complexity, 1.2.4 Symmetries and invariants, 1.2.5 Objects and representation
1.3 Notions, definitions, goals, and basic observations: 1.3.1 Basics, 1.3.2 Ramifications, 1.3.3 Proximity-oblivious testers (POTs), 1.3.4 The algebra of property testing, 1.3.5 Testing via learning
1.4 Historical notes
1.5 Suggested reading and exercises
1.6 Digest: The most important points

2. Testing Linearity (Group Homomorphism)
2.1 Preliminaries
2.2 The tester
2.3 Chapter notes

3. Low Degree Tests
3.1 A brief introduction
3.2 A kind of intuition (which may be skipped)
3.3 Background
3.4 The tester
3.5 Chapter notes

4. Testing Monotonicity
4.1 Introduction
4.2 Boolean functions on the Boolean Hypercube: 4.2.1 The edge test, 4.2.2 Path tests
4.3 Multi-valued functions on the discrete line: 4.3.1 A tester based on binary search, 4.3.2 Other testers
4.4 Multi-valued functions on the Hypergrid: 4.4.1 Dimension reduction, 4.4.2 Range reduction
4.5 Chapter notes

5. Testing Dictatorships, Juntas, and Monomials
5.1 Introduction
5.2 Testing dictatorship via self-correction: 5.2.1 The tester, 5.2.2 Testing monomials, 5.2.3 The self-correction paradigm: an abstraction
5.3 Testing juntas
5.4 Chapter notes

6. Testing by Implicit Sampling
6.1 Introduction
6.2 Testing subsets of $k$-juntas
6.3 Extension to properties approximated by subsets of $k$-juntas
6.4 Chapter notes

7. Lower Bounds Techniques
7.1 Introduction
7.2 Indistinguishability of distributions: 7.2.1 The actual method, 7.2.2 Illustrating the application of the method, 7.2.3 Further reflections
7.3 Reduction from Communication Complexity: 7.3.1 Communication Complexity, 7.3.2 The methodology, 7.3.3 Illustrating the application of the methodology
7.4 Reduction among testing problems
7.5 Lower bounds for restricted testers: 7.5.1 One-sided error testers, 7.5.2 Non-adaptive testers
7.6 Chapter notes

8. Testing Graph Properties in the Dense Graph Model
8.1 The general context: Introduction to testing graph properties: 8.1.1 Basic background, 8.1.2 Three Models of Testing Graph Properties
8.2 The Dense Graph Model: Some basics: 8.2.1 The actual definition, 8.2.2 Abuses of the model: Trivial and sparse properties, 8.2.3 Testing degree regularity, 8.2.4 Digest: Levin's economical work investment strategy
8.3 Graph Partition Problems: 8.3.1 Testing Bipartiteness, 8.3.2 The actual definition and the general result
8.4 Connection to Szemeredi's Regularity Lemma: 8.4.1 The Regularity Lemma, 8.4.2 Subgraph freeness, 8.4.3 The structure of properties that have size-oblivious testers
8.5 A Taxonomy of the known results
8.6 Chapter notes: 8.6.1 Historical perspective and credits, 8.6.2 Testing versus other forms of approximation, 8.6.3 A contrast with recognizing graph properties, 8.6.4 Exercises

9. Testing Graph Properties in the Bounded-Degree Graph Model
9.1 The Bounded-Degree Model: Definitions and issues
9.2 Testing by a local search: 9.2.1 Testing subgraph freeness, 9.2.2 Testing degree regularity, 9.2.3 Testing connectivity, 9.2.4 Testing $t$-connectivity (overview and one detail), 9.2.5 Testing cycle-freeness (with two-sided error)
9.3 Lower bounds: 9.3.1 Bipartitness, 9.3.2 Applications to other properties, 9.3.3 Linear lower bounds
9.4 Testing by random walks: 9.4.1 Testing Bipartiteness, 9.4.2 One-sided error tester for Cycle-freeness
9.5 Testing by implementing and utilizing partition oracles: 9.5.1 The simpler implementation, 9.5.2 The better implementation
9.6 A Taxonomy of the known results
9.7 Chapter notes: 9.7.1 Historical perspective and credits 9.7.2 Directed graphs 9.7.3 Exercises

10. Testing Graph Properties in the General Graph Model
10.1 The General Graph Model: Definitions and issues
10.2 On obtaining testers for the current model: 10.2.1 An explicit adaptation: the case of connectivity, 10.2.2 Using a reduction: the case of Bipartiteness
10.3 Estimating the average degree and selecting random edges: 10.3.1 Lower bounds, 10.3.2 Algorithms (Bucketing vertices according to their degree and Sorting vertices according to their degree)
10.4 Using adjacency queries: the case of Bipartiteness
10.5 Chapter notes: 10.5.1 Gaps between the general graph model and the bounded-degree model, 10.5.2 History and credits, 10.5.3 Reflections, 10.5.4 Exercises

11. Testing Properties of Distributions
11.1 The model: 11.1.1 Testing properties of single distributions, 11.1.2 Testing properties of pairs of distributions, 11.1.3 Label-invariant properties, 11.1.4 Organization
11.2 Testing equality to a fixed distribution: 11.2.1 The collision probability tester and its analysis, 11.2.2 The general case (treated by a reduction to testing uniformity), 11.2.3 A lower bound
11.3 Testing equality between two unknown distributions: 11.3.1 Detour: Poisson Distributions, 11.3.2 The actual algorithm and its analysis, 11.3.3 Applications: Reduction to the case of small norms
11.4 On the complexity of testing properties of distributions
11.5 Chapter notes 11.5.1 History and credits 11.5.2 Exercise

12. Ramifications and related topics
12.1 Tolerant testing and distance approximation
12.2 Additional promises on the input
12.3 Sample based testers
12.4 Testing with respect to other distance measures
12.5 Local computation algorithms: 12.5.1 Definitions, 12.5.2 Finding huge structures, 12.5.3 Local reconstruction
12.6 Non-interactive proofs of proximity (MAPs)
12.7 Chapter notes: 12.7.1 Historical notes, 12.7.2 Massively parameterized properties, 12.7.3 Exercises

13. Locally Testable Codes and Proofs
13.1 Introduction
13.2 Definitions: 13.2.1 Codeword testers, 13.2.2 Proof testers, 13.2.3 Ramifications and relation to property testing, 13.2.4 On relating locally testable codes and proofs
13.3 Results and Ideas: 13.3.1 The mere existence of locally testable codes and proofs (The Hadamard Code is locally testable and The Hadamard-Based PCP of ALMSS), 13.3.2 Locally testable codes and proofs of polynomial length (Locally testable codes of almost quadratic length and Locally testable proofs of polynomial length (The PCP Theorem)), 13.3.3 Locally testable codes and proofs of nearly linear length (Local testability with nearly linear length and The gap amplification method)
13.4 Chapter notes: 13.4.1 Historical notes, 13.4.2 On obtaining super-fast testers, 13.4.3 The alternative regime: LTCs of linear length, 13.4.4 Locally Decodable Codes, 13.4.5 Exercises

Apdx A. Probabilistic Preliminaries
A.1 Notational Conventions
A.2 Some basic notions and facts
A.3 Basic facts regarding expectation and variance
A.4 Three Inequalities: A.4.1 Markov's Inequality, A.4.2 Chebyshev's Inequality, A.4.3 Chernoff Bound, A.4.4 Pairwise independent versus totally independent sampling

Apdx B. A Mini-Compendium of General Results

Apdx C. An Index of Specific Results

Bibliography Index

Plans for a course and a mini-course based on the book

Suggestions for optional reading are provided in square brackets.

Plan for a 24-hour course based on the book

Introduction, motivation, and mind-frame (Sec 1.1-1.2);
Main notions and general results (Sec 1.3.1, 1.3.3, and 1.3.5);
Testing Linearity (Chap 2);
Low Degree Tests (Sec 3.1 and 3.4) [background 3.2-3.3];
Testing monotonicity on the Boolean hypercube (4.2.1) and on the line (4.3.1) [and beyond 4.4.0];
Testing Dictatorship (5.2.1 and 5.2.3) and Juntas (5.3);
Testing by Implicit Sampling (sec 6.2);
Lower Bounds Techniques (Sec 7.2 and 7.3);
Testing Graph Properties in the Dense Graph Model (8.2 and 8.3 [and 8.4]);
Testing Graph Properties in the Bounded-Degree Graph Model (9.1, 9.2.3, 9.2.4, 9.3.1, and 9.4.1 [and 9.5]);
Testing Graph Properties in the General Graph Model (10.1, 10.2.2, and 10.3.2.1);
Testing Properties of Distributions (11.1 and 11.2).

Plan for a 10-hour mini-course based on the book

The mind-frame (1.2) and main notions (1.3.1 and 1.3.3);
Testing Linearity (Chap 2);
Testing Dictatorship (5.2.1 and 5.2.3);
Testing Graph Properties in the Dense Graph Model (8.2 and 8.3);
Testing Graph Properties in the Bounded-Degree Graph Model (9.1, 9.2.3, 9.2.4, 9.3.1, and 9.4.1).

List of corrections and additions

Typo on page xxiii: $S \choose0$ is the (singleton) set that contains the empty set (rather than is the empty set itself).
Typo on page xxiv, line 1: In both tilde notations, the extra polylogarithmic factor is of $g(n)$ not of $n$.
Typo on page 12, item 1: denoted should be denote.
Unfortunate omission on page 17 (paragraph on nonadditivity): In general, the final decision (represented by $D$), may also depend on $n$ and $\epsilon$. (Unfortunately, we keep forgetting this aspect...)
Typo on page 95, Footnote 7: The references should be to Steps 2c and 2d, respectively. [Orr Paradise]
Typo on page 170, line 3: The definition of the degree of $u$ in $G$ should read $d_G(u)=|\{v:\{u,v\}\in E\}|$. [Orr Paradise]
Errors on pages 175-176 (sec 8.2.4). The expression (for the probability that $q(\omega)$ is large) given at the bottom of page 175 is off by a constant factor. One constant factor of two is due to Footnote 16, which starts by assuming that $B_0=\Omega$. This assumption can be justified (only) by throwing away all elements having a $q$-value that does not exceed $\eps/2$, and resetting $\ep$ to $\eps/2$. Furthermore, the second displayed inequality holds only for $B_{i-1}$'s with $i\geq2$, whereas for $B_0$ the bound is a trivial value of $2\e$. Hence, $B_i$ has to be redefined as all elements with $q$-value above $2^{i-2}\eps$, and in this case all goes fine.
Update for page 185: The paper of Nakar and Ron, referred to in footnote 29, has appeared in RANDOM2018 (as On the Testability of Graph Partition Properties).
Thm 8.13 (p 186) was improved, for the case of $t\geq3$, to the optimal value of $\tildeO(1/\epsilon)$. This improvement is due to Sohler [FOCS 2012], who used the non-suggestive title Almost Optimal Canonical Property Testers for Satisfiability.
The comment at the end of Sec 8.5.1 (p 200) has been addressed by my work Hierarchy Theorems for Testing Properties in Size-Oblivious Query Complexity.
The guideline for Exercise 8.7 refers to edges that violate the 2-coloring defined based on a 2-partition of $U$, while ignoring the 2-colorings that are not in agreement with any 2-partition of U. This would have been fine if each such disagreement were to be charged at $t=|S|/|U|$ units. We get an analogous effect by using $t$ (independent) copies of $U$ and considering the subgraph induced by these $t$ cookies as well as the (single copy of) $S$. For each copy of $U$ and each of the $|S|-1$ perfect matchings of $S$, wvhp, we either see $\Omega(\eps|S|)$ violating edges in the matching or $\Omega(\eps|S|/|U|)$ disagreements between colors of $S$ and the 2-partition of $U$. [Reut Levi, Feb 2024]
On Page 223: The bound of Thm 9.9 has been improved to an almost optimal level by Forster et al (SODA2020).
At the end of Section 11.5.1 (page 351), the work of [70] is mentioned with respect to introducing a model of conditional sampling. The same model was suggested independently by Sourav Chakraborty, Eldar Fischer, Arie Matsliah, and Yonatan Goldhirsh in their work On the Power of Conditional Samples in Distribution Testing [SICOMP, 45(4), 1261-1296, 2016].
In Step 2 on page 292, the same randomness is used for all $i$'s. The same holds also wrt Step 3, although the notation suggests differently. [Orr Paradise]
Page 307, Footnote 6: The approximation should be to a factor of $1+0.1\epsilon$ (and not to a factor of $1+0.1\max(1,(n \cdot p(i))^{-1})\cdot\epsilon$). Indeed, this matters only for $p(i) \in [0.1\epsilon/n,1/n]$.

See additional lists of typos by Orr Paradise and Yoav Ben-Dov.

Prior lecture notes

The book is based on lecture notes written towards teaching an introductory course on the subject (in Fall 2015). These earlier notes contain several errors that were fixed in later revisions, and so one better use the full draft dated March 2017. The topics that were covered by the notes are:

A first lecture: themes, notions, definitions, and goals.
Testing algebraic properties:
Testing properties of functions over product domains:
- monotonicity
- properties of Boolean functions such as dictatorship, juntas, and monomials,
- and testing by implicit sampling;
lower bound techniques
Testing graph properties
- in the adjacency matrix (a.k.a dense graph) model,
- in the bounded-degree graph model,
- and in the general graph model;
Testing properties of distributions
Add'l topics: Tolerant testers, sample-based testers, distribution-free testers, local computation algorithms (i.e., finding huge structures and reconstruction), and MAPs.
Locally testable codes and proofs (an overview).
Appendix: some probabilistic preliminaries

Related surveys can be found in the collection Property Testing: Current Research and Surveys (Springer, LNCS 6390, 2010).

Two specific texts that were converted (and extended) into the foregoing lecture notes are --

Hence, these survers are superseeded by the lecture notes.

Back to homepage.