This is the website of the forthcoming book
titled **Introduction to Property Testing**.

The book will be published by Cambridge University Press. Currently, the text is in the copyediting stage (May 2017).

**Material available on line**:

- Drafts of the book: November 2016, March 2017, and April 2017.
- Drafts of individual lecture notes that serve as basis for the book chapters.

Property Testing is concerned with approximate decisions, where the task is distinguishing between objects having a predetermined property and objects that are ``far'' from having this property. Typically, objects are modeled by functions, and distance between functions is measured as the fraction of the domain on which the functions differ. We consider (randomized) algorithms that may query the function at arguments of their choice, and seek algorithms of very low complexity (i.e., much lower than the size of the function's domain).

In other words, property testing is concerned with the design
of *super-fast algorithms for approximate decision making*,
where the decision refers to properties or parameters of huge objects.
In particular, we seek algorithms that
only inspect relatively small portions of the huge object.
Such algorithms must be randomized and can only provide approximate answers.
Indeed, two salient aspects of property testing are that
(1) it studies algorithms that can only read parts of the input, and
(2) it focuses on algorithms that solve ``approximate decision'' problems.
Both aspects are quite puzzling:
What can one do without even reading the input?
What does approximate decision mean?

The answer is that these two aspects are indeed linked:
*Approximate decision* means distinguishing objects
that have some predetermined property
(i.e., reside in some predetermined set) from
objects that are ``far'' from having the property
(i.e., are far from any object having the property),
where the notion of distance employed here is the relative number
of different symbols in the descriptions of the objects.
Such approximate decisions may be valuable in settings
in which an exact decision is infeasible or very expensive
or just considerably more expensive than obtaining an
approximate decision.

The point is that, in many cases, approximate decision
can be achieved by *super-fast randomized algorithms*.
One well-known example is the common practice of estimating
various statistics by sampling, which can be cast as a
a small collection of approximate decision problems
(with respect to some threshold values).
Research in property testing aims to extend this useful practice
to properties that cannot be cast as statistics of values
that are associated with a large population.
Examples in which this goal was achieved include testing
properties of functions such as being a low degree polynomial,
being monotone, depending on a specified number of attributes,
testing properties of graphs such as being bipartite and
being triangle-free, and testing properties of geometric
objects and visual images such as being well-clustered
and being a convex body.

Viewing the input as a function is natural in the context of algorithms that do not read their entire input. Such algorithms must probe the input at locations of their choice, and such probing can be thought of as querying a function that represents the input. The key point here is that the number of probes (or queries) is smaller than the size of the input, and so decisions are taken after seeing only a small part of the input. However, the inspected positions are not fixed but are rather chosen at random by the algorithm, possibly based on answers obtained to prior queries. Thus, in general, these algorithms may ``explore'' the input, rather than merely obtain its value at a uniformly selected sample of locations. Such exploration is most appealing when the tested input is a graph, which may be represented by a function (e.g., by its adjacency predicate), but the notion of exploration applies also in other cases.

Research in property testing may be both algorithmic and complexity theoretic. This is reflected both in its goals, which may be the design of better algorithms or the presentation of lower bound on their complexity, and its tools and techniques. Such research is related to several areas of computer science and mathematics including Combinatorics, Statistics, Computational Learning Theory, Computational Geometry, and Coding theory. Historically, property testing was closely associated with the study of Probabilistically Checkable Proofs (PCPs), and some connections do exist between the two, but property testing is not confined to PCPs (and/or to the study of ``locally testable codes'').

**This book.**
The current book aims to provide an introduction to Property Testing,
by presenting some of the main themes, results, and techniques
that characterize the area and are used in it.
As usual in such cases,
*the choice reflects a judgement of what is most adequate
for presentation in the context of such an introductory course*,
and this selection does not reflect lack of appreciation of
the omitted material but rather an opinion that it fit
less well the intended purpose of the text.

In addition to the choice of material for this book, we made several choices regarding the organization of the material and the amount of inter-dependencies among its parts.

We chose to organize the material by the type of the objects and properties being tested. By the ``type of object'' we refer to the natural perception of the object; for example, whether it is most naturally perceived as a function or as a graph. Within the world of functions, the types correspond to the structure of the domain on which the function is defined (e.g., a group, a vector space, a Boolean hypercube, or a hyper-grid). The structure of the domain is often reflected by the invariances that are satisfied by the properties that we consider (e.g., affine invariance, closure under graph isomorphism, etc). Hence, our organization may be termed structure-oriented. (Possible alternatives to our organization include an organization by techniques (as in [Ron] or by complexity levels (e.g., whether the complexity is independent on the size of the object, is mildly dependent on it, is barely sub-linear, or somewhere in between).)

We chose to present the material while making as few pointers between chapters as possible. Of course, all chapters depend on the core notions that are introduced in the first chapter, but additional inter-dependencies are rare and never heavily relied upon. Hence, the ordering of the other chapters is not very important, although we preferred a specific one.

We chose to use (one-sided error) proximity-oblivious testers (POTs) whenever possible. This reflects our belief that when a tester (implicitly or explicitly) consists of repeating a POTs for a number of times that depends on the proximity parameter, one should focus on the POT itself and rely on the generic transformation from POTs to standard testers.

For sake of uniformity, $n$ always denotes the size of the object in a natural representation (which is not grossly redudent). Hence, objects are typically viewed as functions $f:[n]\to R_n$. Consequently, Boolean functions are presented as $f:\bitset^\ell\to\{0,1\}$, where $n=2^\ell$ (rather than as having domain $\{0,1\}^n$).

Although the ordering of the chapters that follow Chapter 1
is not very important, a choice had to be made.
Our choice was to start with simple properties of functions
such as group homomorphism, low degree polynomials,
monotonicity (with respect to various partial orders),
and depending on few variables (i.e., juntas).
In all these cases, the correspondence between the object
and its representation is transparent: *the function is the object*.
In contrast, when moving to graph properties,
the question of representation arises in an acute manner,
and three different chapters are devoted to three different
representations that correspond to three different testing models.
Hence, from the perspective of property testing per se,
it seems to make sense to start with functions
and then move to graphs.

In accordance with the above, the first cluster of chapters (Chapters 2-6) deals with testing properties of functions, whereas a later cluster (Chapters 8-10) deals with testing properties of graphs. A chapter on lower bound techniques (i.e., Chapter 7) is located in between these two clusters, since lower bounds are hardly mentioned in the first cluster, whereas they appear quite prominently in Chapters 9-10. The reason for this phenomena is that these lower bounds are used in order to justify the significantly higher complexity of some testers that are presented in Chapters 9-10. Indeed, in the context of property testing, we view lower bounds merely as justification for algorithms that may be considered to have a higher than expected complexity; the lower bounds assert that this impression is actually false, and that one cannot do significantly better.

Chapters 11-13 form a third cluster of material that is very related to the previous chapters and yet has a different flavor. Chapter 11 deals with testing properties of distributions, Chapter 12 explores a few related topics (some of which were mentioned in Section 1.3.2), and Chapter 13 reviews locally testable codes and proofs. We stress that in Chapter 11 the tested objects are fundamentally different from those considered in all the other chapters, whereas in Chapter 13 we consider objects that are artificially designed so to offer super-fast testing.

**Chapter 1: The Main Themes
(Approximate Decision and Sub-linear Complexity).**
This chapter introduces and illustrates the basic notions
of property testing, emphasizing the themes
of *approximate decision* and *sub-linear complexity*.
The chapter starts with a discussion of the potential benefits of testing,
and presents the definitions of (standard) testers
and of proximity-oblivious testers (POTs).
It discusses the key role of representation,
points out the focus on properties that are not fully symmetric,
and presents several some general observations regarding POTs,
testing, and learning.

**Chapter 2: Testing Linearity (Group Homomorphism).**
This chapter present an analysis of a linearity tester that,
on input a description of two groups $G,H$
and oracle access to a function $f:G\to H$,
queries the function at three points
and satisfies the following conditions:

- If $f$ is a homomorphism from $G$ to $H$ then the tester accepts with probability 1.
- If $f$ is $\delta$-far from the set of all homomorphisms from $G$ to $H$, then the tester rejects with probability at least $\min(0.5\delta,0.1666)$.

**Chapter 3: Low Degree Tests.**
For a finite field of prime cardinality $\F$,
a degree bound $d<|\F|/2$ and number $m\in\N$,
we consider the problem of testing whether a function $f:\F^m\to\F$
is a polynomial of total degree at most $d$.
We present and analyze a low-degree tester that,
given oracle access to $f:\F^m\to\F$, queries it
at $d+2$ points and satisfies the following conditions:

- If $f$ is an $m$-variate polynomial of (total) degree $d$, then the tester accepts with probability 1.
- If $f$ is $\delta$-far from the set of $m$-variate polynomials of (total) degree $d$, then the tester rejects with probability at least $\min(0.5\delta,\Omega(d^{-2}))$.

**Chapter 4: Testing Monotonicity.**
For each $n$,
we consider functions from a partially ordered set $D_n$
to a totally ordered set $R_n$.
Such a function $f:D_n\to R_n$ is called monotone
if for every $x\leq y$ in $D_n$ it holds that $f(x)\leq f(y)$,
where $\leq$ denotes the partial order of $D_n$
and the total order in $R_n$.
Two special cases of interest are:

- Boolean functions on the Boolean Hypercube: In this case, $D_n$ is the $\ell$-dimensional Boolean hypercube (with the natural partial order), where $\ell=\log_2n$, and $R_n=\bitset$. According to this partial order, $x_1\cdots x_\ell \leq y_1\cdots y_\ell$ if and only if $x_i\leq y_i$ for every $i\in[\ell]$.
- Real functions on the discrete line: In this case, $D_n=[n]$ and $R_n=\R$, both with the natural total order.

**Chapter 5: Testing Dictatorships, Juntas, and Monomials.**
We consider testing three basic properties of Boolean functions
of the form $f:\bitset^\ell\to\bitset$:

- Dictatorship: The case where the value of $f$ depends on a single Boolean variable (i.e., $f(x)=x_i\xor\sigma$ for some $i\in[\ell]$ and $\sigma\in\bitset$).
- Junta (of size $k$): The case where the value of $f$ depends on at most $k$ Boolean variables (i.e., $f(x)=f'(x_I)$ for some $k$-subset $I\subset[\ell]$ and $f':\bitset^k\to\bitset$).
- Monomial (of size $k$): The case where the value of $f$ is the conjunction of exactly $k$ Boolean variables (i.e., $f(x)=\wedge_{i\in I}(x_i\xor\sigma_i)$ for some $k$-subset $I\subseteq[\ell]$ and $\sigma_1,...,\sigma_\ell\in\bitset$).

**Chapter 6: Testing by Implicit Sampling.**
Building on the junta tester, we present a general methodology
for constructing testers for properties of Boolean functions
(of the form $f:\bitset^\ell\to\bitset$)
that can be approximated by small juntas.
This methodology yields testers of low query complexity
for many natural properties, which contain functions
that depend on relatively few relevant variables;
specifically, the query complexity
is related to the size of the junta and is independent
of the length of the input to the function (i.e., $\ell$).

**Chapter 7: Lower Bounds Techniques.**
We present and illustrate three techniques for proving lower
bounds on the query complexity of property testers.

- Showing a distribution on instances that have the property and a distribution on instances that are far from the property such that an oracle machine of low query complexity cannot distinguish these two distributions.
- Showing a reduction from communication complexity. That is, by showing that a communication complexity problem of high complexity can be solved within communication complexity that is related to the query complexity of the property testing task that we are interested in.
- Showing a reduction from another testing problem. That is, by showing a ``local'' reduction of a hard testing problem to the testing problem that we are interested in.

**Chapter 8: Testing Graph Properties in the Dense Graph Model.**
Following a general introduction to testing graph properties,
this chapter focuses on the dense graph model,
where graphs are represented by their adjacency matrix (predicate).
The highlights of this chapter include:

- A presentation of a natural class of graph properties that can each be tested within query complexity that is polynomial in the reciprocal of the proximity parameter. This class, called general graph partition problems, contains properties such as $k$-Colorability (for any $k\geq2$) and properties that refer to the density of the max-clique and to the density of the max-cut in a graph.
- An exposition of the connection of testing (in this model) to Szeme\'redi's Regularity Lemma. The starting point and pivot of this exposition is the existence of constant-query (one-sided error) proximity-oblivious testers for all subgraph freeness properties.

**Chapter 9: Testing Graph Properties
in the Bounded-Degree Graph Model.**
This chapter is devoted to testing graph properties
in the bounded-degree graph model,
where graphs are represented by their incidence lists
(lumped together in an incidence function).
The highlights of this chapter include:

- Presenting lower and upper bounds on the complexity of testing Bipartitness; specifically, we present a $\poly(1/\eps)\cdot\tildeO({\sqrt k})$-time tester, and an $\Omega{\sqrt k})$ lower bound on the query complexity of any tester for Bipartitness.
- Reviewing a quasi-poly(1/\eps)-time tester for Planarity. The result extends to testing any minor-closed property (i.e., a graph property that is preserved under the omission of edges and vertices and under edge contraction).

**Chapter 10: Testing Graph Properties in the General Graph Model.**
This chapter is devoted to testing graph properties
in the general graph model,
where graphs are inspected via incidence and adjacency queries,
and distances between graphs are normalized by their
actual size (i.e., actual number of edges).
The highlights of this chapter include:

- Demonstrating the derivation of testers for this model based on testers for the bounded-degree graph model.
- Studying the tasks of estimating the average number of edges in a graph and sampling edges uniformly at random.

**Chapter 11: Testing Properties of Distributions.**
This chapter provides an introduction to the study of testing
properties of distributions, where the tester obtains
samples of an unknown distribution
(resp., samples from several unknown distributions)
and is required to determine whether the distribution
(resp., the tuple of distributions)
has some predetermined property.
We focus on the problems of testing whether
an unknown distribution equals a fixed distribution
and of testing equality between two unknown distributions.
Our presentation is based on reductions from the general cases
to some seemingly easier special cases.
In addition, we also provide a brief survey of general results.

**Chapter 12: Ramifications and related topics.**
We briefly review a few ramifications of the notion
of property testers as well as related topics.
The list includes
tolerant testing and distance approximation;
testing in presence of additional promises on the input;
sample-based testers;
other distance measures;
local computation algorithms; and
non-interactive proofs of proximity (MAPs).
The different sections of this chapter
can be read independently of one another.

**Chapter 13: Locally Testable Codes and Proofs.**
We survey known results regarding locally testable codes
and locally testable proofs (known as PCPs).
Local testability refers to approximately testing
large objects based on a very small number of probes,
each retrieving a single bit in the representation of the object.
This yields super-fast approximate-testing of the corresponding property
(i.e., be a codeword or a valid proof).
In terms of property testing, locally testable codes are error correcting
codes such that the property of being a codeword can be tested
within low query complexity. As for locally testable proofs (PCPs),
these can be viewed as massively parametrized properties that are testable
within low query complexity such that the parameterized property
is non-empty if and only if the corresponding parameter
is in a predetermined set (of ``valid statements'').
Our first priority is minimizing the number of probes,
and we focus on the case that this number is a constant.
In this case (of a constant number of probes),
we aim at minimizing the length of the constructs.
That is, we seek locally testable codes and proofs of short length.

**Appendix A: Probabilistic Preliminaries.**
This appendix presents background from probability theory,
which is used extensively throughout the book.
This background and preliminaries include
conventions regarding random variables,
basic notions and facts,
and three useful probabilistic inequalities
(i.e., Markov's Inequality, Chebyshev's Inequality, and Chernoff Bound).

**Appendix B: A Mini-Compendium of General Results.**
This appendix restates several general results
that were presented in prior chapters,
including
deriving standard testers from POTs;
positive results on the algebra of property testing;
reducing testing to learning;
the randomness complexity of testers;
archetypical application of self-correction;
and the effect of local reductions.

**Appendix C: An Index of Specific Results.**
This appendix provides an index to all results regarding
specific properties that were presented in this book.

Standard Notation; Specific notation used extensively

**1. The Main Themes: Approximate Decision and Sub-linear Complexity**

1.1 Introduction:
1.1.1 Property testing at a glance,
1.1.2 On the potential benefits of property testers,
1.1.3 On the flavor of property testing research,
1.1.4 Organization and some notations

1.2 Approximate decisions:
1.2.1 A detour: approximate search problems,
1.2.2 Property testing: approximate decision problems,
1.2.3 Property testing: sub-linear complexity,
1.2.4 Symmetries and invariants,
1.2.5 Objects and representation

1.3 Notions, definitions, goals, and basic observations:
1.3.1 Basics,
1.3.2 Ramifications,
1.3.3 Proximity-oblivious testers (POTs),
1.3.4 The algebra of property testing,
1.3.5 Testing via learning

1.4 Historical notes

1.5 Suggested reading and exercises

1.6 Digest: The most important points

**2. Testing Linearity (Group Homomorphism) **

2.1 Preliminaries

2.2 The tester

2.3 Chapter notes

**3. Low Degree Tests **

3.1 A brief introduction

3.2 A kind of intuition (which may be skipped)

3.3 Background

3.4 The tester

3.5 Chapter notes

**4. Testing Monotonicity **

4.1 Introduction

4.2 Boolean functions on the Boolean Hypercube:
4.2.1 The edge test,
4.2.2 Path tests

4.3 Multi-valued functions on the discrete line:
4.3.1 A tester based on binary search,
4.3.2 Other testers

4.4 Multi-valued functions on the Hypergrid:
4.4.1 Dimension reduction,
4.4.2 Range reduction

4.5 Chapter notes

**5. Testing Dictatorships, Juntas, and Monomials **

5.1 Introduction

5.2 Testing dictatorship via self-correction:
5.2.1 The tester,
5.2.2 Testing monomials,
5.2.3 The self-correction paradigm: an abstraction

5.3 Testing juntas

5.4 Chapter notes

**6. Testing by Implicit Sampling **

6.1 Introduction

6.2 Testing subsets of $k$-juntas

6.3 Extension to properties approximated by subsets of $k$-juntas

6.4 Chapter notes

**7. Lower Bounds Techniques **

7.1 Introduction

7.2 Indistinguishability of distributions:
7.2.1 The actual method,
7.2.2 Illustrating the application of the method,
7.2.3 Further reflections

7.3 Reduction from Communication Complexity:
7.3.1 Communication Complexity,
7.3.2 The methodology,
7.3.3 Illustrating the application of the methodology

7.4 Reduction among testing problems

7.5 Lower bounds for restricted testers:
7.5.1 One-sided error testers,
7.5.2 Non-adaptive testers

7.6 Chapter notes

**8. Testing Graph Properties in the Dense Graph Model **

8.1 The general context: Introduction to testing graph properties:
8.1.1 Basic background,
8.1.2 Three Models of Testing Graph Properties

8.2 The Dense Graph Model: Some basics:
8.2.1 The actual definition,
8.2.2 Abuses of the model: Trivial and sparse properties,
8.2.3 Testing degree regularity,
8.2.4 Digest: Levin's economical work investment strategy

8.3 Graph Partition Problems:
8.3.1 Testing Bipartiteness,
8.3.2 The actual definition and the general result

8.4 Connection to Szemeredi's Regularity Lemma:
8.4.1 The Regularity Lemma,
8.4.2 Subgraph freeness,
8.4.3 The structure of properties that have size-oblivious testers

8.5 A Taxonomy of the known results

8.6 Chapter notes:
8.6.1 Historical perspective and credits,
8.6.2 Testing versus other forms of approximation,
8.6.3 A contrast with recognizing graph properties,
8.6.4 Exercises

**9. Testing Graph Properties in the Bounded-Degree Graph Model **

9.1 The Bounded-Degree Model: Definitions and issues

9.2 Testing by a local search:
9.2.1 Testing subgraph freeness,
9.2.2 Testing degree regularity,
9.2.3 Testing connectivity,
9.2.4 Testing $t$-connectivity (overview and one detail),
9.2.5 Testing cycle-freeness (with two-sided error)

9.3 Lower bounds:
9.3.1 Bipartitness,
9.3.2 Applications to other properties,
9.3.3 Linear lower bounds

9.4 Testing by random walks:
9.4.1 Testing Bipartiteness,
9.4.2 One-sided error tester for Cycle-freeness

9.5 Testing by implementing and utilizing partition oracles:
9.5.1 The simpler implementation,
9.5.2 The better implementation

9.6 A Taxonomy of the known results

9.7 Chapter notes:
9.7.1 Historical perspective and credits
9.7.2 Directed graphs
9.7.3 Exercises

**10. Testing Graph Properties in the General Graph Model **

10.1 The General Graph Model: Definitions and issues

10.2 On obtaining testers for the current model:
10.2.1 An explicit adaptation: the case of connectivity,
10.2.2 Using a reduction: the case of Bipartiteness

10.3 Estimating the average degree and selecting random edges:
10.3.1 Lower bounds,
10.3.2 Algorithms (Bucketing vertices according to their degree
and Sorting vertices according to their degree)

10.4 Using adjacency queries: the case of Bipartiteness

10.5 Chapter notes:
10.5.1 Gaps between the general graph model and the bounded-degree model,
10.5.2 History and credits,
10.5.3 Reflections,
10.5.4 Exercises

**11. Testing Properties of Distributions **

11.1 The model:
11.1.1 Testing properties of single distributions,
11.1.2 Testing properties of pairs of distributions,
11.1.3 Label-invariant properties,
11.1.4 Organization

11.2 Testing equality to a fixed distribution:
11.2.1 The collision probability tester and its analysis,
11.2.2 The general case (treated by a reduction to testing uniformity),
11.2.3 A lower bound

11.3 Testing equality between two unknown distributions:
11.3.1 Detour: Poisson Distributions,
11.3.2 The actual algorithm and its analysis,
11.3.3 Applications: Reduction to the case of small norms

11.4 On the complexity of testing properties of distributions

11.5 Chapter notes
11.5.1 History and credits
11.5.2 Exercise

**12. Ramifications and related topics **

12.1 Tolerant testing and distance approximation

12.2 Additional promises on the input

12.3 Sample based testers

12.4 Testing with respect to other distance measures

12.5 Local computation algorithms:
12.5.1 Definitions,
12.5.2 Finding huge structures,
12.5.3 Local reconstruction

12.6 Non-interactive proofs of proximity (MAPs)

12.7 Chapter notes:
12.7.1 Historical notes,
12.7.2 Massively parameterized properties,
12.7.3 Exercises

**13. Locally Testable Codes and Proofs **

13.1 Introduction

13.2 Definitions:
13.2.1 Codeword testers,
13.2.2 Proof testers,
13.2.3 Ramifications and relation to property testing,
13.2.4 On relating locally testable codes and proofs

13.3 Results and Ideas:
13.3.1 The mere existence of locally testable codes and proofs
(The Hadamard Code is locally testable and The Hadamard-Based PCP of ALMSS),
13.3.2 Locally testable codes and proofs of polynomial length
(Locally testable codes of almost quadratic length and
Locally testable proofs of polynomial length (The PCP Theorem)),
13.3.3 Locally testable codes and proofs of nearly linear length
(Local testability with nearly linear length
and The gap amplification method)

13.4 Chapter notes:
13.4.1 Historical notes,
13.4.2 On obtaining super-fast testers,
13.4.3 The alternative regime: LTCs of linear length,
13.4.4 Locally Decodable Codes,
13.4.5 Exercises

**Apdx A. Probabilistic Preliminaries **

A.1 Notational Conventions

A.2 Some basic notions and facts

A.3 Basic facts regarding expectation and variance

A.4 Three Inequalities:
A.4.1 Markov's Inequality,
A.4.2 Chebyshev's Inequality,
A.4.3 Chernoff Bound,
A.4.4 Pairwise independent versus totally independent sampling

**Apdx B. A Mini-Compendium of General Results **

**Apdx C. An Index of Specific Results **

**Bibliography **
**Index **

- A first lecture: themes, notions, definitions, and goals.
- Testing algebraic properties:
- Testing properties of functions over product domains:
- monotonicity
- properties of Boolean functions such as dictatorship, juntas, and monomials,
- and testing by implicit sampling;

- lower bound techniques
- Testing graph properties
- in the adjacency matrix (a.k.a dense graph) model,
- in the bounded-degree graph model,
- and in the general graph model;

- Testing properties of distributions
- Add'l topics: Tolerant testers, sample-based testers, distribution-free testers, local computation algorithms (i.e., finding huge structures and reconstruction), and MAPs.
- Locally testable codes and proofs (an overview).
- Appendix: some probabilistic preliminaries

Two specific texts that were converted (and extended) into the foregoing lecture notes are --

- Short Locally Testable Codes and Proofs - A Survey in Two Parts .
- Introduction to Testing Graph Properties

Back to homepage.