Property Testing is concerned with approximate decisions, where the
task is distinguishing between objects having a predetermined property
and objects that are ``far'' from having this property.
Typically, objects are modeled by functions,
and distance between functions is measured as the fraction
of the domain on which the functions differ.
We consider (randomized) algorithms that may query the function
at arguments of their choice,
and seek algorithms of very low complexity
(i.e., much lower than the size of the function's domain).
ISBN 9781107194052
Publisher:
Cambridge University Press |
See also a collection of surveys and extended abstracts on property testing (edited in 2010).
Material available on line:
Open Problems resolved
In other words, property testing is concerned with the design of super-fast algorithms for approximate decision making, where the decision refers to properties or parameters of huge objects. In particular, we seek algorithms that only inspect relatively small portions of the huge object. Such algorithms must be randomized and can only provide approximate answers. Indeed, two salient aspects of property testing are that (1) it studies algorithms that can only read parts of the input, and (2) it focuses on algorithms that solve ``approximate decision'' problems. Both aspects are quite puzzling: What can one do without even reading the input? What does approximate decision mean?
The answer is that these two aspects are indeed linked: Approximate decision means distinguishing objects that have some predetermined property (i.e., reside in some predetermined set) from objects that are ``far'' from having the property (i.e., are far from any object having the property), where the notion of distance employed here is the relative number of different symbols in the descriptions of the objects. Such approximate decisions may be valuable in settings in which an exact decision is infeasible or very expensive or just considerably more expensive than obtaining an approximate decision.
The point is that, in many cases, approximate decision can be achieved by super-fast randomized algorithms. One well-known example is the common practice of estimating various statistics by sampling, which can be cast as a a small collection of approximate decision problems (with respect to some threshold values). Research in property testing aims to extend this useful practice to properties that cannot be cast as statistics of values that are associated with a large population. Examples in which this goal was achieved include testing properties of functions such as being a low degree polynomial, being monotone, depending on a specified number of attributes, testing properties of graphs such as being bipartite and being triangle-free, and testing properties of geometric objects and visual images such as being well-clustered and being a convex body.
Viewing the input as a function is natural in the context of algorithms that do not read their entire input. Such algorithms must probe the input at locations of their choice, and such probing can be thought of as querying a function that represents the input. The key point here is that the number of probes (or queries) is smaller than the size of the input, and so decisions are taken after seeing only a small part of the input. However, the inspected positions are not fixed but are rather chosen at random by the algorithm, possibly based on answers obtained to prior queries. Thus, in general, these algorithms may ``explore'' the input, rather than merely obtain its value at a uniformly selected sample of locations. Such exploration is most appealing when the tested input is a graph, which may be represented by a function (e.g., by its adjacency predicate), but the notion of exploration applies also in other cases.
Research in property testing may be both algorithmic and complexity theoretic. This is reflected both in its goals, which may be the design of better algorithms or the presentation of lower bound on their complexity, and its tools and techniques. Such research is related to several areas of computer science and mathematics including Combinatorics, Statistics, Computational Learning Theory, Computational Geometry, and Coding theory. Historically, property testing was closely associated with the study of Probabilistically Checkable Proofs (PCPs), and some connections do exist between the two, but property testing is not confined to PCPs (and/or to the study of ``locally testable codes'').
This book. The current book aims to provide an introduction to Property Testing, by presenting some of the main themes, results, and techniques that characterize the area and are used in it. As usual in such cases, the choice reflects a judgement of what is most adequate for presentation in the context of such an introductory course, and this selection does not reflect lack of appreciation of the omitted material but rather an opinion that it fit less well the intended purpose of the text.
In addition to the choice of material for this book, we made several choices regarding the organization of the material and the amount of inter-dependencies among its parts.
We chose to organize the material by the type of the objects and properties being tested. By the ``type of object'' we refer to the natural perception of the object; for example, whether it is most naturally perceived as a function or as a graph. Within the world of functions, the types correspond to the structure of the domain on which the function is defined (e.g., a group, a vector space, a Boolean hypercube, or a hyper-grid). The structure of the domain is often reflected by the invariances that are satisfied by the properties that we consider (e.g., affine invariance, closure under graph isomorphism, etc). Hence, our organization may be termed structure-oriented. (Possible alternatives to our organization include an organization by techniques (as in [Ron] or by complexity levels (e.g., whether the complexity is independent on the size of the object, is mildly dependent on it, is barely sub-linear, or somewhere in between).)
We chose to present the material while making as few pointers between chapters as possible. Of course, all chapters depend on the core notions that are introduced in the first chapter, but additional inter-dependencies are rare and never heavily relied upon. Hence, the ordering of the other chapters is not very important, although we preferred a specific one.
We chose to use (one-sided error) proximity-oblivious testers (POTs) whenever possible. This reflects our belief that when a tester (implicitly or explicitly) consists of repeating a POTs for a number of times that depends on the proximity parameter, one should focus on the POT itself and rely on the generic transformation from POTs to standard testers.
For sake of uniformity, $n$ always denotes the size of the object in a natural representation (which is not grossly redudent). Hence, objects are typically viewed as functions $f:[n]\to R_n$. Consequently, Boolean functions are presented as $f:\bitset^\ell\to\{0,1\}$, where $n=2^\ell$ (rather than as having domain $\{0,1\}^n$).
Although the ordering of the chapters that follow Chapter 1 is not very important, a choice had to be made. Our choice was to start with simple properties of functions such as group homomorphism, low degree polynomials, monotonicity (with respect to various partial orders), and depending on few variables (i.e., juntas). In all these cases, the correspondence between the object and its representation is transparent: the function is the object. In contrast, when moving to graph properties, the question of representation arises in an acute manner, and three different chapters are devoted to three different representations that correspond to three different testing models. Hence, from the perspective of property testing per se, it seems to make sense to start with functions and then move to graphs.
In accordance with the above, the first cluster of chapters (Chapters 2-6) deals with testing properties of functions, whereas a later cluster (Chapters 8-10) deals with testing properties of graphs. A chapter on lower bound techniques (i.e., Chapter 7) is located in between these two clusters, since lower bounds are hardly mentioned in the first cluster, whereas they appear quite prominently in Chapters 9-10. The reason for this phenomena is that these lower bounds are used in order to justify the significantly higher complexity of some testers that are presented in Chapters 9-10. Indeed, in the context of property testing, we view lower bounds merely as justification for algorithms that may be considered to have a higher than expected complexity; the lower bounds assert that this impression is actually false, and that one cannot do significantly better.
Chapters 11-13 form a third cluster of material that is very related to the previous chapters and yet has a different flavor. Chapter 11 deals with testing properties of distributions, Chapter 12 explores a few related topics (some of which were mentioned in Section 1.3.2), and Chapter 13 reviews locally testable codes and proofs. We stress that in Chapter 11 the tested objects are fundamentally different from those considered in all the other chapters, whereas in Chapter 13 we consider objects that are artificially designed so to offer super-fast testing.
Chapter 1: The Main Themes (Approximate Decision and Sub-linear Complexity). This chapter introduces and illustrates the basic notions of property testing, emphasizing the themes of approximate decision and sub-linear complexity. The chapter starts with a discussion of the potential benefits of testing, and presents the definitions of (standard) testers and of proximity-oblivious testers (POTs). It discusses the key role of representation, points out the focus on properties that are not fully symmetric, and presents several some general observations regarding POTs, testing, and learning.
Chapter 2: Testing Linearity (Group Homomorphism). This chapter present an analysis of a linearity tester that, on input a description of two groups $G,H$ and oracle access to a function $f:G\to H$, queries the function at three points and satisfies the following conditions:
Chapter 3: Low Degree Tests. For a finite field of prime cardinality $\F$, a degree bound $d<|\F|/2$ and number $m\in\N$, we consider the problem of testing whether a function $f:\F^m\to\F$ is a polynomial of total degree at most $d$. We present and analyze a low-degree tester that, given oracle access to $f:\F^m\to\F$, queries it at $d+2$ points and satisfies the following conditions:
Chapter 4: Testing Monotonicity. For each $n$, we consider functions from a partially ordered set $D_n$ to a totally ordered set $R_n$. Such a function $f:D_n\to R_n$ is called monotone if for every $x\leq y$ in $D_n$ it holds that $f(x)\leq f(y)$, where $\leq$ denotes the partial order of $D_n$ and the total order in $R_n$. Two special cases of interest are:
Chapter 5: Testing Dictatorships, Juntas, and Monomials. We consider testing three basic properties of Boolean functions of the form $f:\bitset^\ell\to\bitset$:
Chapter 6: Testing by Implicit Sampling. Building on the junta tester, we present a general methodology for constructing testers for properties of Boolean functions (of the form $f:\bitset^\ell\to\bitset$) that can be approximated by small juntas. This methodology yields testers of low query complexity for many natural properties, which contain functions that depend on relatively few relevant variables; specifically, the query complexity is related to the size of the junta and is independent of the length of the input to the function (i.e., $\ell$).
Chapter 7: Lower Bounds Techniques. We present and illustrate three techniques for proving lower bounds on the query complexity of property testers.
Chapter 8: Testing Graph Properties in the Dense Graph Model. Following a general introduction to testing graph properties, this chapter focuses on the dense graph model, where graphs are represented by their adjacency matrix (predicate). The highlights of this chapter include:
Chapter 9: Testing Graph Properties in the Bounded-Degree Graph Model. This chapter is devoted to testing graph properties in the bounded-degree graph model, where graphs are represented by their incidence lists (lumped together in an incidence function). The highlights of this chapter include:
Chapter 10: Testing Graph Properties in the General Graph Model. This chapter is devoted to testing graph properties in the general graph model, where graphs are inspected via incidence and adjacency queries, and distances between graphs are normalized by their actual size (i.e., actual number of edges). The highlights of this chapter include:
Chapter 11: Testing Properties of Distributions. This chapter provides an introduction to the study of testing properties of distributions, where the tester obtains samples of an unknown distribution (resp., samples from several unknown distributions) and is required to determine whether the distribution (resp., the tuple of distributions) has some predetermined property. We focus on the problems of testing whether an unknown distribution equals a fixed distribution and of testing equality between two unknown distributions. Our presentation is based on reductions from the general cases to some seemingly easier special cases. In addition, we also provide a brief survey of general results.
Chapter 12: Ramifications and related topics. We briefly review a few ramifications of the notion of property testers as well as related topics. The list includes tolerant testing and distance approximation; testing in presence of additional promises on the input; sample-based testers; other distance measures; local computation algorithms; and non-interactive proofs of proximity (MAPs). The different sections of this chapter can be read independently of one another.
Chapter 13: Locally Testable Codes and Proofs. We survey known results regarding locally testable codes and locally testable proofs (known as PCPs). Local testability refers to approximately testing large objects based on a very small number of probes, each retrieving a single bit in the representation of the object. This yields super-fast approximate-testing of the corresponding property (i.e., be a codeword or a valid proof). In terms of property testing, locally testable codes are error correcting codes such that the property of being a codeword can be tested within low query complexity. As for locally testable proofs (PCPs), these can be viewed as massively parametrized properties that are testable within low query complexity such that the parameterized property is non-empty if and only if the corresponding parameter is in a predetermined set (of ``valid statements''). Our first priority is minimizing the number of probes, and we focus on the case that this number is a constant. In this case (of a constant number of probes), we aim at minimizing the length of the constructs. That is, we seek locally testable codes and proofs of short length.
Appendix A: Probabilistic Preliminaries. This appendix presents background from probability theory, which is used extensively throughout the book. This background and preliminaries include conventions regarding random variables, basic notions and facts, and three useful probabilistic inequalities (i.e., Markov's Inequality, Chebyshev's Inequality, and Chernoff Bound).
Appendix B: A Mini-Compendium of General Results. This appendix restates several general results that were presented in prior chapters, including deriving standard testers from POTs; positive results on the algebra of property testing; reducing testing to learning; the randomness complexity of testers; archetypical application of self-correction; and the effect of local reductions.
Appendix C: An Index of Specific Results. This appendix provides an index to all results regarding specific properties that were presented in this book.
1. The Main Themes: Approximate Decision and Sub-linear Complexity
1.1 Introduction:
1.1.1 Property testing at a glance,
1.1.2 On the potential benefits of property testers,
1.1.3 On the flavor of property testing research,
1.1.4 Organization and some notations
1.2 Approximate decisions:
1.2.1 A detour: approximate search problems,
1.2.2 Property testing: approximate decision problems,
1.2.3 Property testing: sub-linear complexity,
1.2.4 Symmetries and invariants,
1.2.5 Objects and representation
1.3 Notions, definitions, goals, and basic observations:
1.3.1 Basics,
1.3.2 Ramifications,
1.3.3 Proximity-oblivious testers (POTs),
1.3.4 The algebra of property testing,
1.3.5 Testing via learning
1.4 Historical notes
1.5 Suggested reading and exercises
1.6 Digest: The most important points
2. Testing Linearity (Group Homomorphism)
2.1 Preliminaries
2.2 The tester
2.3 Chapter notes
3. Low Degree Tests
3.1 A brief introduction
3.2 A kind of intuition (which may be skipped)
3.3 Background
3.4 The tester
3.5 Chapter notes
4. Testing Monotonicity
4.1 Introduction
4.2 Boolean functions on the Boolean Hypercube:
4.2.1 The edge test,
4.2.2 Path tests
4.3 Multi-valued functions on the discrete line:
4.3.1 A tester based on binary search,
4.3.2 Other testers
4.4 Multi-valued functions on the Hypergrid:
4.4.1 Dimension reduction,
4.4.2 Range reduction
4.5 Chapter notes
5. Testing Dictatorships, Juntas, and Monomials
5.1 Introduction
5.2 Testing dictatorship via self-correction:
5.2.1 The tester,
5.2.2 Testing monomials,
5.2.3 The self-correction paradigm: an abstraction
5.3 Testing juntas
5.4 Chapter notes
6. Testing by Implicit Sampling
6.1 Introduction
6.2 Testing subsets of $k$-juntas
6.3 Extension to properties approximated by subsets of $k$-juntas
6.4 Chapter notes
7. Lower Bounds Techniques
7.1 Introduction
7.2 Indistinguishability of distributions:
7.2.1 The actual method,
7.2.2 Illustrating the application of the method,
7.2.3 Further reflections
7.3 Reduction from Communication Complexity:
7.3.1 Communication Complexity,
7.3.2 The methodology,
7.3.3 Illustrating the application of the methodology
7.4 Reduction among testing problems
7.5 Lower bounds for restricted testers:
7.5.1 One-sided error testers,
7.5.2 Non-adaptive testers
7.6 Chapter notes
8. Testing Graph Properties in the Dense Graph Model
8.1 The general context: Introduction to testing graph properties:
8.1.1 Basic background,
8.1.2 Three Models of Testing Graph Properties
8.2 The Dense Graph Model: Some basics:
8.2.1 The actual definition,
8.2.2 Abuses of the model: Trivial and sparse properties,
8.2.3 Testing degree regularity,
8.2.4 Digest: Levin's economical work investment strategy
8.3 Graph Partition Problems:
8.3.1 Testing Bipartiteness,
8.3.2 The actual definition and the general result
8.4 Connection to Szemeredi's Regularity Lemma:
8.4.1 The Regularity Lemma,
8.4.2 Subgraph freeness,
8.4.3 The structure of properties that have size-oblivious testers
8.5 A Taxonomy of the known results
8.6 Chapter notes:
8.6.1 Historical perspective and credits,
8.6.2 Testing versus other forms of approximation,
8.6.3 A contrast with recognizing graph properties,
8.6.4 Exercises
9. Testing Graph Properties in the Bounded-Degree Graph Model
9.1 The Bounded-Degree Model: Definitions and issues
9.2 Testing by a local search:
9.2.1 Testing subgraph freeness,
9.2.2 Testing degree regularity,
9.2.3 Testing connectivity,
9.2.4 Testing $t$-connectivity (overview and one detail),
9.2.5 Testing cycle-freeness (with two-sided error)
9.3 Lower bounds:
9.3.1 Bipartitness,
9.3.2 Applications to other properties,
9.3.3 Linear lower bounds
9.4 Testing by random walks:
9.4.1 Testing Bipartiteness,
9.4.2 One-sided error tester for Cycle-freeness
9.5 Testing by implementing and utilizing partition oracles:
9.5.1 The simpler implementation,
9.5.2 The better implementation
9.6 A Taxonomy of the known results
9.7 Chapter notes:
9.7.1 Historical perspective and credits
9.7.2 Directed graphs
9.7.3 Exercises
10. Testing Graph Properties in the General Graph Model
10.1 The General Graph Model: Definitions and issues
10.2 On obtaining testers for the current model:
10.2.1 An explicit adaptation: the case of connectivity,
10.2.2 Using a reduction: the case of Bipartiteness
10.3 Estimating the average degree and selecting random edges:
10.3.1 Lower bounds,
10.3.2 Algorithms (Bucketing vertices according to their degree
and Sorting vertices according to their degree)
10.4 Using adjacency queries: the case of Bipartiteness
10.5 Chapter notes:
10.5.1 Gaps between the general graph model and the bounded-degree model,
10.5.2 History and credits,
10.5.3 Reflections,
10.5.4 Exercises
11. Testing Properties of Distributions
11.1 The model:
11.1.1 Testing properties of single distributions,
11.1.2 Testing properties of pairs of distributions,
11.1.3 Label-invariant properties,
11.1.4 Organization
11.2 Testing equality to a fixed distribution:
11.2.1 The collision probability tester and its analysis,
11.2.2 The general case (treated by a reduction to testing uniformity),
11.2.3 A lower bound
11.3 Testing equality between two unknown distributions:
11.3.1 Detour: Poisson Distributions,
11.3.2 The actual algorithm and its analysis,
11.3.3 Applications: Reduction to the case of small norms
11.4 On the complexity of testing properties of distributions
11.5 Chapter notes
11.5.1 History and credits
11.5.2 Exercise
12. Ramifications and related topics
12.1 Tolerant testing and distance approximation
12.2 Additional promises on the input
12.3 Sample based testers
12.4 Testing with respect to other distance measures
12.5 Local computation algorithms:
12.5.1 Definitions,
12.5.2 Finding huge structures,
12.5.3 Local reconstruction
12.6 Non-interactive proofs of proximity (MAPs)
12.7 Chapter notes:
12.7.1 Historical notes,
12.7.2 Massively parameterized properties,
12.7.3 Exercises
13. Locally Testable Codes and Proofs
13.1 Introduction
13.2 Definitions:
13.2.1 Codeword testers,
13.2.2 Proof testers,
13.2.3 Ramifications and relation to property testing,
13.2.4 On relating locally testable codes and proofs
13.3 Results and Ideas:
13.3.1 The mere existence of locally testable codes and proofs
(The Hadamard Code is locally testable and The Hadamard-Based PCP of ALMSS),
13.3.2 Locally testable codes and proofs of polynomial length
(Locally testable codes of almost quadratic length and
Locally testable proofs of polynomial length (The PCP Theorem)),
13.3.3 Locally testable codes and proofs of nearly linear length
(Local testability with nearly linear length
and The gap amplification method)
13.4 Chapter notes:
13.4.1 Historical notes,
13.4.2 On obtaining super-fast testers,
13.4.3 The alternative regime: LTCs of linear length,
13.4.4 Locally Decodable Codes,
13.4.5 Exercises
Apdx A. Probabilistic Preliminaries
A.1 Notational Conventions
A.2 Some basic notions and facts
A.3 Basic facts regarding expectation and variance
A.4 Three Inequalities:
A.4.1 Markov's Inequality,
A.4.2 Chebyshev's Inequality,
A.4.3 Chernoff Bound,
A.4.4 Pairwise independent versus totally independent sampling
Apdx B. A Mini-Compendium of General Results
Apdx C. An Index of Specific Results
Bibliography
Index
Plan for a 24-hour course based on the book
Two specific texts that were converted (and extended) into the foregoing lecture notes are --