# Program of the ITCS mini-workshop on Property Testing (2010)

Organizers: Oded Goldreich and Amy Yuexuan Wang.
Dates: January 8-10, 2010.
Official webpage: click here.

Property Testing is the study of super-fast (randomized) algorithms for approximate decision making. These algorithms are given direct access to items of a huge data set, and determine whether this data set has some predetermined (global) property or is far from having this property. Remarkably, this approximate decision is made by accessing a small portion of the data set. Property Testing has been a subject of intensive research in the last couple of decades, with hundreds of studies conducted in it and in closely related areas. Indeed, Property Testing is closely related to Probabilistically Checkable Proofs (PCPs), and is related to Coding Theory, Combinatorics, Statistics, Computational Learning Theory, Computational Geometry, and more.

## Program (annotated with brief comments by Oded Goldreich)

Abstracts of all talks are available in the official brochure of the mini-workshop.
(In addition, extended abstracts of most papers are available below.)

#### SESSION 1 (FRIDAY, Jan 8, 9:00-10:15) [chair: Oded Goldreich]

Kevin was ill, and so Rocco gave both talks. I am fascinated by the implicit learning paradigm, which is pivoted on emulating learning algorithms for $k$-variable functions by using $n$-bit long samples with $k$ influential variables. The emulation proceeds by randomly partitioning the variables to $O(k^2)$ sets, identifying $k$ sets that are likely to each contain a single influential variable, and converting the $n$-bit long samples to corresponding $k$-bit samples. Specifically, each $n$-bit sample is convereted by determining for each influential set whether the subset of variables labeled 1 or the subset labeled 0 is influential, and setting the corresponding bit accordingly. Once the learning algorithm outputs a hypothesis for the $k$-bit function, it is tested using adequate queries (which are again emulated correspondingly).

#### SESSION 2 (FRIDAY, Jan 8, 10:45-12:00) [chair: Avrim Blum]

See a review of Eric's work on testing juntas in my choices (Nr. 19). In addition, Eric advocated a study of the following class of parameterized problems. For a fixed function $g$, which is known to the tester, the task is testing whether the given oracle/function is isomorphic to $g$ (i.e., $f$ equals $g$ under a relabeling of the indices of the variables). Michael presented an overview of results that assert the existence of properties with arbitrary (reasonable) query complexity bound.

#### SESSION 3 (FRIDAY, Jan 8, 14:00-15:30) [chair: Bernard Chazelle]

Ilan surveyed works that refer to "massively parameterized" testing problems. The parameters are huge structures (e.g., graphs) and the problem refers to a substructure of it (e.g., an assignment to the graph's edges). In the case of graphs, the assignment may be viewed as an orientation of the fixed graph or a subgraph of it. Sofya discussed the notion of transitive-closure spanners (i.e., spanners of the transitive closure graph) that are implicit in some monotonicity testers.

#### PANEL 1 (FRIDAY, Jan 8, 16:00-17:00)

On the connection of Property Testing to Computational Learning Theory and Computational Geometry.
Panelists: Avrim Blum, Bernard Chazelle, and Rocco Servedio.
Bernard advocated an attempt to provide a sublinear time analysis of dynamic systems, which may consist of selecting few objects and tracing their activity during time. Avrim emphasised the high cost of queries (rather than random examples) in typical learning applications, and I suggested to consider a two-parameter complexity measure that separates the number of random examples from the number of actual queries. Rocco suggested to try to relate property testing to agnostic learning (rather than to standard learning), and highlighted the open problem of tolerant testing of half planes.

#### SESSION 4 (SATURDAY, Jan 9, 9:00-10:15) [chair: Shafi Goldwasser]

Madhu mused on what makes problems easily testable and suggested that closure under a rich permutation group plays a major role. This seems supperted by the numerous results regarding algebraic functions (which are all invariant under suitable affine transformations) and graph properties (which, by definition, are closed under isomorphism). He surveyed a unified approach that refers to the former, where the affine transformations refer to the identification of the domain with a vector space. Victor demonstrated that the approach extends also to non-linear properties, and Noga commented that an alternative demonstration may be provided by a property that is a union of two disjoint linear spaces.

#### SESSION 5 (SATURDAY, Jan 9, 10:45-12:00) [chair: Michael Saks]

Asaf's study refers to the paramertized problem of whether a given subset of $[n]$ is free from containing any solution to a fixed set of linear equations. This was studied before with respect to a single equation (i.e., $x+y=z$), and Asaf's work treats any linear system. Noga's study refers to approximating quantities such as the independence number of bounded-degree graphs. He shows that constant-time algorithms can almost match the best approximation bounds that are previously known for PTAS, and that better bounds cannot be achieved in constant-time. (This is related to Krzysztof's presentation in Session 6, which follows prior work of Dana Ron et al.) Ronitt's work demonstrates that techniques employed in the context of distributed algorithms can be applied to the context of dynamic graph algorithms. This augments prior connections between constant-round distributed algorithms and constant-time approximation algorithms discovered by Parnas and Ron. Ronitt asked whether a direct relation can be shown between dynamic graph algorithms and constant-time approximation algorithms. Krzysztof advocated applying external memory cost measures to property testing algorithms. He showed such a result for the problem of element distinctness, presenting an algorithm that samples $\sqrt(nB/\eps)$ random $B$-bit blocks.

#### SESSION 6 (SATURDAY, Jan 9, 14:00-15:30) [chair: Noga Alon]

• Krzysztof Onak: Sublinear Graph Approximation Algorithms
• Michael Saks: Local Monotonicity Reconstruction
• Ronitt Rubinfeld: Testing properties of distributions: a survey
See a review of Krzysztof's work in my choices (Nr. 8). Michael discussed the local reconstruction problem applied to objects for which there are exponentially many adequate solutions. The problem in this case is to determine some fixed solution as a function of the corrupted oracle and the random bits (used in the reconstruction), where in some applications it is desireable to use few random bits. Ronitt surveyed the study of testing properties of distributions, starting with the problem of testing identity to some known distribution (i.e., a parametreized problem), which is solvable by $sqrt(n)$ samples when $n$ is the domain's size (and the estimate is via collision probabilities). Testing that two given distributions (i.e., both distributions are "given" by samples) requires 3-way collisions and so $n^{2/3}$ samples. In contrast, approximating the distance to the uniform distribution requires an "almost linear" number of samples.

#### PANEL 2 (SATERDAY, Jan 9, 16:00-17:00)

On the connection of Property Testing to Coding Theory, Combinatorics, and Statistics.
Panelists: Madhu Sudan, Noga Alon, and Ronitt Rubinfeld.
Noga focused on the "global vs local" nature of property testing that is closely related to a main theme in combinatorics initiated by Erdos in the 1950's. He noted that a tester of $k$-colorability is implicit in work of the 1980's, but the query bound obtained there is a tower of exponents in the proximity parameter. Answering a question, Noga kind of suggested the challenge of testing triangle-freeness in $2^{O(1/\eps)^2}$ queries. (Recall that the currently best known bound is a tower of exponents in $poly(1/\eps)$.) Ronitt reported on the awakening of interest of statisticians in the effect of the size of the domain (of values). (Traditionally, Statistics is concerned with the effect of the size of the universe (sample space), whereas we are concerned with the effect of the number of values obtained in this space.) Still there are significant cultural differences, since in Statistics one often makes (implicit) assumptions about the distribution and/or assumes that the distribution itself is selected at random among some possibilities (hence "likelihood" is relevant). Ronitt also pointed out possible applications to areas that use Statistics such as natural language processing and data bases. Madhu pointed out that sublinear time algorithms arise naturally in the context of coding theory. Shafi advocated the study of property testing of algebraic (and number theoretic) problems beyond linearity and low-degree properties. A concrete example may be testing that a polynomial (given by its evaluations) is an irreducible polynomial. Can a tester be significantly more efficient than polynomial interpolation?

#### SESSION 7 (SUNDAY, Jan 10, 9:00-10:15) [chair: Ronitt Rubinfeld]

See a review of Prahladh's work in my choices (Nr. 13). Victor surveyed the state of the art regarding the amortized query complexity of dictatorship testing (with perfect completeness), which currently stands on soundness error of $O(q\cdot2^{-q})$ per $q$ non-adaptive queries (see ECCC TR09-074). Michael discussed a general framework for testing graph properties, where distances are normalized by the actual number of edges and various types of queries are considered. In the work, "group queries" (akin of "group testing") are shown to be strictly stronger than the combination of the standard vertex-pair and neighbor queries.

#### SESSION 8 (SUNDAY, Jan 10, 10:45-12:00) [chair: Michael Krivelevich]

Shubhangi advocated viewing locally testable linear codes as a special case of tolerant testing the Hadamard codewords under some adequate distributions (specifically, the distribution that is uniform on the corresponding positions of the Hadamard code). The tolerance condition prevents rejecting a codeword based on positions that are assigned zero probability. She presented two results, relating to two notion of tolerant testing (or rather distance approximaton): the more general result yields a weak notion of approximation and is obtained under the so-called uniformly correlated condition, whereas a strong notion of approximation requires a stronger correlated condition. A distribution $\mu$ is called $k$-uniformly correlated if there is a joint distribution of $k$-tuples such that each element (in the $k$-tuple) is distributed as $\mu$ but their sum is uniformly distributed. The stronger condition requires that this holds when these $k$ elements are independently drawn from $\mu$, which is equivalent to requiring that the code be sparse and have small Fourier coefficients. Swastik presented an improved analysis of the AKKLR linearity test (for degree $d$ multilinear polynomails over GF(2)), showing that the basic test rejects far away function with constant probability (rather than with probability $Omega(2^{-d})$).

#### SESSION 9 (SUNDAY, Jan 10, 14:00-15:30) [chair: Madhu Sudan]

Eli surveyed several results that assert that various parameters of LTCs imply an exponentially vabishing rate. One such result refers to LTCs that are tested by a small set of possible constraints (which is only somewhat larger than the dimention of the dual code). Arthur advocated the study of testing continuous distributions, focusing on the case that they are actually discrete (i.e., assume a finite number of possible values). Krzysztof contrasted the query complexity of testing the edit distance of a given string (from a fixed string) to the corresponding problem for Ulam distance. Sofya presented an application of reconstruction procedures (as presented by Michael Saks in Session~2) to data privacy; specifically, if data privacy is maintained for functions of a certain type, then we may want to modify the given function to that class. I tried to call attention to a work with Dana Ron, highlighting several perspectives and questions that arise.

#### PANEL 3 (SUNDAY, Jan 10, 16:00-17:00)

• Prahladh Harsha on decoding in the low-error regime
• Ron Rivest on possible applications of property testing to the security evaluation of hashing functions.
Prahladh mused on whether decoding in the low-error regime may find additional applications in property testing. Ron asked whether property testing techniques can be emplyed to the evaluation of the quality of various compression functions.