## A few talks at China Theory Week

As usual, the partial selection is based and restricted both by my attendence and my interests. In particular, I only attended two days of the week. I will comment shortly on the following six talks:

1. Chenggang Wu: Testing Surface Area
2. Huy L. Nguyen: Approximate Near Neighbor Search Beyond Locality Sensitive Hashing
3. Reut Levi: A Quasi-Polynomial Time Partition Oracle for Graphs with an Excluded Minor
4. Sangxia Huang: Improved Hardness of Approximating Chromatic Number
5. Avishay Tal: Properties and Applications of Boolean Function Composition
6. Zeyu Guo: Randomness-Efficient Curve Samplers

The China Theory Week is an annual event aimed at providing a platform for top CS theory students from institutions worldwide to meet and present their research. In 2007 and a few years following, the event was held in Tsinghua University (Beijing), but in the last couple of years it was held in Aarhus. See the official website.

#### 1. Chenggang Wu: Testing Surface Area

This work studies a natural double-approximation problem; as noted in the abstract, the standard (for volume appriximation) assumptions about the body (i.e., convexity) are replaced by a distance relaxation (i.e., the body is closed in a natural measure (e.g., Lebesque measure) to a body with a certain volume).

#### 2. Huy L. Nguyen: Approximate Near Neighbor Search Beyond Locality Sensitive Hashing

The point here is out-performing the guarantees obtained via the use of locality sensitive hashing (i.e., a hashing scheme that maps closed points to close values). The new scheme uses a sophisticated but natural combination of several hashing schemes, and analyzes the effect on close and distance pairs separately.

#### 3. Reut Levi: A Quasi-Polynomial Time Partition Oracle for Graphs with an Excluded Minor

The problem is of implementing an oracle that on input a vertex $v$ in a graph $G$ with excluded minors returns all vertices in a connected component of a subgraph of $G$ such that (1) the size of each connected component is small (i.e., smaller than $k$), and (2) the subgraph covers almost all edges of $G$ (i.e., it misses at most an $\eps$ fraction). The complexity measure is the number of (incidence) queries made to $G$ per each query to the oracle. While prior work had a query complexity exponential in $k/\eps$, the current work provides quasi-polynomial dependence, and leaves open the challlenge of obtaining a polynomial dependence. While prior work constructed iteratively certain contractions of the graph (and possibly "broken" some of them at the end), the current work interatively updates connected components of the graph by an alternate process of contraction and breaking-up. This alternation guarantees that the size of the connected components remains small throughout the process, which yields the (exponential) improvement in the query complexity.

#### 4. Sangxia Huang: Improved Hardness of Approximating Chromatic Number

Using Chan's recent optimal inapproximability result for CSPs, this work shows that it is NP-hard to color $K$-colorable graphs using $exp(K^{1/3)$ colors. This improves over a prior quasi-polynomial bound. The use of Chan's result requires to revisit the connection between CSPs and Chromatic Number for the specific instances that emerge in the reduction.

#### 5. Avishay Tal: Properties and Applications of Boolean Function Composition

This work present a unified and general frmaework for proving separations and collapses in decision tree complexity. The framework, which was implicit in some prior works, uses the behavior (e.g., multiplicativity) of various complexity measures under functional composition. The explicit presentation streamlines known results as well as allows for obtaining new ones.

#### 6. Zeyu Guo: Randomness-Efficient Curve Samplers

The goal is to construct good samplers such that the function describing the sample points, per each choice of random seed, is a low-degree (multi-variant) curve (i.e., sequence of polynomials). This guarantees that the value of any given (multi-variant) polynomial on the sample points can be described by a low degree polynomial. In fact, the work obtains good explicit samplers such that that are themselves curves (i.e., as a function of both the seed and the index of the sample point).

## The original abstracts

#### 1. Chenggang Wu: Testing Surface Area

We consider the problem of estimating the surface area of an unknown n-dimensional set F given membership oracle access. In contrast to previous work, we do not assume that F is convex, and in fact make no assumptions at all about F. By necessity this means that we work in the property testing model; we seek an algorithm which, given parameters $A$ and $\eps$, satisfies:
(1) [Completeness] If the surface area of F is at most $A$, then the algorithm accepts (whp);
(2) [Soundness] If F is not $\eps$-close to some set $G$ with surface area at most $\kappa A$, then the algorithm rejects (whp).
We call $\kappa \geq 1$ the *approximation factor* of the testing algorithm.
The $n = 1$ case (in which surface area is the number of endpoints) was introduced by Kearns and Ron, who solved the problem with $\kappa = 1/\eps$ and $O(1/\eps)$ oracle queries. Later, Balcan et al. [BBBY12] solved it with $\kappa =3D 1$ and $O(1/\eps^4)$ queries.
We give the first result for higher dimensions n. Perhaps surprisingly, our algorithm completely evades the "curse of dimensionality": for any n and any $\kappa > \frac{4}{\pi} \approx 1.27$, we give a test that uses $O(1/\eps)$ queries. For small $n$ we have improved bounds. For n=1 we can achieve $\kappa =1$ with $O(1/\eps^{3.5})$ queries (slightly improving [BBBY12]), or any $\kappa > 1$ with $O(1/\eps)$ queries (improving [KR98]). For n = 2, 3 we obtain $\kappa \approx 1.08, 1.125$ respectively, with $O(1/\eps)$ queries. Getting an arbitrary $\kappa > 1$ for n > 1 remains an open problem.
Finally, motivated by the learning results from [KOS08], we extend our techniques to obtain a similar tester for Gaussian surface area for any n, with query complexity $O(1/\eps)$ and any approximation factor $\kappa> \frac{4}{\pi} \approx 1.27$.
Joint work with Pravesh Kothari, Amir Nayyeri and Ryan O'Donnell.

#### 2. Huy L. Nguyen: Approximate Near Neighbor Search Beyond Locality Sensitive Hashing

The approximate near neighbor search problem is defined as follows: given a set P of n points in a d-dimensional space, build a data structure that, given a query point q, if there exists a point within distance r from q, then it reports a point within distance cr from q. Here c is the approximation factor of the algorithm.
We present a new data structure for the c=E2=80=93approximate near neighbor= problem (ANN) in the Euclidean space. For n points in R^d, our algorithm achieves O(dn^=CF=81) query time and O(n^{1+?=CF=81} + nd) space, where =CF=81 =E2= =89=A4 7/(8c^2) + O(1/c^3) + o(1). This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality=E2=80=93sensitive hashing lower bound proved by O=E2=80=99Donnell,= Wu and Zhou (ITCS 2011). By a standard reduction we obtain a data structure for the Hamming space and =E2=84=931 norm with =CF=81 =E2=89=A4 7/(8c) + O(1/c^{3/2= }) + o(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).
The talk is based on joint work with Alexandr Andoni, Piotr Indyk, and Ilya Razenshteyn.

#### 3. Reut Levi: A Quasi-Polynomial Time Partition Oracle for Graphs with an Excluded Minor

Motivated by the problem of testing planarity and related properties, we study the problem of designing efficient {\em partition oracles\/}. A {\em partition oracle\/} is a procedure that, given access to the incidence lists representation of a bounded-degree graph $G=3D (V,E)$ and a parameter $\eps$, when queried on a vertex $v\in V$, returns the part (subset of vertices) which $v$ belongs to in a partition of all graph vertices. The partition should be such that all parts are small, each part is connected, and if the graph has certain properties, the total number of edges between parts is at most $\eps |V|$. In this work we give a partition oracle for graphs with excluded minors whose query complexity is quasi-polynomial in $1/\eps$, thus improving on the result of Hassidim et al. ({\em Proceedings of FOCS 2009}) who gave a partition oracle with query complexity exponential in $1/\eps$. This improvement implies corresponding improvements in the complexity of testing planarity and other properties that are characterized by excluded minors as well as sublinear-time approximation algorithms that work under the promise that the graph has an excluded minor.

#### 4. Sangxia Huang: Improved Hardness of Approximating Chromatic Number

A vertex coloring of a graph is an assignment of colors to its vertices such that no two adjacent vertices receive the same color. The minimum number of colors needed for such a coloring is called the chromatic number of G. We mainly consider the approximation of the chromatic number, that is given a K-colorable graph, we would like to color it with as few colors as possible in polynomial time. As a classical combinatorial optimization problem, vertex coloring is closely related to other problems such as finding maximum independent sets, Max CSPs, and probabilistically checkable proofs (PCPs) with certain special properties. In addition to being an important theoretical challenge, graph coloring also has a number of applications such as scheduling and register allocation. Building on recent breakthrough in the construction of efficient PCPs, we show that it is NP-hard to distinguish a K-colorable graph from a graph that has chromatic number at least 2^Omega(K^{1/3}) for sufficiently large K. This improves the previous best known lower-bound of 2^{Omega((log K)^2)}.

#### 5. Avishay Tal: Properties and Applications of Boolean Function Composition

This talk focuses on a simple but useful tool for separating complexity measures: Boolean function composition. For Boolean functions f, g defined on n, m input variables respectively, the composition of f and g is the value of f on n inputs, each of them is the calculation of g on a distinct set of m Boolean variables. Previous works have used this tool to achieve some of the best separations between complexity measures of Boolean functions such as sensitivity, block sensitivity, certificate complexity, degree, decision tree complexity and quantum query complexity. This was due to the multiplicative nature of these measures with respect to composition.
We use this multiplicative behavior to establish several new applications. First, we give a counter-example using composition for a conjecture made by Adam Kalai in 2004, which if true would imply a better algorithm for the learning junta problem. Second, we show an unusual behaviour of block sensitivity with respect to composition which was not noticed before. We define a new complexity measure called "fractional block sensitivity" which better explains this behaviour. Last, we show how composition can tighten relations between complexity measures, by improving a result by Nisan and Szegedy about the relation between block sensitivity and degree as a polynomial.

#### 6. Zeyu Guo: Randomness-Efficient Curve Samplers

Curve samplers are sampling algorithms that proceed by viewing the domain as a vector space over a finite field, and randomly picking a low-degree curve in it as the sample. Curve samplers exhibit a nice property besides the sampling property: the restriction of low-degree polynomials over the domain to the sampled curve is still low-degree. This property is often used in combination with the sampling property and has found many applications, including PCP constructions, local decoding of codes, and algebraic PRG constructions. The randomness complexity of curve samplers is a crucial parameter for its applications. It is known that (non-explicit) curve samplers using $O(\log N+\log(1/\delta))$ random bits exist, where $N$ is the domain size and $\delta$ is the confidence error. The question of explicitly constructing randomness-efficient curve samplers was first raised in Ta-Shma and Umans (FOCS 2006) where they obtained curve samplers with near-optimal randomness complexity. We present an explicit construction of low-degree curve samplers with optimal randomness complexity (up to a constant factor), sampling curves of degree $\lb m\log_q(1/\delta)\rb^{O(1)}$ in $\mathbb{F}_q^m$. Our construction is a delicate combination of several components, including extractor machinery, limited independence, iterated sampling, and list-recoverable codes.

Note: See ECCC TR13-120.

Back to list of Oded's choices.