Properly learning monotone functions via local correction

by Jane Lange, Ronitt Rubinfeld and Arsen Vasilyan

Oded's comments

What I find extremely interesting is that proper learning is (relatively efficiently) reduced to improper learning (where in both cases we refer to learning under the uniform distribution). The reduction uses a local correction procedure, which in turn relies on a local computation algorithm.
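A rough accounting of why the correction preserves accuracy (my own back-of-the-envelope version; the paper's analysis, which works over the middle layers, is more delicate): if the improper learner outputs $h$ with $\mathrm{dist}(h,f)\leq\epsilon$, where the target $f$ is monotone, then $h$ is $\epsilon$-close to the set of monotone functions; hence, a corrector that outputs a monotone $g$ differing from $h$ on at most twice that distance gives $\mathrm{dist}(g,f) \leq \mathrm{dist}(g,h) + \mathrm{dist}(h,f) \leq 2\epsilon + \epsilon = 3\epsilon$.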

It is instructive to emphasize that the objects being treated here are (Boolean) functions over $\{0,1\}^n$; that is, these are objects of description length $N=2^n$. Hence, a sublinear-time local computation algorithm (LCA) is relevant provided that its running time in terms of its input length $N$ is affordable when stated in terms of $n$; in this case, an $\exp(\tilde{O}(\sqrt{\log N}))$-time LCA means an overhead of $\exp(\tilde{O}(\sqrt{n}))$, which is affordable.
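(In symbols, the translation is just the substitution $\log N = n$: an $\exp(\tilde{O}(\sqrt{\log N}))$-time LCA runs in time $\exp(\tilde{O}(\sqrt{n}))$, which is of the same order as the learner's own $2^{\tilde{O}(\sqrt{n}/\epsilon)}$ running time.)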

Specifically, focusing on the $O(\sqrt{n})$ middle layers of the $n$-dimensional Boolean hypercube, they first find a maximal matching in the violation graph, in which $n$-bit strings are connected by an edge if their values, under the hypothesis (found by improper learning), violate monotonicity. Then, the value of each vertex that participates in the matching is reset so as to avoid a violation with the value of any non-matched vertex that resides above it (in the pruned hypercube (i.e., of the middle layers)). This uses two facts about the violation graph (a toy rendition of the correction appears after the list).

  1. The degree of vertices in this graph is ${n \choose O(\sqrt{n})}$, whereas an LCA for maximal matching operates in time polynomial in the degree and polylogarithmic in the size of the graph.
  2. The size of a maximal matching in this graph is a 2-approximation of the distance of the hypothesis from the set of monotone functions.
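
The following is a minimal, global (i.e., non-local) Python sketch of the correction step for a tiny $n$, run on the full hypercube rather than just the middle layers; the exact reset rule is my reconstruction of the one described above, and the actual algorithm replaces the greedy matching with Ghaffari's LCA.

    from itertools import product

    # Toy, global rendition of the correction step: full hypercube,
    # tiny n. The real algorithm computes the matching locally via an
    # LCA; the reset rule below is a reconstruction, not the paper's.

    n = 4
    points = list(product([0, 1], repeat=n))

    def below(x, y):
        # x strictly precedes y in the partial order (coordinatewise <=).
        return x != y and all(a <= b for a, b in zip(x, y))

    def violations(h):
        # Edges of the violation graph: comparable pairs on which the
        # hypothesis h (a dict from points to {0,1}) violates monotonicity.
        return [(x, y) for x in points for y in points
                if below(x, y) and h[x] == 1 and h[y] == 0]

    def correct(h):
        # Fact 1 in action: greedily compute a maximal matching in the
        # violation graph (the LCA does this locally, per queried vertex).
        matched = set()
        for x, y in violations(h):
            if x not in matched and y not in matched:
                matched.update((x, y))
        # Reset each matched vertex so that it agrees with the unmatched
        # vertices: set it to 1 iff some unmatched vertex below it is 1.
        # (By maximality of the matching, no violation edge connects two
        # unmatched vertices, so the result is monotone.)
        g = dict(h)
        for v in matched:
            g[v] = int(any(h[u] == 1 and below(u, v)
                           for u in points if u not in matched))
        return g

    h = {x: int(sum(x) >= 2) for x in points}  # a monotone threshold...
    h[(1, 1, 1, 1)] = 0                        # ...with a planted violation
    g = correct(h)
    assert violations(g) == []                 # g is monotone again

Note that the output differs from the input hypothesis only on matched vertices; combined with fact 2, this is what drives the accounting sketched above.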

The original abstract

We give a $2^{\tilde{O}(\sqrt{n}/\epsilon)}$-time algorithm for properly learning monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Our algorithm is robust to adversarial label noise and has a running time nearly matching that of the state-of-the-art improper learning algorithm of Bshouty and Tamon (JACM '96) and an information-theoretic lower bound of Blais et al. (RANDOM '15). Prior to this work, no proper learning algorithm with running time smaller than $2^{\Omega(n)}$ was known to exist. The core of our proper learner is a local computation algorithm for sorting binary labels on a poset. Our algorithm is built on a body of work on distributed graph algorithms; specifically, we rely on a recent work of Ghaffari (FOCS '22), which gives an efficient algorithm for computing maximal matchings in a graph in the local computation algorithms (LCA) model. The applications of our local sorting algorithm extend beyond learning on the Boolean cube: we also give a tolerant tester for Boolean functions over general posets that distinguishes functions that are $\epsilon/3$-close to monotone from those that are $\epsilon$-far. Previous tolerant testers for the Boolean cube only distinguished between $\epsilon/\Omega(\sqrt{n})$-close and $\epsilon$-far. We will briefly describe recent work of Lange and Vasilyan (FOCS 2023), which extends our result to the agnostic learning framework.

See FOCS 2022 proceedings.


Back to list of Oded's choices.