Realizable Learning is All You Need

by Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan

Oded's comments

This work shows a simple reduction of agnostic learning to realizable learning, where the latter term refers to learning a concept that belongs to a predetermined concept class (whereas the former term refers to approximating an arbitrary concept using a predetermined concept class). The reduction consists of the following three steps.

  1. Obtain $s(\eps/2,\delta/2)$ (unlabeled) samples from the underlying distribution, where $s(\eps',\delta')$ denotes the number of samples used by the given learning algorithm $L$.
  2. Invoke $L$ on each possible labelling of the aforementioned sample, where the labellings are those induced by the concepts in the class. Let $C$ denote the resulting set of candidate hypotheses.
  3. Using a labeled sample of size $O(\eps^{-2}\log(|C|/\delta))$, output the candidate that achieves minimal error (wrt this labeled sample).
Note that generically $|C|\leq2^{s(\eps/2,\delta/2)}$, but if the hypothesis class has VC-dimension $d$ then, by the Sauer-Shelah lemma, $|C|\leq(es/d)^d$ for $s=s(\eps/2,\delta/2)$; that is, $\log|C|=O(d\log(s/d))$. In the latter case we obtain an agnostic learner of optimal sample complexity up to a logarithmic factor. The fact that the agnostic learner does not preserve the time complexity of the given learner is inherent.
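The following is a minimal sketch of the three-step reduction, written in Python for concreteness. It assumes a finite concept class that can be enumerated (step 2 is, in general, only a counting argument and not efficient), and all names (realizable_learner, concept_class, sample_unlabeled, sample_labeled, etc.) are illustrative placeholders rather than anything from the paper.

    import math

    def agnostic_from_realizable(realizable_learner, concept_class,
                                 sample_unlabeled, sample_labeled,
                                 s, eps, delta):
        """Hypothetical sketch: reduce agnostic learning to realizable learning.

        realizable_learner(labeled_sample) -> hypothesis (a callable x -> label)
        concept_class: finite iterable of concepts, each a callable x -> label
        sample_unlabeled(m) -> m unlabeled points from the underlying distribution
        sample_labeled(m)   -> m (x, y) pairs from the underlying distribution
        s(eps, delta)       -> sample complexity of the realizable learner
        """
        # Step 1: draw s(eps/2, delta/2) unlabeled points.
        m = s(eps / 2, delta / 2)
        xs = sample_unlabeled(m)

        # Step 2: invoke the realizable learner on every labelling of xs that is
        # induced by some concept in the class; collect the candidate hypotheses C.
        candidates = []
        seen_labellings = set()
        for c in concept_class:
            labelling = tuple(c(x) for x in xs)
            if labelling in seen_labellings:
                continue
            seen_labellings.add(labelling)
            candidates.append(realizable_learner(list(zip(xs, labelling))))

        # Step 3: draw O(eps^{-2} * log(|C|/delta)) labeled examples and output
        # the candidate with minimal empirical error on them.
        n = math.ceil(math.log(max(len(candidates), 2) / delta) / eps ** 2)
        test = sample_labeled(n)

        def empirical_error(h):
            return sum(h(x) != y for x, y in test) / len(test)

        return min(candidates, key=empirical_error)

Note that the running time is dominated by the enumeration in step 2, which is exponential in the size of the unlabeled sample (or in the VC-dimension); this reflects the fact that the reduction preserves sample complexity but not time complexity.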

The original abstract

The equivalence of realizable and agnostic learnability is a fundamental phenomenon in learning theory. With variants ranging from classical settings like PAC learning and regression to recent trends such as adversarially robust and private learning, it’s surprising we still lack a unified theory; traditional proofs of the equivalence tend to be disparate, and rely on strong model-specific assumptions like uniform convergence and sample compression. In this work, we give the first model-independent framework explaining the equivalence of realizable and agnostic learnability: a three-line blackbox reduction that simplifies, unifies, and extends our understanding across a wide variety of settings. This includes models with no known characterization of learnability such as learning with arbitrary distributional assumptions or general loss, as well as a host of other popular settings such as robust learning, partial learning, fair learning, and the statistical query model. More generally, we argue that the equivalence of realizable and agnostic learning is actually a special case of a broader phenomenon we call property generalization: any desirable property of a learning algorithm (e.g. noise tolerance, privacy, stability) that can be satisfied over finite hypothesis classes extends (possibly in some variation) to any learnable hypothesis class.

Available from the 35th COLT.
