## Realizable Learning is All You Need

by Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan

#### Oded's comments

This work shows a simple reduction of agnostic learning
to realizable learning, where the latter term refers
to learning a concept that is in a predetermined concept class
(whereas the former term refers to approximating an arbitrary
concept using a predetermined concept class).
The reduction consists of the following three steps.

- Obtain $s(\eps/2,\delta/2)$ (unlabeled) samples from
the underlying distribution, where $s(\eps',\delta')$ denotes
the number of samples used by the given (realizable) learning algorithm $L$.
- Invoke $L$ on each possible labelling of the aforementioned
sample, where the labellings are those induced by the concepts in the class.
Let $C$ denote the resulting set of candidate hypotheses.
- Using a fresh labeled sample of size $O(\eps^{-2}\log(|C|/\delta))$,
output the candidate that achieves minimal error (wrt this labeled sample).
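The three steps above can be sketched in code. This is an illustrative sketch only (the function and variable names are not from the paper); it assumes binary labels, concepts and hypotheses given as callables from points to $\{0,1\}$, and a finite, explicitly enumerable concept class, with the caller supplying samples of the sizes stated above.

```python
# Hypothetical sketch of the three-step reduction; all names are
# illustrative, not from the paper.

def agnostic_from_realizable(realizable_learner, concept_class,
                             unlabeled, labeled):
    """Reduce agnostic learning to a given realizable learner.

    realizable_learner: maps a labeled sample [(x, y), ...] to a hypothesis.
    concept_class: finite iterable of concepts (callables x -> {0, 1}).
    unlabeled: s(eps/2, delta/2) unlabeled points (step 1).
    labeled: fresh labeled sample of size O(eps^-2 * log(|C|/delta)) (step 3).
    """
    # Step 2: invoke the realizable learner on every labeling of the
    # unlabeled sample that is induced by some concept in the class.
    seen, candidates = set(), []
    for c in concept_class:
        labeling = tuple(c(x) for x in unlabeled)
        if labeling not in seen:
            seen.add(labeling)
            candidates.append(
                realizable_learner(list(zip(unlabeled, labeling))))

    # Step 3: output the candidate minimizing the empirical error
    # on the fresh labeled sample.
    def emp_error(h):
        return sum(h(x) != y for x, y in labeled) / len(labeled)

    return min(candidates, key=emp_error)
```

Note that the sketch enumerates the concept class explicitly, so it is sample-efficient but makes no attempt at time efficiency, in line with the remark below that the reduction does not preserve the time complexity of the given learner.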

Note that generically $|C|\leq2^{s(\eps/2,\delta/2)}$,
but $|C|=O(2^d/\eps)$ if the hypothesis class has VC-dim $d$.
In the latter case we obtain an agnostic learner
of optimal sample complexity, up to a logarithmic factor.
The fact that the agnostic learner does not preserve
the time complexity of the given learner is inherent.
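To make the near-optimality claim concrete, one may plug the VC-based bound on $|C|$ into the labeled-sample size of the third step (a worked instance, assuming the bound $|C|=O(2^d/\eps)$ stated above):

```latex
% Assuming |C| = O(2^d/\eps), the labeled sample of the third step has size
O\!\left(\frac{\log(|C|/\delta)}{\eps^2}\right)
  \;=\; O\!\left(\frac{d + \log(1/(\eps\delta))}{\eps^2}\right),
```

which matches the known $\Theta((d+\log(1/\delta))/\eps^2)$ agnostic sample complexity up to the $\log(1/\eps)$ term.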

#### The original abstract

The equivalence of realizable and agnostic learnability
is a fundamental phenomenon in learning theory.
With variants ranging from classical settings like PAC learning
and regression to recent trends such as adversarially robust
and private learning, it’s surprising we still lack a unified theory;
traditional proofs of the equivalence tend to be disparate,
and rely on strong model-specific assumptions like uniform convergence
and sample compression.
In this work, we give the first model-independent framework explaining
the equivalence of realizable and agnostic learnability:
a three-line blackbox reduction that simplifies, unifies,
and extends our understanding across a wide variety of settings.
This includes models with no known characterization of learnability
such as learning with arbitrary distributional assumptions or general loss,
as well as a host of other popular settings such as robust learning,
partial learning, fair learning, and the statistical query model.
More generally, we argue that the equivalence of realizable
and agnostic learning is actually a special case of a broader phenomenon
we call property generalization: any desirable property of
a learning algorithm (e.g. noise tolerance, privacy, stability)
that can be satisfied over finite hypothesis classes extends
(possibly in some variation) to any learnable hypothesis class.

Available from the 35th COLT.
