## Instance Optimal Learning of Discrete Distributions

by Gregory Valiant and Paul Valiant.

#### Oded's comments

The main result in this paper is that
knowing the histogram of the distribution (or, equivalently,
knowing the distribution up to a relabeling of its elements)
does not help much when trying to learn the distribution.
In other words, an algorithm is presented that, given $n$
samples from an arbitrary distribution $p$, performs almost as
well as any algorithm that is given the histogram of $p$
(i.e., the set $\{(v,h_p(v))\in(0,1]\times\N:h_p(v)=|\{i:p(i)=v\}|\}$)
along with these $n$ samples.
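To make the label-invariance of the histogram concrete, here is a small Python sketch (the function name and dict representation are illustrative assumptions, not from the paper) that computes $h_p$ for a distribution given as a map from elements to probabilities; two distributions that are relabelings of each other yield the same histogram.

```python
from collections import Counter

def histogram(p):
    """Compute the histogram h_p of a discrete distribution p,
    given as a dict mapping elements to their probabilities:
    for each probability value v, count how many elements i
    satisfy p(i) = v. Note the result ignores the labels."""
    return dict(Counter(p.values()))

# p and q assign the same multiset of probabilities to
# different labels, i.e., q is a relabeling of p ...
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"x": 0.25, "y": 0.5, "z": 0.25}
# ... so their histograms coincide: {0.5: 1, 0.25: 2}
```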

This result may be interpreted as saying that label-invariant
information regarding the distribution to be learned is of limited value.
One should be careful, though, and note that the oblivious algorithm
introduces a small error (vanishing as the sample size grows),
which may be much larger than the learning error of the non-oblivious
algorithm in the case that the latter error is very small.

An alternative (and orthogonal) interpretation is that,
in many cases, learning distributions may be performed
much more efficiently than the worst-case.
Furthermore, such a significant improvement does not require
a restriction on the class of distributions that can be learned.
Instead, a single algorithm can adapt itself to the class
and perform optimally for every class, where the class
is quite small and consists of all distributions having
the same fixed histogram.

(Very nice indeed!
Still, using the term instance-optimal here is a bit inaccurate,
since the instance is the distribution, not its histogram,
whereas optimality is only with respect to the latter.
Things are worse in a prior paper of the authors,
where the term instance-optimal refers to a fixed (massive)
parameter of the problem, which belongs to a class of problems.)

#### The original abstract

See the
program webpage of STOC'16.
