## (1) A CLT and tight lower bounds for estimating entropy

## (2) Estimating the unseen: A sublinear-sample canonical estimator of distributions

by Gregory Valiant and Paul Valiant.

#### Oded's comments

I did not get to study these two related works yet,
but the results sound very impressive.
Rather than waiting another month (or so) till I get to study them,
let me take a small risk and recommend them somewhat blindly
(i.e., based on the abstracts).

Note that these results provide approximations up to an additive
error term, whereas most previous works provided constant-factor
approximations. In the case of entropy, the additive error term is
an arbitrarily small constant.

The papers are available
as TR10-179
and TR10-180
of ECCC.

#### The original abstract of the second paper

We introduce a new approach to characterizing the unobserved portion
of a distribution, which provides sublinear-sample additive estimators
for a class of properties that includes entropy and distribution
support size.
Together with the lower bounds proven in the companion paper [29],
this settles the longstanding question of the sample complexities of
these estimation problems (up to constant factors). Our algorithm
estimates these properties up to an arbitrarily small additive
constant, using O(n/log n) samples; [29] shows that no algorithm on
o(n/log n) samples can achieve this (where n is a bound on the support
size, or, in the case of estimating the support size, 1/n is a lower
bound on the probability of any element of the domain). Previously, no
explicit sublinear-sample algorithms for either of these problems were known.
Additionally, our algorithm runs in time linear in the number of samples used.
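For contrast with the sublinear-sample estimator of the paper, here is the standard empirical ("plug-in") entropy estimator, which generally needs a number of samples linear in the support size n to achieve a small additive error. This is only the naive baseline, not the algorithm of the paper:

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Naive plug-in estimate of Shannon entropy (in nats).

    Uses the empirical frequencies of the observed symbols; symbols
    never seen contribute nothing, which is why this baseline is
    biased downward when the sample is small relative to the support.
    """
    counts = Counter(samples)
    m = len(samples)
    return -sum((c / m) * math.log(c / m) for c in counts.values())

# Example: a sample with exactly uniform counts over four symbols;
# the plug-in estimate then equals the true entropy log 4.
samples = ["a", "b", "c", "d"] * 250
print(plugin_entropy(samples))  # 1.3862943611... = log 4
```

The bias of this baseline on small samples is precisely what makes estimating the "unseen" portion of the distribution, as the paper does, nontrivial.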
