The main result in this paper is that knowing the histogram of the distribution (or, equivalently, knowing the distribution up to a relabeling of its elements) does not help much when trying to learn the distribution. In other words, an algorithm is presented that, given $n$ samples from an arbitrary distribution $p$, performs almost as well as any algorithm that is given the histogram of $p$ (i.e., the set $\{(v,h_p(v))\in(0,1]\times\N : h_p(v)=|\{i:p(i)=v\}|\}$) along with these $n$ samples.
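To make the definition of the histogram concrete, here is a minimal Python sketch (the function name and the representation of $p$ as a dictionary mapping elements to probabilities are illustrative assumptions, not taken from the paper). Note that relabeling the elements of $p$ leaves the output unchanged, which is exactly the label-invariance the result is about.

```python
from collections import Counter

def histogram_of_distribution(p):
    """Given a finitely supported distribution p (dict: element -> probability),
    return its histogram: for each probability value v that occurs,
    h_p(v) = |{i : p(i) = v}|."""
    return dict(Counter(p.values()))

# Example: p assigns 1/2 to one element and 1/4 to two elements.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
print(histogram_of_distribution(p))  # {0.5: 1, 0.25: 2}

# Relabeling the elements yields the same histogram.
q = {"x": 0.25, "y": 0.5, "z": 0.25}
assert histogram_of_distribution(q) == histogram_of_distribution(p)
```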
This result may be interpreted as saying that label-invariant information regarding the distribution to be learned is of limited value. One should be careful, though, and note that the oblivious algorithm introduces a small error (vanishing as a function of the sample size), which may be much larger than the learning error of the non-oblivious algorithm in case the latter error is very small.
An alternative (and orthogonal) interpretation is that, in many cases, learning distributions can be performed much more efficiently than in the worst case. Furthermore, such a significant improvement does not require restricting the class of distributions that can be learned. Instead, a single algorithm can adapt itself to the class and perform optimally for every class, where each class is quite small and consists of all distributions having the same fixed histogram.
(Very nice indeed! Still, using the term instance-optimal here is a bit inaccurate, since the instance is the distribution, not its histogram, whereas optimality holds only with respect to the latter. Things are worse in a prior paper of the authors, where the term instance-optimal refers to a fixed (massive) parameter of the problem, which belongs to a class of problems.)
See the program webpage of STOC'16.