What Can We Learn Privately?
by Shiva Kasiviswanathan, Homin Lee, Kobbi Nissim,
Sofya Raskhodnikova, and Adam Smith.
This work, which builds on the prior work
Practical Privacy: The SuLQ Framework
[by Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim, 2005],
studies which PAC learning tasks can be performed privately;
that is, the requirement is that the output of the learning process
(i.e., the hypothesis) does not violate the privacy of the
individual samples (obtained in the learning process).
Loosely speaking, they show that any PAC learnable concept class
can be learned privately, but efficiency as well as tight relation
to the VC-dimention may be lost.
They also relate the restricted PAC model of learning
by statistical queries to a restricted model of private computation
(i.e., local algorithms aka "randomized response" algorithm).
A remotely related comment: the notion of "differential privacy"
is stronger than the related notion of "statistical difference"
(when both notions are quantified wrt the same error level).
However, "differential privacy" is considered with respect
to a small but noticeable error, whereas in Cryptography one
usually considers negligible "statistical difference".
(Recall that "differential privacy" refers to the pointwise
ratio between two probability distributions, where the ratio
is required to be close to 1 (i.e., it is 1+error).)
The original abstract
Learning problems form an important category of computational tasks that
generalizes many of the computations researchers apply to large real-life
datasets. Defining a private learning algorithm as an algorithm that is both
a learner and preserves the notion of differential privacy, we can now
examine what concept classes can be learned privately.
We begin by showing a variant of the classical Occam's razor result. I.e.,
that ignoring computational complexity, it is possible to privately
agnostically learn any concept class using a sample size approximately
logarithmic in the cardinality of the hypothesis class. Specifically, if a
concept class is learnable by a (non-private) algorithm with polynomial
sample complexity and output size, then it can be learned privately using a
polynomial number of samples.
We also present a computationally efficient private PAC learner for the
class of parity functions. This result is somewhat surprising because parity
is thought to be very hard to learn given random classification noise, and
learning with noise and private learning are intuitively similar (both must
be robust to small changes in inputs).
Finally, we focus on the class of local algorithms (also called randomized
response) that are class of private algorithms that have received extensive
investigation due their practicality. We provide a precise characterization
of local private learning. Namely, a concept class is learnable by a local
and only if it is learnable in the statistical query (SQ) model, and
furthermore, the number of rounds in the local algorithm corresponds to the
adaptivity of the SQ learner. This allows us to present a separation between
the power of interactive and non-interactive local learning algorithms.
list of Oded's choices.