What Can We Learn Privately?

by Shiva Kasiviswanathan, Homin Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith.

Oded's comments

This work, which builds on the prior work Practical Privacy: The SuLQ Framework [by Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim, 2005], studies which PAC learning tasks can be performed privately; that is, the requirement is that the output of the learning process (i.e., the hypothesis) does not violate the privacy of the individual samples (obtained in the learning process). Loosely speaking, they show that any PAC learnable concept class can be learned privately, but efficiency as well as tight relation to the VC-dimention may be lost. They also relate the restricted PAC model of learning by statistical queries to a restricted model of private computation (i.e., local algorithms aka "randomized response" algorithm).
A remotely related comment: the notion of "differential privacy" is stronger than the related notion of "statistical difference" (when both notions are quantified wrt the same error level). However, "differential privacy" is considered with respect to a small but noticeable error, whereas in Cryptography one usually considers negligible "statistical difference". (Recall that "differential privacy" refers to the pointwise ratio between two probability distributions, where the ratio is required to be close to 1 (i.e., it is 1+error).)

The original abstract

Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life datasets. Defining a private learning algorithm as an algorithm that is both a learner and preserves the notion of differential privacy, we can now examine what concept classes can be learned privately.
We begin by showing a variant of the classical Occam's razor result. I.e., that ignoring computational complexity, it is possible to privately agnostically learn any concept class using a sample size approximately logarithmic in the cardinality of the hypothesis class. Specifically, if a concept class is learnable by a (non-private) algorithm with polynomial sample complexity and output size, then it can be learned privately using a polynomial number of samples.
We also present a computationally efficient private PAC learner for the class of parity functions. This result is somewhat surprising because parity is thought to be very hard to learn given random classification noise, and learning with noise and private learning are intuitively similar (both must be robust to small changes in inputs).
Finally, we focus on the class of local algorithms (also called randomized response) that are class of private algorithms that have received extensive investigation due their practicality. We provide a precise characterization of local private learning. Namely, a concept class is learnable by a local algorithm if and only if it is learnable in the statistical query (SQ) model, and furthermore, the number of rounds in the local algorithm corresponds to the adaptivity of the SQ learner. This allows us to present a separation between the power of interactive and non-interactive local learning algorithms.

Back to list of Oded's choices.