A Multiplicative Weights Mechanism for Interactive Privacy-Preserving Data Analysis

by Moritz Hardt and Guy Rothblum

Oded's comments

The main result is a vast improvement in the dependence of privacy-preserving data analysis on the number of potential (low-sensitivity) adaptive queries. Specifically, the error's dependence on the number of queries is reduced from polynomial to logarithmic. Such a reduction was shown before in the context of non-adaptive queries (cf. Boosting and Differential Privacy).

The algorithm itself is based on learning-theoretic techniques. By default, it answers the current query using a current (fictitious) "hypothesis" regarding who is included in the database, and it updates this hypothesis whenever its answer deviates too much from the true one. Intuitively (although this requires a subtle argument), queries for which the current hypothesis is good do not compromise privacy, whereas the number of potentially violating queries is relatively small (since each such query yields a significant improvement in the hypothesis). Interestingly, the numbers are not low enough so as to yield the desired result if a straightforward composition result is used; instead, the better composition result of the paper Boosting and Differential Privacy is used, where the cost of composition grows only with the square root of the number of applications rather than with the number itself.
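To make the update rule concrete, the following is a minimal Python sketch of such a mechanism, assuming the database is given as a normalized histogram over the data universe and each counting query is a [0,1]-valued vector over that universe. The function name and the parameters eps, eta, and threshold are illustrative assumptions; in particular, the noise scale and the update threshold are not calibrated as in the paper, where they are set so as to obtain the stated privacy and accuracy guarantees.

import numpy as np

def private_multiplicative_weights(histogram, queries, eps=0.5, eta=0.1, threshold=0.05, rng=None):
    """Illustrative sketch (not the paper's calibrated mechanism).

    histogram : true database as a normalized histogram over the data universe
                (length-N array summing to 1).
    queries   : iterable of counting queries, each a [0,1]-valued vector of length N;
                the true answer to a query q is the inner product of q and histogram.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_universe = len(histogram)
    # Fictitious hypothesis: start from the uniform distribution over the universe.
    hypothesis = np.full(n_universe, 1.0 / n_universe)
    answers = []
    for q in queries:
        q = np.asarray(q, dtype=float)
        true_answer = float(np.dot(q, histogram))
        hyp_answer = float(np.dot(q, hypothesis))
        # Noisy comparison against the true answer (Laplace noise, scale chosen for illustration).
        noisy_answer = true_answer + rng.laplace(scale=1.0 / eps)
        if abs(noisy_answer - hyp_answer) <= threshold:
            # "Lazy" round: the hypothesis already answers this query well, so answer from it.
            answers.append(hyp_answer)
        else:
            # "Update" round: release the noisy answer and improve the hypothesis
            # by a multiplicative weights step in the direction of the observed error.
            answers.append(noisy_answer)
            direction = 1.0 if noisy_answer > hyp_answer else -1.0
            hypothesis *= np.exp(eta * direction * q)
            hypothesis /= hypothesis.sum()
    return answers

In this sketch, only the update rounds access the data with fresh noise in a way that matters for the composition bookkeeping, and the multiplicative weights argument bounds how many such rounds can occur; this is what makes the square-root composition result applicable.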

The original abstract

We consider statistical data analysis in the interactive setting. In this setting a trusted curator maintains a database of sensitive information about individual participants, and releases privacy-preserving answers to queries as they arrive. Our primary contribution is a new differentially private multiplicative weights mechanism for answering a large number of interactive counting (or linear) queries that arrive online and may be adaptively chosen. This is the first mechanism with worst-case accuracy guarantees that can answer large numbers of interactive queries and is efficient (in terms of the runtime's dependence on the data universe size). The error is asymptotically optimal in its dependence on the number of participants, and depends only logarithmically on the number of queries being answered. The running time is nearly linear in the size of the data universe. As a further contribution, when we relax the utility requirement and require accuracy only for databases drawn from a rich class of databases, we obtain exponential improvements in running time. Even in this relaxed setting we continue to guarantee privacy for any input database. Only the utility requirement is relaxed. Specifically, we show that when the input database is drawn from a smooth distribution --- a distribution that does not place too much weight on any single data item --- accuracy remains as above, and the running time becomes poly-logarithmic in the data universe size. The main technical contributions are the application of multiplicative weights techniques to the differential privacy setting, a new privacy analysis for the interactive setting, and a technique for reducing data dimensionality for databases drawn from smooth distributions.

