We consider private data analysis in the setting in which a trusted and
trustworthy curator, having obtained a large data set containing private
information, releases to the public a "sanitization" of the data set
that simultaneously protects the privacy of the individual contributors of data
and offers utility to the data analyst. The sanitization may be in the form of
an arbitrary data structure, accompanied by a computational procedure for
determining approximate answers to queries on the original data set, or it may
be a "synthetic data set" consisting of data items drawn from the
same universe as items in the original data set; queries are carried out as if
the synthetic data set were the actual input. In either case the process is
non-interactive; once the sanitization has been released the original data and
the curator play no further role.
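To make the non-interactive model concrete, here is a minimal sketch in Python, assuming a finite universe, counting queries, and a simple noisy-histogram release; the paper does not prescribe this particular mechanism, and the function names, the `epsilon` parameter, and the noise calibration are illustrative assumptions only, not the paper's algorithm.

```python
# Illustrative sketch of the non-interactive synthetic-data model only;
# the mechanism below is a generic noisy-histogram release, not the
# paper's construction. `epsilon` and the query format are assumptions.
import random
from collections import Counter

def release_synthetic(data, universe, epsilon=1.0):
    """Curator side: perturb a histogram over the universe, then emit a
    synthetic data set whose items are drawn from that same universe."""
    hist = Counter(data)
    # Laplace(1/epsilon) noise, written as a difference of exponentials.
    noisy = {x: max(0.0, hist[x]
                    + random.expovariate(epsilon)
                    - random.expovariate(epsilon))
             for x in universe}
    total = sum(noisy.values()) or 1.0
    synthetic = []
    for x in universe:
        synthetic.extend([x] * round(len(data) * noisy[x] / total))
    return synthetic  # after release, the original data plays no further role

def counting_query(synthetic, predicate):
    """Analyst side: evaluate the query on the synthetic data set exactly
    as if it were the actual input."""
    return sum(predicate(x) for x in synthetic) / max(1, len(synthetic))

universe = range(8)
data = [random.randrange(4) for _ in range(1000)]  # the private input
synth = release_synthetic(data, universe)
print(counting_query(synth, lambda x: x < 2))      # approximate answer
```

The point of the sketch is the interaction pattern rather than the mechanism: once `release_synthetic` returns, the analyst evaluates every query against the synthetic records alone, with no further access to the curator or the original data.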
For the task of sanitizing with a synthetic data set as output, we map the
boundary between computational feasibility and infeasibility with respect to a
variety of utility measures. For the (potentially easier) task of sanitizing
with unrestricted output format, we show a tight qualitative and quantitative
connection between hardness of sanitizing and the existence of traitor tracing
schemes.