## Interactive Proofs for Verifying Machine Learning

by Shafi Goldwasser, Guy Rothblum, Jonathan Shafer, Amir Yehudayoff

#### Oded's comments

I think the abstract does a good job of describing
the contents of this work, which is well motivated
(today, as it was in the past and will be in the future).

#### The original abstract

We consider the following question:
using a source of labeled data and interaction with an untrusted prover,
what is the complexity of verifying that a given hypothesis is
*approximately correct*? We study interactive proof systems
for PAC verification, where a verifier that interacts with
a prover is required to accept good hypotheses, and reject bad hypotheses.
Both the verifier and the prover are efficient
and have access to data samples from an unknown distribution.
We are interested in cases where the verifier can use significantly less
data than is required for (agnostic) PAC learning,
or use a substantially cheaper data source
(e.g., using only random samples for verification,
even though learning requires membership queries).
We believe that today, when data and data-driven algorithms
are quickly gaining prominence, the question of verifying purported
outcomes of data analyses is very well-motivated.

We show three main results.
First, we prove that for a specific hypothesis class,
verification is significantly cheaper than learning
in terms of the number of random samples required,
even if the verifier engages with the prover only
in a single-round (NP-like) protocol.
Moreover, for this class we prove that single-round verification
is also significantly cheaper than testing closeness to the class.
Second, for the broad class of Fourier-sparse Boolean functions,
we show a multi-round (IP-like) verification protocol,
where the prover uses membership queries, and the verifier
is able to assess the result while only using random samples.
Third, we show that verification is not always more efficient.
Namely, we show a class of functions where verification
requires as many samples as learning does, up to a logarithmic factor.

Available from
ECCC TR20-058.
