Notes on Lecture 1 "Obfuscation: Definition and First Impossibility" by Shafi
------------------------------------------------------------------------------

In this note we address several issues which came up during class regarding the asymptotics of the result, single functions vs. families of functions, non-computability, etc.

The Barak et al. impossibility result for Turing machines exhibits a FAMILY F of finite functions and a Boolean predicate p as follows: F = union {F_n}, F_n = {f_s : |s| = n}, such that there exists a poly(n)-size circuit computing f_s(x) for every x in the domain of f_s. The predicate p is defined on all f_s in F.

1) There is an algorithm A that runs in time polynomial in n such that for every f_s in F, given a TM M computing f_s with |s| = n, A(M) = p(f_s).

But

2) For all probabilistic polynomial time algorithms S given only I/O access to f_s and the size of M,

   Prob( S^f_s(1^|M|) = p(f_s) ) < 1/2 + neg(n),

where neg(n) is smaller than any 1/poly(n) for sufficiently large n. The probability is taken over the uniform choice of f_s in F_n (as well as the coins of the probabilistic guessing algorithm S).

COMMENT 1 on FAMILIES OF FUNCTIONS VS. INDIVIDUAL FUNCTIONS

As pointed out in class, an impossibility result for a single function does not make sense: in that case the function is fixed and known (and p(f) can be hardcoded into an algorithm S), so there is no sense in which p could be unpredictable. Thus, the `impossibility' in the above result (or in any other obfuscation impossibility result) is to compute p(f) given only I/O access to an f chosen at random from a family of functions.

Note that this is the usual `language' we use in cryptography and learning theory when we talk about `hard to distinguish' functions or `easy to learn' functions. The statements refer to families of functions and do not make sense for single finite functions. For example, in cryptography, when we say that a pseudo-random function is `hard to distinguish' from random, we mean that given I/O access to an f chosen at random (with respect to some distribution) from a family of functions (with a given input and output size), you cannot tell whether it was chosen from the set of all functions (with the same input and output length) or from the restricted family. In contrast, in learning, when we say that polynomial functions are `easy to learn with queries', we mean that given access to a function drawn at random from the family of polynomials in, say, n variables and of fixed degree d, you can find the exact polynomial in poly(n) time.

COMMENT 2 on ASYMPTOTICS:

The Barak et al. result is stated asymptotically; however, it can be stated so that it makes sense for any fixed n. Fix the number of queries the adversary S makes to f -- call it q -- instead of letting it be an arbitrary polynomial, and make the success probability of S depend on q rather than on a negligible function of n. Then (2), for Turing machines, can alternatively be stated as (2'):

(2') For all polynomial time algorithms S making at most q queries to f_s,

   Prob( S^f_s(1^|M|) = p(f_s) ) < 1/2 + poly(q)/2^n,

where again the probability is taken over the uniform choice of f_s in F_n (as well as the coins of the probabilistic guessing algorithm S).
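To make the quantifiers in (2') concrete, here is a small Monte Carlo sketch in Python. The family (point functions), the predicate p (the low bit of b), and the adversary S below are toy stand-ins chosen only for illustration -- they are not the Barak et al. construction -- but the experiment spells out what "probability over a uniform f_s in F_n and the coins of S, with at most q queries" means.

    import random

    n = 12          # toy security parameter (tiny so the experiment runs fast)
    q = 8           # number of oracle queries S is allowed

    def sample_f():
        # Sample f uniformly from a toy family F_n of point functions:
        # f(x) = b if x = a, and 0 otherwise (b forced nonzero so a hit is visible).
        a = random.getrandbits(n)
        b = random.randrange(1, 2 ** n)
        f = lambda x: b if x == a else 0
        p = b & 1                      # toy predicate p(f): the low bit of b
        return f, p

    def S(oracle):
        # Adversary with only I/O access: q random queries hoping to hit a,
        # otherwise a uniformly random guess.
        for _ in range(q):
            y = oracle(random.getrandbits(n))
            if y != 0:                 # hit the special point, learn b and hence p
                return y & 1
        return random.getrandbits(1)

    trials = 20000
    wins = 0
    for _ in range(trials):
        f, p = sample_f()
        if S(f) == p:
            wins += 1
    print("empirical success:", wins / trials)
    print("bound 1/2 + poly(q)/2^n ~", 0.5 + q / 2 ** n)

The empirical success rate stays within roughly q/2^n of 1/2: with only q queries, S essentially never finds the special point a, so it can do no better than guessing p.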
COMMENT 3 on ONE vs. TWO TM's:

In class we showed two function families C_a,b,s and D_a,b,s for which, if you have the code, you can find out s. How do we formally combine C and D to obtain one function family for which, if you have the code, you can find s? (A toy sketch of the combined construction appears at the end of these notes.)

Recall: let a, b be strings of length n and let s be a single bit. Let

   C_a,b,s(x) = b if x = a, and 0^n otherwise;
   D_a,b,s(x) = s if x(a) = b, and 0 otherwise (think of x as a TM).

The adversary A, given code for C = C_a,b,s and D = D_a,b,s, simply runs D(C) and outputs the result, which equals s (since C(a) = b). The adversary S, who has only I/O access to C_a,b,s and D_a,b,s and knows their sizes, also attempts to guess the value of s. The claim is that Prob[S guesses s] < 1/2 + neg(n) (probability taken over a, b, s and, of course, S's coin tosses).

To make this a statement about a single function family F, let

   F_a,b,s(x, indicator) = C_a,b,s(x) if indicator = 1, and D_a,b,s(x) if indicator = 0,

for a, b in {0,1}^n and s a single bit. Now the adversary A, given obfuscated code for F = F_a,b,s, simply (1) sets indicator = 1 in the obfuscated code for F, which yields code for C_a,b,s, and (2) runs F(code for C_a,b,s, 0), which yields D_a,b,s(C_a,b,s) = s. In contrast, any adversary S with only I/O access to F_a,b,s (where a, b, s are chosen at random) is exactly as powerful as an adversary with oracles to C and D.

COMMENT 4 on TIME OUTS:

It was asked in class what to do if, when running D(C) above, algorithm D or C hangs. Namely, it looks like we run into a non-computable situation. Does the result still hold?

The main observation addressing this issue is that we required an obfuscator to have only a polynomial slow-down. For an obfuscator O, fix the slow-down polynomial to q (i.e., say the new program runs in time q(T(n)) when the old program runs in time T(n)). Now, to ensure everything is computable, we make the following modifications.

1) Modify the definition of C_a,b,s(x) above to run for at most 100n steps, and to output 0^n if it has not terminated within this time. Obviously, on input x = a, C has sufficient time to read a and output b.

2) Modify the definition of D = D_a,b,s(x) to run the input machine x for at most r(n) steps, and to output 0 unless x terminates within this time. What should r be? It should allow enough time to simulate a machine which runs in time q(100n); say r(n) = q^3(n), i.e., a fixed polynomial large enough (for sufficiently large n) to cover this simulation.

Let us check that the Barak et al. impossibility still holds. The only change is showing that there exists a modified adversary A' which can predict p and run in a fixed polynomial time, as follows. A' runs D' = O(D_a,b,s) (the obfuscator applied to D_a,b,s) on C' = O(C_a,b,s) for r'(n) steps, where r' is a factor q slower than the runtime of D, namely r'(n) = q(r(n)). This suffices since, again, we know that the obfuscated D' is at most a factor q slower than D.

FINAL NOTE: the entire discussion above applies to Turing machines and not to circuits. For circuits the result is more involved, is conditional on the existence of one-way functions, and does not run into the complication of time-outs.
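To make COMMENT 3 concrete, here is the toy sketch referred to above: a minimal Python illustration of the combined family F_a,b,s and of the adversary A that recovers s from any code for F. It is illustration only, not the formal TM construction: programs are modelled as Python callables, so the step-bounded simulation of COMMENT 4 is replaced here by a plain function call.

    import random

    n = 16
    ZERO = "0" * n

    def make_F(a, b, s):
        # C_{a,b,s}(x) = b if x = a, and 0^n otherwise
        def C(x):
            return b if x == a else ZERO

        # D_{a,b,s}(M) = s if M(a) = b, and 0 otherwise.  Here M is a Python
        # callable standing in for "the input, viewed as a TM"; a real D would
        # simulate the machine described by its input for r(n) steps (COMMENT 4).
        def D(M):
            try:
                return s if M(a) == b else 0
            except Exception:
                return 0

        # F_{a,b,s}(x, indicator): the C-part when indicator = 1, the D-part when 0
        def F(x, indicator):
            return C(x) if indicator == 1 else D(x)

        return F

    def adversary_A(F_code):
        # Given (possibly obfuscated) code for F, recover s:
        # (1) hardwire indicator = 1, obtaining code for the C-part;
        # (2) feed that code to the D-part: F(code-of-C, 0) = D(C) = s.
        code_of_C = lambda x: F_code(x, 1)
        return F_code(code_of_C, 0)

    # Usage: pick a, b, s at random; A recovers s from the code every time,
    # while an I/O-only adversary cannot even form the query "code of C".
    a = "".join(random.choice("01") for _ in range(n))
    b = "".join(random.choice("01") for _ in range(n))
    s = random.getrandbits(1)
    F = make_F(a, b, s)
    print("adversary A outputs:", adversary_A(F), " (true s =", s, ")")

The sketch also makes the contrast of COMMENT 3 visible: adversary_A needs a program for F as an object it can manipulate (hardwire an input, pass it as an argument), whereas an oracle adversary can only submit strings to F and observe the answers.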