Notes on Lecture 1 "Obfuscation: Definition and First Impossibility" by Shafi
------------------------------------------------------------------------------

In this note we address several issues which came up during class regarding the asymptotics of the result, single functions vs. families of functions, non-computability, etc.

The Barak et al. impossibility result for Turing machines exhibits a FAMILY F of finite functions and a Boolean predicate p as follows: F = union {F_n}, F_n = {f_s : |s| = n}, such that there exists a poly(n)-size circuit computing f_s(x) for every x in the domain of f_s. The predicate p is defined on all f_s in F.

1) There is an algorithm A that runs in time polynomial in n such that for every f_s in F, given a TM M computing f_s with |s| = n, A(M) = p(f_s).

But

2) For all probabilistic polynomial time algorithms S given only I/O access to f_s and the size of M,

   Prob( S^f_s(1^|M|) = p(f_s) ) < 1/2 + neg(n),

where neg(n) is smaller than any 1/poly(n) for sufficiently large n. The probability is taken over the uniform choice of f_s in F_n (as well as the coins of the probabilistic guessing algorithm S).

COMMENT 1 on FAMILIES OF FUNCTIONS VS. INDIVIDUAL FUNCTIONS

As pointed out in class, an impossibility result for a single function does not make sense: in that case the function is fixed and known (and p(f) can be hardcoded into an algorithm S), so there is no sense in which p could be unpredictable. Thus, the `impossibility' in the above result (or in any other obfuscation impossibility result) is to compute p(f) given only I/O access to an f chosen at random from a family of functions.

Note that this is the usual `language' we use in cryptography and learning theory when we talk about `hard to distinguish' functions or `easy to learn' functions. The statements refer to families of functions and do not make sense for single finite functions. For example, in cryptography, when we say that a pseudo-random function is `hard to distinguish' from random, we mean that given I/O access to an f chosen at random (with respect to some distribution) from a family of functions (with a given input and output size), you cannot tell whether it was chosen from the set of all functions (with the same input and output length) or from the restricted family. In contrast, in learning, when we say that polynomial functions are `easy to learn with queries', we mean that given access to a function drawn at random from the family of polynomials in, say, n variables and of fixed degree d, you can find the exact polynomial in poly(n) time.

COMMENT 2 on ASYMPTOTICS:

The Barak et al. result is stated asymptotically; however, it can be stated so that it makes sense for any fixed n. Fix the number of queries the adversary S makes to f -- call it q -- instead of letting it be an arbitrary polynomial, and make the success probability of S depend on q rather than on a negligible function of n. Then (2), for Turing machines, can alternatively be stated as (2'):

(2') For all polynomial time algorithms S making at most q queries to f_s,

   Prob( S^f_s(1^|M|) = p(f_s) ) < 1/2 + poly(q)/2^n,

where again the probability is taken over the uniform choice of f_s in F_n (as well as the coins of the probabilistic guessing algorithm S).
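To make the quantifiers in (2') concrete, here is a small Monte Carlo sketch in Python. The family (point functions), the predicate p (the low bit of b), and the adversary S below are toy stand-ins chosen only for illustration -- they are not the Barak et al. construction -- but the experiment spells out what "probability over a uniform f_s in F_n and the coins of S, with at most q queries" means.

    import random

    n = 12          # toy security parameter (tiny so the experiment runs fast)
    q = 8           # number of oracle queries S is allowed

    def sample_f():
        # Sample f uniformly from a toy family F_n of point functions:
        # f(x) = b if x = a, and 0 otherwise (b forced nonzero so a hit is visible).
        a = random.getrandbits(n)
        b = random.randrange(1, 2 ** n)
        f = lambda x: b if x == a else 0
        p = b & 1                      # toy predicate p(f): the low bit of b
        return f, p

    def S(oracle):
        # Adversary with only I/O access: q random queries hoping to hit a,
        # otherwise a uniformly random guess.
        for _ in range(q):
            y = oracle(random.getrandbits(n))
            if y != 0:                 # hit the special point, learn b and hence p
                return y & 1
        return random.getrandbits(1)

    trials = 20000
    wins = 0
    for _ in range(trials):
        f, p = sample_f()
        if S(f) == p:
            wins += 1
    print("empirical success:", wins / trials)
    print("bound 1/2 + poly(q)/2^n ~", 0.5 + q / 2 ** n)

The empirical success rate stays within roughly q/2^n of 1/2: with only q queries, S essentially never finds the special point a, so it can do no better than guessing p.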
COMMENT 3 on ONE vs. TWO TM's:

In class we showed two function families C_a,b,s and D_a,b,s for which, if you have the code, you can find out s. How do we formally combine C and D to obtain one function family for which, if you have the code, you can find s? (A toy sketch of the combined construction appears at the end of these notes.)

Recall: let a, b be strings of length n and let s be a single bit. Let

   C_a,b,s(x) = b if x = a, and 0^n otherwise;
   D_a,b,s(x) = s if x(a) = b, and 0 otherwise (think of x as a TM).

The adversary A, given code for C = C_a,b,s and D = D_a,b,s, simply runs D(C) and outputs the result, which equals s (since C(a) = b). The adversary S, who has only I/O access to C_a,b,s and D_a,b,s and knows their sizes, also attempts to guess the value of s. The claim is that Prob[S guesses s] < 1/2 + neg(n) (probability taken over a, b, s and, of course, S's coin tosses).

To make this a statement about a single function family F, let

   F_a,b,s(x, indicator) = C_a,b,s(x) if indicator = 1, and D_a,b,s(x) if indicator = 0,

for a, b in {0,1}^n and s a single bit. Now the adversary A, given obfuscated code for F = F_a,b,s, simply (1) sets indicator = 1 in the obfuscated code for F, which yields code for C_a,b,s, and (2) runs F(code for C_a,b,s, 0), which yields D_a,b,s(C_a,b,s) = s. In contrast, any adversary S with only I/O access to F_a,b,s (where a, b, s are chosen at random) is exactly as powerful as an adversary with oracles to C and D.

COMMENT 4 on TIME OUTS:

It was asked in class what to do if, when running D(C) above, algorithm D or C hangs. Namely, it looks like we run into a non-computable situation. Does the result still hold?

The main observation addressing this issue is that we required an obfuscator to have only a polynomial slow-down. For an obfuscator O, fix the slow-down polynomial to q (i.e., say the new program runs in time q(T(n)) when the old program runs in time T(n)). Now, to ensure everything is computable, we make the following modifications.

1) Modify the definition of C_a,b,s(x) above to run for at most 100n steps, and to output 0^n if it has not terminated within this time. Obviously, on input x = a, C has sufficient time to read a and output b.

2) Modify the definition of D = D_a,b,s(x) to run the input machine x for at most r(n) steps, and to output 0 unless x terminates within this time. What should r be? It should allow enough time to simulate a machine which runs in time q(100n); say r(n) = q^3(n), i.e., a fixed polynomial large enough (for sufficiently large n) to cover this simulation.

Let us check that the Barak et al. impossibility still holds. The only change is showing that there exists a modified adversary A' which can predict p and run in a fixed polynomial time, as follows. A' runs D' = O(D_a,b,s) (the obfuscator applied to D_a,b,s) on C' = O(C_a,b,s) for r'(n) steps, where r' is a factor q slower than the runtime of D, namely r'(n) = q(r(n)). This suffices since, again, we know that the obfuscated D' is at most a factor q slower than D.

FINAL NOTE: the entire discussion above applies to Turing machines and not to circuits. For circuits the result is more involved, is conditional on the existence of one-way functions, and does not run into the complication of time-outs.
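To make COMMENT 3 concrete, here is the toy sketch referred to above: a minimal Python illustration of the combined family F_a,b,s and of the adversary A that recovers s from any code for F. It is illustration only, not the formal TM construction: programs are modelled as Python callables, so the step-bounded simulation of COMMENT 4 is replaced here by a plain function call.

    import random

    n = 16
    ZERO = "0" * n

    def make_F(a, b, s):
        # C_{a,b,s}(x) = b if x = a, and 0^n otherwise
        def C(x):
            return b if x == a else ZERO

        # D_{a,b,s}(M) = s if M(a) = b, and 0 otherwise.  Here M is a Python
        # callable standing in for "the input, viewed as a TM"; a real D would
        # simulate the machine described by its input for r(n) steps (COMMENT 4).
        def D(M):
            try:
                return s if M(a) == b else 0
            except Exception:
                return 0

        # F_{a,b,s}(x, indicator): the C-part when indicator = 1, the D-part when 0
        def F(x, indicator):
            return C(x) if indicator == 1 else D(x)

        return F

    def adversary_A(F_code):
        # Given (possibly obfuscated) code for F, recover s:
        # (1) hardwire indicator = 1, obtaining code for the C-part;
        # (2) feed that code to the D-part: F(code-of-C, 0) = D(C) = s.
        code_of_C = lambda x: F_code(x, 1)
        return F_code(code_of_C, 0)

    # Usage: pick a, b, s at random; A recovers s from the code every time,
    # while an I/O-only adversary cannot even form the query "code of C".
    a = "".join(random.choice("01") for _ in range(n))
    b = "".join(random.choice("01") for _ in range(n))
    s = random.getrandbits(1)
    F = make_F(a, b, s)
    print("adversary A outputs:", adversary_A(F), " (true s =", s, ")")

The sketch also makes the contrast of COMMENT 3 visible: adversary_A needs a program for F as an object it can manipulate (hardwire an input, pass it as an argument), whereas an oracle adversary can only submit strings to F and observe the answers.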