Guidelines for a reading group in introduction to computational complexity

Oded Goldreich

Quick links: The notes on P, NP, and NPC, P/poly and PH, space complexity, randomness and counting, hardness amplification, pseudorandomness, probabilistic proof systems, and average-case complexity.

GENERAL NOTES (to be read carefully)

This guided reading course is based on my book Computational Complexity: A Conceptual Perspective. (All sections and exercises refer to this book.) An e-copy of the book will be sent to you upon request. In addition, I have a few hard copies that I can lend out.

The conceptual perspective. This guided reading course (as the book it uses) is not aimed at merely teaching technical material, but rather also aims at addressing the question of why are certain questions asked. The answers are rooted in natural (intuitive) concepts and concerns, and we shall pay attention to how these intuitive concepts (resp., concerns) are formalized in definitions (resp., rigorous questions).

Thus, even if you know some material from other studies, please read the assigned material (and, especially, the high level discussion). In general, don't "fast-forward" over the "easy" conceptual discussions, but rather read them carefully, think them over, and even criticize them (but not for the sake of critique or posing clever -- we have no time for poses...).

Overview versus details. Sometimes, details are crucial, but the overview comes first. Don't get too entangled in the details, always keep in touch with the overview. If you fail to understand some details, make sure first that you understand the overview. Then, try to "locate" your difficulty.

On proof sketches that are not followed by full proofs. The first such examples are Thm 2.28 and 2.33, but at later chapters there are more of these, let alone entire subsections that are merely sketches (see, e.g., Sec 8.4.2.1 and Sec 9.3.2.2 and 9.3.2.3). In these cases, the sketch is not meant to be fully understandable by you, but rather to provide some taste of the ideas as well as increase your belief in the correctness of the result and the feasibility of the proof strategy. In some cases, you can figure out the details by using the hints provided in the sketch, if you spend an hour or more thinking about it, and in other cases this is not really feasible (and the text indicates so explicitly (e.g., warning on page 393) or implicitly).

On exercises and guidelines. In general, solving exercises is a dubious tool towards learning the material. For sure, the exercises are of secondary importance compared to the main text. The guidelines should be read only after trying to solve the exercises without them.

The homework assignments. The "homework assignments" (about 4 per semester) are meant to provide a framework for the course. They are meant to be light load, so don't ever spend too much time on them (i.e., more than a few hours). If you cannot do the homework assignment in a couple of hours, then it may be the case that you got entangled in some low-level details, which is not my intention. In such a case, please consult other students, and if necessary consult me. But do so before wasting too much time on it!

The reading assignments. The required reading for the next meeting will be called reading assignment (and this is what you should not skip, even if you learned the material before). In addition, I will offer optional reading (which is not mandatory).

Varying difficulty. The reading assignments vary in their difficulty, and I am providing my estimates based on past experience. In the few exceptionally difficulty assignments (estimated at 4 and above on a scale of 5), I suggest inviting me to join your meeting.

Errors and additions to the book. As any text, the book contains some unfortunate errors. In case of doubts, please consult the list of currently detectable errors, gaps and missing clarifications in the webpage of the book. Needless to say, I'd be grateful for pointing out add'l errors, which I'm sure exist, unfortunately.

These notes are mostly meant to be read before reading the assignment, but some parts (e.g., the quiz and list of suggested exercises are meant for later). In few places, I also included additional intuitions and arguments, which may be read along with the main text. Recall that you are mostly on your own, but still don't hesitate to ask me for help whenever you really need it. Here is some advice written by Orr Paradise, who participated in such a group in 2017/18.

NOTES FOR 1ST MEETING

Topic: P-versus-NP.
Estimated difficulty: 1 in the scale of 5.

[From a discussion with the students in Fall 2020]

Welcome to graduate school and to WIS. The attitude here is different than in your undergraduate studies: You assume full responsibility for your learning. This gets to an extreme in this reading group. A few specific hints:
- Advanced texts are rarely absolutely correct and complete, so reading them is a creative process involving thinking critically (rather than just reading literally).
- Exercises are merely a tool towards testing your understanding, and as such they are definitely secondary. The real purpose is learning, understanding things you did not understand or know before. Most importantly, abandon the pattern-matching techniques, which were encouraged by bad undergraduate and prior education (which focuses on performance and neglects understanding).
I will be happy to be consulted about anything, alone or as part of a group, but let's keep this separate from this reading group (unless it refers to the material learned here).
I wish to stress (again) the conceptual aspect of the material: The questions, definitions, and ideas. Don't get entangled in details. Don't waste time on them, unless you think this is necessary for the big picture.
Don't skip Chapter 2, although you may believe that you know the material. This chapter does repeat much of what you know, but does so while offering more conceptual perspectives than usual. In particular:
- P vs NP in terms of search problems.
- NP as a proof system (i.e., efficient verification).
- Self reducibility (i.e., reducing search problems to corresponding decision problems).
- The existence of an optimal algorithm for solving any problem in NP.

[Back to the original notes...]

Indeed, the P-vs-NP question will be our first topic. I do expect that almost all of you heard these terms before, but I guess that most did not get an adequate perspective on it. Therefore, I insist that you read the relevant Section 2.1, and test yourself with the quiz provided below. (Specifically, this section contains a formulation of the P-vs-NP question/conjecture in terms of search problems as well as a definition of NP in terms of proof systems.)

If you care, see my handwritten preparation notes (partially in Hebrew): Sheets 1 and 2.

BACKGROUND BEING ASSUMED: Familiarity with the notion of an algorithm, and the basic formulations of computability theory. Take a look at Section 1.2.1-1.2.3 of the book and read it if you don't know it. [This is not a reading assignment, but rather the "prerequisite".] (A more detailed description of the background, with some exercises, is provided in Chapter 1 of my Basic Complexity book available at ~oded/bc-book.html.)

READING ASSIGNMENT:

A short section on Universal Algorithms (Sec 1.2.3.4)
A short section on Time Complexity (Sec 1.2.3.5)
The P-vs-NP Problem in its two formulations (Sec 2.1.0-2.1.4 and Sec 2.1.6, where Sec 2.1.0 means the part of Sec 2.1 that comes before Sec 2.1.1).

A revision of Sec 2.1 appears as Chapter 2 of my Basic Complexity book (available HERE); the pace there is a bit slower and more details are given.

SUGGESTED OPTIONAL READING

The "traditional definition of NP" and how it fits in (Sec 2.1.5).
Meditations, only if you are philosophically inclined (Sec 2.1.7).

SUGGESTED QUIZ FOR THE BACKGROUND MATERIAL (not to be submitted):

What is the default representation of integers (in complexity theory)?
What are search and decision problems?
What is the motivation for considering the model of Turing machines?
What does the Church-Turing Thesis assert?
What is a universal algorithm?
What does undecidability mean?
What is the time complexity of an algorithm?
What does the Cobham-Edmonds Thesis assert?
What are Boolean circuits and formulae?

SUGGESTED QUIZ FOR SEC 2.1 (not to be submitted):

What are the justifications for associating efficient computation with polynomial-time algorithms?
What are the classes PF and PC?
What are the classes P and NP?
List a few computational problems in PF (resp., P).
Going beyond the list of the previous question, list a few problems in PC (resp., NP).
What does "PC not contained in PF" mean in intuitive terms?
What does "P neq NP" mean in intuitive terms?
Is it the case that "PC not contained in PF" if and only if "P neq NP"?
What are the justifications for believing that "P neq NP"?

SUGGESTED EXERCISES FOR SEC 2.1 (not to be submitted):

Exer 2.1-2.3 are all that is offered. They are simple but test understanding of the basic definitions.

TAKE-HOME MESSAGE FOR SEC 2.1:

The search and decision versions of the P vs NP Question, and their equivalence (Thm 2.6).

NOTES FOR 2ND MEETING

Topic: Polynomial-time reductions.
Estimated difficulty: 1 in the scale of 5.

As stressed in prior meeting, I wish to again stress that there is nothing fundamental in using polynomials as bounds on efficient computation. This is merely a natural and convenient convention; the phenomena that we discuss (e.g., the complexity of finding vs checking, deciding vs verifying based on proofs, reductions, etc) are independent of this convention, but are best captured by using it.

The topics of the reading assignment (for next meeting) refer to poly-time reductions. I wish to highlight the general notion of a reduction (and poly-time reduction), which is fundamental beyond its role in the theory of NPC. The general notion of an oracle machine [Sec 1.2.3.6] is the basis of the general definition of a reduction, which are sometimes called Turing-reductions. We shall focus on polynomial-time reductions, which are called Cook-reductions. When reading, note that Karp-reductions (i.e., reduction by mapping/translation/encoding) as used in the theory of NPC are a restricted/special case of Cook-reductions.

As you shall see, the notion of Cook-reductions is used in discussing the reduction of optimization problems to search problems, and the reduction of search problems to decision problems. Each search problem $R$ is reducible to $S'_R = \{\pair{x,y'} \exists y'' (x,y'y'')\in R\}$ (which is a decision problem...), but in some cases $R$ is reducible to $S_R = \{x: R(x)\neq\emptyset\}$. In the latter case, we say that $R$ is self-reducible.

READING ASSIGNMENT

A short section on oracle machines (Sec 1.2.3.6)
Polynomial-time reductions (Sec 2.2.0-2.2.3).

Section 2.2.3.2 assumes you know the definition of NP-completeness; if you don't please read Sec 2.3.1 before.

SUGGESTED OPTIONAL READING

Digest and general perspectives re poly-time reductions (Sec 2.2.4).

SUGGESTED QUIZ FOR SEC 2.2 (not to be submitted):

What are Cook-reductions?
What are Karp-reductions and Levin-reductions?
What is the motivation for defining all these types of reductions?
Can any problem in PC be reduced to some problem in NP?
What is self-reducibility and how does it relate to the previous question?
List five search problems that are self-reducible.

SUGGESTED EXERCISES FOR SEC 2.2 (not to be submitted):

Exer 2.5 and 2.6: Basic properties of reductions
Exer 2.8: Search vs Optimization
Exer 2.11-2.12: Self-reducibility of popular search problems.

TAKE-HOME MESSAGES FOR SEC 2.2:

The general notion of a (polynomial-time) reduction, and the fact that Karp-reductions used in the definition of NP-completeness are a special case.
The notion of self-reducibility (of search problems) -- Def 2.14.
All search problems associated with NP-sets are self-reducible (Thm 2.16).

NOTES FOR 3RD MEETING

Topic: NP-completeness (NPC).
Estimated difficulty: 1 in the scale of 5.

Yes, I know that most of you heard about NP-completeness, but I wish to highlight a few things about it. First, the mere existence of NP-complete problems, which is non-obvious a priori! Second, that it is easier to establish the existence of NP-complete problems than to prove that natural problems such as SAT are NP-complete. (Indeed, SAT and other famous NPC problems are very natural computational problems, more natural than "bounded halting", although the latter is not as unnatural as usually perceived...) Third, when proving that SAT is NP-complete, I advocate decoupling the reduction of NP to SAT into two reductions: The 1st reduction is from NP to CSAT (i.e., Circuit satisfiability), and the 2nd from CSAT to SAT (i.e., satisfiability of CNF formula). The 1st reduction relies on the fact that circuits can emulate the computation of Turning machines (TMs) on inputs of fixed length, while the 2nd reduction shows that CNFs with auxiliary variables can "emulate" the computation of arbitrary circuits. These are important "lessons to take home".

Note that assuming that $P neq NP$, there exist problems in $NP - P$ that are not in NPC (i.e., the class of NP-complete problems). Make a picture of the "world" under this assumption, and ask yourself what does the "world" look if $P=NP$.

READING ASSIGNMENT

NP-completeness (Sec 2.3.0-2.3.3.1); this covers the definitions, existence of NPC problems, and the fact that CSAT/SAT is NPC.
When reading Sec 2.3.3.1, read also Sec 1.2.4.0-1.2.4.1 (for perspective).
Read the first page of Sec 2.3.4 (i.e., Thm 2.28). Reading the proof sketch is not required, and is better not attempted now (but rather after reading the proof of Thms 4.3 and 4.6). I want you to understand Thm 2.28 and its meaning, and I don't care if you have no clue re how to prove it.

SUGGESTED OPTIONAL READING

If you did not see other NP-completeness proofs before, then you may want to read Sec 2.3.3.2.
Read and think about (but don't overdo it nor submit) Exer 2.28, 2.29, and 2.30, which discuss add'l features of some (standard/modified) Karp-reductions.
Meditations (Sec 2.2.4 and 2.3.5).
The proof sketch of Thm 2.28 may be hard to get. It is better to read it after reading simpler proofs "by diagonalization" such as the proof of Thm 4.3 (which you may read now...). In any case, if you decide to look at the proof sketch of Thm 2.28, then be prepared to only get a vague feeling of how it is proved (see first page of this sketch). That is, try to get the basic idea. It is likely that if you want to get the proof with all its details, this may require much time (while possibly consulting other sources...), and I do not recommend spending so much time on it.

SUGGESTED QUIZ FOR SEC 2.3 (not to be submitted):

What are NP-complete (search and decision) problems?
Is it likely that the problem of finding a perfect matching in a given graph is NP-complete?
Prove the existence of NP-complete problems.
How does the complexity of solving one NP-complete problem effect the complexity of solving any problem in NP (resp., PC)? That is, assuming that some NP-complete problem can be solved in time $t$, upper-bound the complexity of solving any problem in NP (resp., PC).
List five NP-complete problems.
Why does the fact that SAT is Karp-reducible to "Set Cover" imply that "Set Cover" is NP-complete?
Are there NP-problems that are neither in P nor are NP-complete?

SUGGESTED EXERCISES FOR SEC 2.3 (not to be submitted):

Exer 2.21: Details re reducing SAT to 3SAT.
Exer 2.23-2.25: NP-hardness of some algebraic problems.
Exer 2.28-2.29: Important additional properties of some reductions.

TAKE-HOME MESSAGES FOR SEC 2.3:

The mere existence of NP-complete problems, which is non-obvious a priori.
It is easier to establish the existence of NP-complete problems than to prove that natural problems such as SAT are NP-complete.
Decoupling the reduction of NP to SAT into two reductions: a reduction of NP to CSAT and a reduction of CSAT to SAT.
Circuits can efficiently emulate the computation of Turning machines on inputs of fixed length (see proof of Thm 2.21).
CNFs with auxiliary variables can "emulate" the computation of arbitrary circuits (see proof of Thm 2.22).
There exist NP-problems that are neither in P nor are NP-complete (Thm 2.28).

NOTES FOR 4TH MEETING

Topics: promise problems, optimal search algorithms, and coNP.
Estimated difficulty: 1.5 in the scale of 5.

This time you get a homework assignment (details about submission will be determined by this time). Please bear in mind my warning regarding not spending too much time on homework and/or exercises.

The following reading assignment consists of three parts.

The FIRST part (Sec 2.4.1 on "promise problems") is mainly conceptual, and it elaborates on an important issue that was mentioned already (in prior sections of the book). Specifically, the point is that natural computational problems are actually promise problems (with an efficiently recognizable promise), although they are computationally equivalent to computational problems with a trivial promise (i.e, standard decision and search problems). This issue will be ignored in the rest of the course. You should understand what are the issues involved and why (and when) we can ignore them. Furthermore, the formalism of promise problems is essential in some places (see an example next).

One type of promise problems that is of special interest is the notion of a "candid search problem", which is a promise problem consisting of finding solutions when promised that such solutions exist. Note that the promise may not be poly-time decidable.

The SECOND part (sec 2.4.2) is an intriguing claim asserting that we do know the best (i.e., optimal in a natural sense) algorithm for any candid search problem in PC, we just don't know its running time. But you may be disappointed when seeing what this algorithm does.

The theorem states that for every $R\in\PC$, there exists a polynomial $p$ and an algorithm $A$ that solves the candid search problem of $R$ such that for every $A'$ that solves the candid search problem of $R$ it holds that $t_A(x) = \tildeO(t_{A'}(x)+p(|x|))$, where $t_B(x)$ denotes the number of steps of $B(x)$ and $\tildeO(n) = n\cdot \poly(\log n)$. (The $\tildeO$ expression in $t_A(x) = \tildeO(t_{A'}(x)+p(|x|))$ depends on $A'$.)

The THIRD (and last) part (Sec 2.4.3) introduces the class coNP, and sheds light on the difference between Karp and Cook reduction in the context of reductions among decision problems. In particular, it is proved that (i) NP-complete sets cannot be in coNP, where completeness is under Karp-reductions; and (ii) that NP-complete sets cannot be Cook-reduced to a set in NP intersection coNP. (Note that (i) talks about being in some class, whereas (ii) talks about being Cook-reducible to some class.)

READING ASSIGNMENT: Basically it is to read Sec 2.4 (i.e., "Three Advanced Topics"). Please read Exercise 2.36 (without the guideline and Part 3).

HOMEWORK ASSIGNMENT (to be submitted (details to be determined)):

Exercises 2.10 and 2.11.
Exercise 2.13 and Part (1) of Exercise 2.14. See add'l guidelines in the correction list at the book's webpage
Bonus (not required!): Exercise 2.36.

SUGGESTED *OPTIONAL* READING -- REPEATING THE ONES OF MEETINGS 2 and 3:

Read the digest and general perspectives re poly-time reductions (Sec 2.2.4).
If you did not see other NP-completeness proofs before, then you may want to read Sec 2.3.3.2.
Read and think (but don't overdo it nor submit) Exer 2.28, 2.29, and 2.30, which discuss add'l features of some (standard/modified) Karp-reductions.
Meditations, only if you are philosophically inclined (Sec 2.1.7).
The proof sketch of Thm 2.28 may be hard to get. It may be better to read it after reading simpler proofs "by diagonalization" such as the proof of Thm 4.3 (which you may read now...). In any case, if you decide to look at the proof sketch of Thm 2.28, be prepared to only get a vague feeling of how it is proved (see first page of this sketch). That is, try to get the basic idea. It is likely that if you want to get the proof in all its details, this may require much time (possibly consulting other sources...), and I do not recommend spending so much time on it.

SUGGESTED QUIZ FOR SEC 2.4 (not to be submitted):

What are promise problems?
What is the justification for ignoring the promise (in a promise problem) whenever it is polynomial-time recognizable?
What is a candid search problem?
Is it the case that the P-vs-NP Question boils down to determining the time complexity of a single (known) algorithm?
What is the class coNP?
How does NP relate to the class of decision problems that are Cook-reducible to NP?
How does NP relate to the class of decision problems that are Karp-reducible to NP?

TAKE-HOME MESSAGES FOR SEC 2.3:

The notion of a promise problem, and when is it OK to ignore the promise.
The notion of a candid search problem.
Answering the P-vs-NP Question boils down to analyzing the time complexity of a single algorithm (e.g., the optimal algorithm for SAT (Thm 2.33)).
The class coNP.

NOTES FOR 5TH MEETING

Topic: P/poly and the Polynomial-time Hierarchy (PH).
Estimated difficulty: 2.5 in the scale of 5.

The following reading assignment consists of three small parts.

The FIRST part (Sec. 3.1) presents two alternative definition of the class P/poly (via poly-size circuits and via "machines that take advice"), proves their equivalence (Thm 3.6), and shows that P/poly (and in fact even P/1) contains P as well as some undecidable sets (Thm 3.7). The formalism of "machines that take advice" offer a way of measuring the amount of non-uniformity.

The SECOND part (Sec. 3.2.1) presents a definition of the "Polynomial-time Hierarchy" (PH) and its various levels. The main definition is in terms of "quantifier alternation", but an alternative using "non-deterministic poly-time oracle machines" is also presented (in optional reading -- see Sec 3.2.2).

The THIRD (and last) part (Sec 3.2.3) links the two first parts. Specifically, Thm 3.12 (known as "Karp-Lipton Thm"), states that if NP is in P/poly then PH collapses to its 2nd level. The proof (sketched next) has inspired many follow-ups and is important to internalize.

The initial observation is that it suffices to prove that Pi_2=Sigma_2
(and in fact this is the formulation used in Thm 3.12).
Thus, starting with $S\in\Pi_2$ and using the hypothesis
(i.e., "NP in P/poly"), we shall show that $S\in\Sigma_2$.
The second observation is that $S=\{x:\forall y, (x,y)\in S'\}$,
where $S'$ is an NP-set (and $|y|=\poly(|x|)$).
Now comes the main step of the proof:
It is that $x$ is in $S$ if and only if there exists
a polynomial-size circuit $C$ that "decides $S'$ correctly
on *all* inputs of length $m=\poly(|x|)$" and $C(x,y)=1$
for *all* strings $y$ of length $m-|x|$. Note that the condition
has the form "there exists such that for all" (i.e., a Sigma_2-form),
but the issue is whether the 1st condition (i.e., the one in quotes)
is poly-time decidable. The traditional proof appears in Exer 3.11, 
but we prefer the following alternative.

First note that since NP is in P/poly, 
it follows that PC is in the serach version of P/poly
(because PC is Cook-reducible to NP). 
Now, letting $R'$ be the witness relation of $S'$
(i.e., $S' = \{(x,y):\exists z ((x,y),z)\in R'$), 
we have (schematically) $x\in S$ iff $\forall y \exists z ((x,y),z)\in R'$.
Now, *if* the circuit $C$ solves the search problem $R'$,
then we have $x\in S$ iff $\forall y ((x,y),C(x,y))\in R'$
(because $(x,y)\in S'$ iff $((x,y),C(x,y)) \in R'$).
Furthermore, and this is a crucial observation, if $x\not\in S$,
then no matter what $C'$ you take, there exists $y$ s.t. $C(x,y)$
is not a solution to $(x,y)$ wrt $R'$ (because no solutions to $(x,y)$ exist).
Hence (ignoring details (*)), $x\in S$
iff $\exists C\forall y ((x,y),C(x,y)) \in R'$
(which can be written as $\exists C\forall y \Phi(x,C,y)$,
where $\Phi$ is a condition that can be evaluated in polynomial-time).

*) ignoring details such as the fact that we are quantifying 
over strings of adequate length.

Digest: Note that in the foregoing (alternative) proof,
we do not use a circuit $C$ for the decision problem 
but rather for the search problem.
Consequently, in this proof, we do not verify explicitly that 
the circuit $C$ correctly solves the decision (or search) problem.
Such a verification is implicit in the Cook-reduction of PC to NP
and is explicit in the alternative proof presented in Exer 3.11.
(If fact, we only verify that answers to Yes-instances are correct.)

Indeed, the foregoing proof is the sole reason for my estimate that this meeting is significantly more difficult than all prior ones.

READING ASSIGNMENT: Read almost all of Chapter 3 (on the classes P/poly and PH). You may skip Sec 3.2.2 (PH defined by ND poly-time oracle machines).

SUGGESTED EXERCISES FOR Chap 3 (not to be submitted):

Exer 3.2: sparse sets and P/poly; this exercise helps verify that you understand the def of P/poly.
Exer 3.3: "advice hierarchy"
Exer 3.4: Sigma2 contains all that is Cook-reducible to NP. See also Exer 3.5.
Exer 3.6 and 3.9: easy quiz re PH.

All these exercises are supposed to be easy.

SUGGESTED *OPTIONAL* READING:

Read also Sec 3.2.2. Note the warning on the "non-generic" meaning of the notation ${C_1}^{C_2}$ (bottom of page 118).
Exercise 3.11.

SUGGESTED QUIZ (which is actually a digest) FOR CHAPTER 3
Since the answers are not explicit in the text, they are provided HERE.

Assuming that $C$ is a complexity class and $S$ is a set of strings, when does the notation $C^S$ makes sense?
Regarding the proof of Theorem 3.12, what features of NP and PC are used to show that if NP is in P/poly, then PC has polynomial-size circuits?
Regarding the proof of Theorem 3.12, what feature of the ternary relation $R$ is used when showing that for every $x$ it holds that for all y exists z s.t. $(x,y,z)\in R$ if and only if there exists a polynomial-size circuit $C$ such that for all y it holds that $C(x,y)=1$? (Note: the quantifications over y and z refer to all strings of $\poly(|x|)$-length.)

TAKE-HOME MESSAGES FOR CHAP 3:

The class P/poly and its two equivalent definitions.
The Karp-Liton Theorem (Thm 3.12) and its proof.

NOTES FOR 6TH MEETING

Topic: Do more resources yield more power?
Estimated difficulty: 1 in the scale of 5.

In the context of non-uniform computation the answer (to the question in the chapter's title) is a definite YES: Sec 4.1 asserts that $P/(\ell+\omega(1))$ strictly contains $P/\ell$, provided that $\log_2 \ell(n) \leq n-1$.
[Actually, a stronger statement holds -- see corrections on the web-site of the book (or text): It holds that $P/(\ell-1)$ strictly contains $P/\ell$, provided that $\log_2 \ell(n) \leq n$.]

In the context of uniform computation, the answer is YES *provided* that these resources are "self measurable" (i.e., measured by an algorithm that does not use more than the measured amount of resources). Intuitively, if a resource is not self-measurable, then it cannot be used "safely" (i.e., you cannot use the specified amount of resources without risking exceeding it).

The "natural" results asserting that more times adds power appear in Sec 4.2.1.1, whereas the cases in which this does not hold are presented in Sec 4.2.2. Indeed, Hierarchy Theorems refer to the "natural" case in which more resources provide more power (or more ability; that is, ability to perform more tasks (i.e., solve more problems)). In contrast, Gap Theorems assert that under some weird circumstances, resources does not provide more power; this is viewed as a "gap" in the (computing) power as a function of resources (where within the "gapped" region, power does not increase).

Following is an overview of the basic definitions.

Def (for any reasonable and general model of computation): we denote by $DTIME(t)$ the class of sets that can de decided within time complexity $t$ (in this model, which "should" appear as subscript in DTIME, if we are pedantic).

Recall that the Cobham-Edmonds thesis assertts that, upto a polynomial composition, the class $DTIME(t)$ is the same for all reasonable models. Note that the exact classes (without the aforementioned slackness) are not model independent...

Def (time constructability): $f:\N\to\N$ is *time constructible* (in the model) if there exists an algorithm that on input $n$ computes $t(n)$ within $t(n)$ steps.

EXER: For every algorithm $A$, the function f(n) that equals the number of steps of A on input $1^n$ is time constructible (by running $A$ itself (after mapping $n$ to $1^n$), where the latter overhead can be compensated by linear speed-up).

Following is a direct proof for Cor 4.4, which asserts a time hierarchy for any reasonable and general model of computation. In contrast, in the textbook Cor 4.3 is deduces from Thm 4.4, which asserts a tighter hierarchy for a specific model (i.e., two-tape Turing machines).

Cor 4.4: For every reasonable and general model of computation, there exists a polynomial $p$ such that for every time constructible $t_1$ (that is at least linear) it holds that DTIME(t_1) is strictly contained in $DTIME(p\circ t_1)$.

PROOF: The idea is that in such a model, $t_1$ steps of any machine $M$ 
can be emulated by less than $p \circ t_1 /2$ steps of an emulating machine.

Fixing $t_1$, for each machine $M$, we define a "truncated emulation" $M'$
such that $M'(x)=1$ iff $M(x)$ halts within $t_1(|x|)$ steps with output 1.
Otherwise, $M'(x)=0$ (e.g., if $M(x)$ does not halt within $t_1(|x|)$ steps).
By the above, $M'$ has running time at most $p\circ t_1/2 +O(1)$.

For any (simple) mapping of machines $M$ to inputs $x_M$,
define $f(x_M) = 1-M'(x_M)$. Then, $f$ is in DTIME(p \circ t_1)$,
but cannot be in DTIME(t_1)$ as otherwise we derive a contradiction
by considering the machine $M_f$ that computes $f$ in time $t_1$
(and, specifically, considering what this machine does 
on the corresponding input $x_{M_f}$).  QED


Digest: The key point is the definition of a truncated
emulation of a machine, since this allows us to consider
all possible machines and avoid the impossible task
of enumerating all machines that have time complexity at least $t_1$.
(N.B.: Enumerating all machines is feasible, by using the one-to-one
corresponding between strings and machines, but enumerating all
machines that have a specific time complexity is impossible.)

READING ASSIGNMENT: Read almost all of Chapter 4, skipping Sec 4.2.1.3.

Sec 4.1 (non-uniform hierarchy) is a very easy read.
Sec 4.2.0-4.2.1.2 (Time hierarchy) is the core of the current assignment.
Sec 4.2.2 (Gaps and Speedups) is more important for the stated results, not their proofs. Make sure you understand what Thms 4.7 and 4.8 say.
Sec 4.2.3 merely discusses analogues for space complexity. Note that emulation is "easier" wrt *space* (compared to *time*).

SUGGESTED OPTIONAL READING

Sec 4.2.1.3 (Thm 4.6: NTIME hierarchy) is recommended as a good warm-up towards the proof of Thm 2.28 ("NPI").
At this point, one can try to read the proof of Thm 2.28 (i.e., on NP-sets that are neither in P nor NP-complete). Although it does *not* build on the proof of Thm 4.6, it uses the idea of "delayed diagonalization" which is easier to grasp in the context of the proof of Thm 4.6.

The proof of Thm 2.28 is quite sketchy. It may give you an idea of what is going on, but may fall short of convincing you due to its sketchy style and lack of some details. Hence, if you are interested in seeing a full and rigorous proof, you will have to look at other sources. (I don't recall seeing a source that I can wholeheartedly recommend.)

SUGGESTED EXERCISES FOR Chap 4 (not to be submitted):

Exer 4.1 and 4.2 refer to the non-uniform model.
Exer 4.7 (re "an efficient counter") may be of general interest.
Exer 4.8 and 4.9: A set in $E - P$, and EXP-completeness.

SUGGESTED QUIZ (which is actually a digest) FOR CHAPTER 4
Since the answers are not explicit in the text, they are provided HERE.

Fixing a specific model of computation and a definition of time complexity for it, what enables showing that DTIME(t1) is strictly contained in DTIME(t2)? That is, what should be the relation between t1 and t2 and aspects of the model for such a results to hold?
Similarly, what enables showing that DTIME(t1) = DTIME(t2), when t1 is significantly smaller that t2?
Ditto for space.

TAKE-HOME MESSAGE FOR CHAP 4:

Typically, more resources (e.g., time or space) yield more power (e.g., Cor 4.4); but, this does not hold if the resource bound is not "self measurable" (e.g., time-constructible) as in Thm 4.7.

NOTES FOR 7TH MEETING

Topic: Space complexity - Part 1 (basics).
Estimated difficulty: 1 in the scale of 5.

Space complexity is of intrinsic interest, but is also interesting as offering a perspective (and sometimes a contrast) to issues w.r.t time complexity.

When studying space complexity, we wish to account for the storage ("space") used for the actual computation as opposed to the storage taken by the input and output. This is crucial since we are often interested in sub-linear space complexity, and conventions should be introduced in order to distinguish space used for ("active") computation from space used for ("passive") storage of the input and output. This is done by using separate storage for each of the three roles. That is, we use a read only input device, a write only output device, and a storage device for performing the actual computation (i.e., for storing the intermediate results of the computation), called the work space device. Only the work space device is charged for space consumed by computation, and the read-only and write-only conventions prevent abuse of the other (i.e., input and output) devices for conducting computation.

While one may conjecture that the minimal amount of (super-constant) storage that can be useful for computation is logarithmic (since it seems that one needs to "compute" the input length in order to allocate the space consumption), this turns out wrong (i.e., double-logarithmic is the correct lower bound for computations that cannot be performed in constant amount of space [See Section 5.1.2]). Nevertheless, we shall only consider space complexity that is at least logarithmic (in the input length).

Some proofs in the text (and many in other sources) refer to the "instantaneous configuration" (of a machine during a computation), which is often decoupled into a fixed part and a variable part, where the partition is application-dependent. For example, in the proof of Thm 5.3, the input is fixed, whereas the variable part consists of the location of the head on it as well as the contents of the work space (and the location on it). In such cases, one often ignores the fixed part, and refers to the variable part as the instantaneous configuration.

READING ASSIGNMENT:

Sec 5.1 ("space complexity -- general issues"), but *skip* most of Sec 5.1.3.3 ("subtleties re space-bounded reductions") by stopping at the Teaching Note on p. 150 (i.e., start skipping there). Do read Sec 5.1.4 and also the laconic sections 5.1.3.4-5.1.3.6.
Sec 5.2.0-5.2.3 (see, in particular, Thm 5.4 and 5.5).

SUGGESTED *OPTIONAL* READING:

Look at Exer 5.3 and 5.4 (on double-logarithmic space complexity), but do *not* exhaust yourself with attempts to provide all low-level details for these exercises.
Read all of Sec 5.1.3.3 (i.e., including the part skipped in the main assignment).

SUGGESTED EXERCISES (not to be submitted):

Exer 5.1, 5.2 and 5.5: on some "w.l.o.g" -- simple modifications
Exer 5.6: some computations in log-space
Exer 5.11 and 5.12: using the "emulative composition lemma" (Lem 5.2)

SUGGESTED QUIZ (for Sec 5.1)

What are the most important conventions regarding space complexity?
Suppose that a graph theoretic problem is solved in space $s$, when the graph is presented by its adjacency matrix. What is the space complexity of solving it when the graph is presented by incidence lists? [Hint: Lem 5.2.]
Assuming that a problem can be solved in space $s$, what can be said of its time complexity?

TAKE-HOME MESSAGES FOR SEC 5.1:

The time complexity is at most exponential in the space complexity [Thm 5.3].
Jumping ahead, we warn that this holds for deterministic (and non-deterministic) computations, but not for randomized ones [Sec 6.1.5].
The "emulative composition lemma" (Lem 5.2) is useful, especially when composing a constant number of computations.

NOTES FOR 8TH MEETING

Topic: Space complexity - Part 2 (UCONN in L).
Estimated difficulty: 5 in the scale of 5. Highly recommended to ask me to attend the meeting.

[June 2024: Following several feedback, I currently recommend to reverse the order of the 8th and 9th meeting; that is, go over the NL section first, and return to UCONN-in-L afterwards.]

[Class of 2018/19 recommendation: Split this meeting into two. One of expander graphs and one on the actual UCONN algorithm. (Oded: If you do so, then you may want to read all of Apdx E.2 for wider perspective and more comfort with expander graphs).]

[NOTE: I was debating with myself whether to assign you first the NL=coNL result (incl reading about NSPACE) or to start with the log-space algorithm for deciding undirected connectivity of graphs (UCONN). The latter algorithm (especially its details) is far more complex, but the setting (unlike NSPACE) is more familiar and less problematic (from a conceptual point of view). So I still tend to suggest starting with UCONN, but you should feel free to take the alternative route.]

Warning: This reading assignment is significantly more difficult than any of the material covered so far; it is focused on a rather sophisticated algorithm and analysis.

The reading sequence that I propose starts with the overview of the UCONN algorithm (Sec 5.2.4.0). The key observation is that if the input graph were of (1) bounded degree and (2) logarithmic diameter, then a simple log-space algorithm can solve the problem by "enumerating" all possible paths. But, of course, the graph may not satisfy these conditions and so the idea is to transform it to a graph that does satisfy these condition while preserving the relevant connectivity property. (Needless to say, this transformation should be performed in log-space.)

As a warm-up, note that we can easily transform the graph into one that supports one of the properties but not the other. (1) We can make the graph 3-regular by replacing each vertex $v$ of degree $d(v)$ by a $d(v)$-cycle and connect edges between "representatives". (2) We can reduce the diameter of a graph by "raising it to a power". But we wish to transform (each connected component of) the graph into one that has both bounded degree and logarithmic diameter.

Sec 5.2.4.1 presents the foregoing transformation at an abstract level (i.e., avoiding the question of effecting it in log-space). You will need to read first about expander graphs and about the zig-zag product (Apdx E.2.1.1 and E.2.2.2 will do!). Once you read these, read Sec 5.2.4.1 (again or for the 1st time).

Basically, what is done is interleaving steps of reducing the diameter (or actually increasing expansion) with steps of decreasing the degree. The expansion is increased by raising the graph to a constant power, and the degree is reduced by the zig-zag product. We take a logarithmic number of such iterations, to increase expansion from (the trivial) 1/poly lower bound to a constant.

Surely, implementing the above sequence of transformations in the straightforward way will not do (since it will yield a space bound of log-square). The key ideas presented in Sec 5.2.4.2 are (i) implementing an "oracle access to the current graph" rather then actually printing it. (ii) using a "global" accounting for the space used by recursive calls of a subroutine (see the notion of "global-tape oracle machine"). The relevant computation in that model are illustrated HERE.

ADD'L INTUITION (to be read along with the actual text in the book)

Recall that the algorithm is motivated by considering the diameter of the graph, but the actual analysis cannot be carried out in these terms. This is the case because the diameter-reduction effect of the powering is undone (as far as a local analysis goes) by the diameter-increase caused by the degree reduction (using the zig-zag product). In contrast, the parameter that we shall analyzed is harmed less by the zig-zag product.

It is common in mathematics (incl in TCS) to use an invariance and/or a parameter that is stronger than the one that we actually want. Hence, instead of diameter, we may consider the "mixing time" (of the graph), defined as the minimum number of random steps you need to take (when starting from any fixed vertex) till you get to a distribution that is close to the uniform (say get to a distribution in which each vertex is reached with probability $(1\pm0.1)/n$). So "mixing time" can be viewed as a strengthening of diameter; in particular, the diameter is upper bounded by the mixing time... In Section 6.1.5.2, we shall see than any $n$-vertex graph has mixing time that is at most $O(n^3)$.

So it would be great if we could directly prove that the mixing time shrinks by a constant factor in each iteration of the foregoing procedure. The mixing time is closely related to the "spectral gap" (or "gap"), which is the gap between the 1st eigenvalue (i.e. 1) and the 2nd eigenvalue of a corresponding matrix. Roughly speaking, the 2nd eigenvalue (i.e., 1 minus the gap) equals the rate of convergence of a random walk (to the stationary distribution), and the mixing time is inversely proportional to the gap, where the factor of proportion is logarithmic (i.e., the mixing time is $O(\log_{1-gap}n) = O(log n)/gap$). Hence, by showing that the spectral gap increases by a factor of two in each iteration, we would be showing that mixing time shrinks in that manner.

Lastly, here is some intuition regarding expander graphs, and in particular their algebraic definition. The key points to note are the following:

For convenience, for a $d$-regular $n$-vertex graph $G$, consider the *normalized* adjacency matrix of $G$ (i.e., the adjacency matrix divided by $d$), denoted $A_G$. This is a stochastic matrix; that is, each row sums to 1.
Applying $A_G$ to a probability vector corresponds to taking a random step on the graph; that is, if the vector $p$ represents the current distribution on the vertices of $G$, then $A_G p$ represents the distribution after taking an additional random step.
The uniform vector $(1,....,1)$ (or $(1/n,....,1/n)$) is an eigenvector of $A_G$ with eigenvalue 1.
Each vector orthogonal to $e_1=(1/n,....,1/n)$ has sum 0.
Writing each vector as a linear combination of the eigenvectors (or an orthogonal basis of these), each probability vector has the form $e_1$ plus a vector in the orthogonal space.
Thus, applying $A_G$ to a probability vector yields a probability vector that is closer to $e_1$, because the component in direction $e_1$ remains intact whereas the orthogonal component shrinks by a factor of $\lambda_G$ (= the 2nd eigenvalue of $A_G$ in absolute value).

Thus, if you start from any distribution over $V(G)$ and take a random step, then you end up in a distribution closer to uniform over $V(G)$. In particular, if you start from a uniform distribution over a set $A$ (of less than half of $V(G)$), then you get to a distribution that has a significantly larger support. This is the "expansion" property (*). Also, if you take $O(\log_{1/\lambda_G} n) = O(\log n)/(1-\lambda_G)$ random steps from any start vertex, then you reach almost uniform distribution (over $V(G)$). Hence, $lambda_G$ is the rate of convergence of a random walk on $G$ (to the stationary distribution).

*) The fact that eigenvalue gap implies expansion is closely related to the Expander Mixing Lemma (i.e., Lemma E.8, page 558). Note that in the special case of $B=V-A$, we can use $\sqrt{|A|\cdot|B|} \leq N/2$, and so the number of edges going out of $A$ is at least $((1-\rho(A))\cdot d - \lambda/2)\cdot|A|$. It follows that for $|A| \leq N/2$, it holds that $|\Gamma(A)-A| \geq ((1/2)-(\lambda/2d))\cdot|A|$.

READING ASSIGNMENT (in suggested order):

Sec 5.2.4.0 (overview of the UCONN algorithm).
Apdx E.2.1.1 (two equivalent defs of expanders).
Apdx E.2.2.2 (the ZigZag product).
Sec 5.2.4.1 (the transformation to an expander)
Sec 5.2.4.2 (a log-space implementation)

HOMEWORK ASSIGNMENT (to be submitted (details to be determined)):

Exercises 5.13, 5.15 and 5.16. Please do not read the guidelines (which are hints) unless you feel you need them.
Optional (not for submission): Exer 5.17.

SUGGESTED *OPTIONAL* READING:

Read the entire section on expander graphs: All of Apdx E.2.
Think of Exer 5.17.
Advanced: the analysis of the Zig-Zag product.

SUGGESTED QUIZ (for UCONN in L)
It seems quite hard to design a meaningful quiz for this meeting. So I'll present a pseudo-quiz, which is a kind of digest.

Describe a log-space algorithm for counting the number of connected components in bounded-degree graphs of logarithmic diameter. Actually, this may require a hint: Consider computing the function that given a vertex name computes the first (in lex-order) name of the vetex that resides in the same connected component.
Given a log-space transformation of any graph into a graph of bounded-degree graphs of logarithmic diameter such that the transformation preserves the number of connected components in the graph, describe a log-space algorithm for counting the number of connected components in any graph.
Describe a transformation as in the previous item, with the exception of using a polynomial-time algorithm that uses a poly-logarithmic amount of space. How much space did you use?
Describe, in high level, the nature of the observations and conventions that allow to implement the foregoing transformation in logarithmic space?

TAKE-HOME MESSAGES FOR SEC 5.2:

UCONN is in L; that is, there is a log-sapce algorithm for deciding whether a given graph is connected (Thm 5.6).
The second eigenvalue of a stochastic matrix that corresponds to a regular graph describes the rate in which a random walk on the graph converges to the uniform distribution.
The degree of vertices in the graph can be reduced by replacing vertices by gadgets and connecting incident edges to different vertices in these gadgets. Using connected gadgets preserves the connectivity of the original graph, whereas using gadgets that are good expanders almost preserves the expansion of the original graph.

NOTES FOR 9TH MEETING

Topic: Space complexity - Part 3 (NL).
Estimated difficulty: 1.5 in the scale of 5.

The reading assignment has two parts: (1) a mainly conceptual part that discusses on-line vs off-line non-determinism, and (2) a technical part about NL and Directed Connectivity (st-CONN).

Re (1). As explained in Sec 5.3.1, these two non-deterministic models are significantly different in the context of space complexity, since (intuitively) an off-line non-deterministic device can be used as an unaccounted work space. For this reason, the on-line model is the model of choice (but results for the off-line model follow). Specifically, we focus on the class of sets having log-space machines in the on-line non-deterministic, denoted NL.

Re (2). In a sense, st-CONN "encodes" all of NL; it is definitely NL-complete under log-space reductions (Thm 5.11). It is easy to see that st-CONN can be solve in space log-squared (and in general $NSPACE(s)$ is contained in $DSPACE(s^2)$ (Thm 5.12)). Finally, there is the celebrated theorem NL=coNL (or, equivalently, directed st-non-connectivity is in NL), which is the most challenging part in the current reading assignment (see Sec 5.3.2.3).

The key idea (in the proof that NL=coNL) is an "NL machine" for enumerating all vertices that are at a given distance from a given vertex in a given graph, where an non-deterministic machine is said to compute a function if (i) it always outputs either the correct value or an error message (*), and (ii) it sometimes outputs the correct value (rather than an error message). Now, let $R_G(v,k)$ denote the set of vertices that are at distance at most $k$ from $v$ in the graph $G$. Then, it holds that $R_G(v,k)$ equals the set of vertices that are either in $R_G(v,k-1)$ or are adjacent to a vertex in $R_G(v,k-1)$. Thus, if you can (non-deterministically) enumerate $R_G(v,k-1)$, then you can (non-deterministically) enumerate $R_G(v,k)$. This is done by going over all $V(G)$, and checking for each target vertex whether it is in $R_G(v,k)$ by (re-)enumerating $R_G(v,k-1)$ on-the-fly (for each target vertex), and testing whether the target vertex should be included by virtue of either being itself in $R_G(v,k-1)$ or having a neighbor in $R_G(v,k-1)$.
*) When saying that the machine may output an error message, we do not insist on a single error symbol but rather allow it to output any message that ends with a special error symbol. This more liberal notion allows the machine to start producing bits of the output and declare the output as erroneous (and halt) at any point.

READING ASSIGNMENT: Basically read all of Sec 5.3.

Sec 5.3.1: two model of non-deterministic space complexity.
Sec 5.3.2: NL and Directed Connectivity (denoted st-CONN), including the completeness of st-CONN, NSPACE vs DSPACE, and NL=coNL (or solving st-non-connectivity in NL).
Sec 5.3.3: a "philosophical" discussion.

SUGGESTED *OPTIONAL* READING AND THINKING:
Read and think of Exer 5.18 relating the on-line and off-line models.

SUGGESTED *OPTIONAL* EXERCISES (do not submit)

Exer 5.19-5.21: about NL and directed connectivity
In continuation to Exer 5.20 (but actually independent of it), present a deterministic algorithm of space complexity log squared that finds (i.e., outputs) a shortest path in an input graph.
(You can derive it by combing Exer 5.20 with Thm 5.12, but I suggest to figure out a direct description in standard algorithmic terms (rather than present reductions etc).)
Exer 5.25: On the notion of non-deterministic computation of functions.
Exer 5.26: On the algorithmic graph theoretic meaning of NL=coNL.

SUGGESTED QUIZ (which is actually a digest for NL=coNL)

What does it mean for a non-deterministic machine to compute a function?
In the case the function is Boolean, how is this notion different from the notion of non-deterministically accepting a set?
The following is an abstraction and generalization of the proof that an "NL machine" can enumerating all vertices that are reachable from a given vertex in a given graph. (Hint: In the special case, $R(G,s,i)=R_G(s,i)$.)
Suppose that $R(x) \subseteq\{0,1\}^\ell$, where $\ell=O(\log|x|)$, and that $R=\{(x,y):y\in R(x)\}$.
1. Show that if $R\in\NL$, then there exists a non-deterministic log-space machine that on input $x$ and $|R(x)|$, outputs $R(x)$ itself.
  (Use the liberal model discussed in (*) above.)
2. Suppose that $f$ is a length-preserving function that is computable by a non-deterministic log-space machine, and that there exists a non-deterministic log-space machine that given $x$ and $R(x)$ computes $R(f(x))$. Assuming that $R\in\NL$ and using Item 1, show that there exists non-deterministic log-space machine that given $x$ and $R(x)$ computes $R(f^{|x|}(x))$, where $f^i(x)=f^{i-1}(f(x))$ and $f^0(x)=x$. (Write down a detailed description rather being content with "hand-waving" that you can do it.) Ditto for $f^{p(|x|)}$ for any other fixed polynomial $p$.
  Why is the use of Item 1 essential here? (See hint below.)
Repeat both steps when replacing non-determinism by determinism (and NL by L).
Show that coNL is in NL by using the framework of the previous item. (See hint below.)

Hint for item 2 in the 2nd bullet: We proceed in iterations so that in the $i$-th iteration we produce and record some function $F$ of $x$ and $i$, where in our case $F(x,i)=|R(f^i(x))|$. We can store both $i$ and $F(x,i)$ on our work space, provided that both have length logarithmic in $|x|$, and we should be able to compute $F(x,i+1)$ from $x,i$ and $F(x,i)$.
Hint for the 3rd bullet: Let $x=(G,s,i)$, where $G$ is a directed graph, $s$ is a vertex in $G$, and $i$ is an integer. Consider the problem of finding the set of vertices that are reachable from $s$ within $i$ steps (and use $f(G,s,i)=(G,s,i+1)$).

TAKE-HOME MESSAGES FOR SEC 5.3:

st-CONN is in coNL; that is, st-non-connectivity can be decided by a log-space non-deterministic machine (Thm 5.14). Equivalently, st-non-connectivity can be reduced in log-space to st-connectivity.
The notion of non-deterministically computing a function (Def 5.13).

NOTES FOR 10TH MEETING

Topic: Space complexity (last part) and randomized complexity classes (introduction).
Estimated difficulty: 1 in the scale of 5.

The next reading assignment consists of two unrelated topics. The 1st topic is the last topic that we shall study regarding space complexity, and the 2nd topic is actually a set of basic issues regarding randomized complexity classes.

The 1st topic is showing that QBF is complete for PSPACE under Karp reductions. We shall show how the acceptance condition of any polynomial-space computation can be encoded by a Quantified Boolean Formula of polynomial length. The underlying idea resembles/mimics the proof that NSPACE(s) is contained in DSPACE(s^2); essentially, we shall "re-use" formulas in a way analogous to the re-usage of space.

The 2nd topic enters the study of the complexity of randomized algorithms. We start with introducing the notion of randomized algorithms and various issues and conventions related to them. We will discuss the equivalence of the on-line and off-line formulations, where the on-line formulation is more natural and the off-line formulation is particularly useful for the analysis of randomized algorithms. We shall focus on the type and magnitude of the failure probability in randomized algorithms. Considering the class BPP capturing two-sided error PPT (probabilistic polynomial time) decision procedures, we shall prove that BPP is (strictly) contained in P/poly, and postpone to later showing that BPP is in PH (see Sec 6.1.3 (to be covered in the next meeting)).

In addition, please read Appendix D.1, which presents probabilistic preliminaries that we need for this chapter (on randomized algorithms). Sec D.1.2.4 is not important at this point, but it is very important in many other settings, and so I strongly recommend reading it too.

READING ASSIGNMENT

Sec 5.4 (on PSPACE).
Sec 6.1.0-6.1.2 ("Introduction to PPT" and BPP). Sec 6.1.2.2 is an illustration -- please don't get entangled in it, and do take the two stated facts as given (i.e., don't worry about proving them, although this is not a "big deal").
Apdx D.1 (probabilistic preliminaries).

SUGGESTED EXERCISES (not to be submitted):

Exer 5.30: The QBF game (for Sec 5.4)
Exer 6.2: about randomized reductions, very easy
Exer 6.4: about error reduction, easy but very important

SUGGESTED QUIZ FOR THE BASICS OF PROBABILISTIC POLYNOMIAL TIME

What is the difference between BPP, RP, and ZPP.
When viewing these classes as natural classes of PPT algorithms, why is it important to consider failure probability that is bounded-away from the trivial upper bound (rather than only strictly smaller than it)?
What is the trivial upper bound in each of the cases (i.e., of BPP, RP, and ZPP)?
What does bounded away mean, and why is this specific definition used?
Similarly, why is it important to allow a failure probability bound that is significantly larger than zero (rather than only strictly larger than zero)?
(Clarification: Assuming that the algorithm tosses $m$ coins, what happens if we require it to fail with probability at most $2^{-m}$ (or $\poly(m)\cdot2^{-m}$)?)

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

QBF is complete for PSAPCE (Thm 5.15).
Error reduction (see top of page 190).
BPP is in P/poly (Thm 6.3).

NOTES FOR 11TH MEETING

Topic: randomized complexity classes.
Estimated difficulty: 1.5 in the scale of 5.

The next reading assignment consists of several issues relating to PPT. The highlights include a proof that BPP is reducible via a one-sided error randomized Karp reduction to coRP and a discussion of RL (randomized log-space).

The aforementioned randomized reduction (presented in Sec 6.1.3.2) is typically use to establish a weaker result, stating that BPP is in PH (or, more specifically, that BPP is in Sigma2). In fact, it may be pedagogically better to present the underlying ideas in this weaker terms. Actually, I currently tend to present them in terms of MA2 vs MA1, where MA denotes a generic randomized version of NP (recall Def 2.5), and MA2 (resp., MA1) are two-sided error (resp., one sided-error) version of it. Specifically, $S\in MA2$ (resp., $S\in MA1$) if there exists a probabilistic polynomial-time machine $V$ such that

If $x\in S$, then there exists $y$ of $\poly(|x|)$ length such that $Prob[V(x,y)=1] \geq 2/3$ (resp., equals 1);
If $x\not\in S$, then for every $y$ it holds that $Prob[V(x,y)=1] \leq 1/3$ (resp., $\leq 1/2$).

(Note that, like for BPP and RP, error-reduction applies here too.)
Obviously, both BPP and MA1 are contained in MA2, and NP is contained in MA1; but recall that it is not know whether BPP is in NP. The proof to be presented next shows that MA2 is contained in MA1; that is, we can eliminate the error in the completeness condition. The proof, due to Lautemann, is based on providing some freedom to the verification process; specifically, its randomness may be shifted in one of polynomially many options provided as part of the proof. A sketch of this proof follow.

 
Let $V$ be as in the definition of MA2.
Applying error-reduction, we may assume that
for every $x\not\in S$ and $y$ 
it holds that $Prob_{r\in\{0,1\}^m}[V_r(x,y)=1] \leq 1/3m$,
where $r$ denotes the randomness of $V$.
Now, the new prover will augment $y$ with a sequence of offsets $s_1,...,s_m$
and the new verifier $V'$ will accept when tossing coins $r$
if and only if for some $i\in[m]$ it holds that $V_{r\xor s_i}(x,y)=1$. 
The point is that for every $x\in S$ and $y$
such that $Prob[V(x,y)=1] \geq 2/3$,
there exists $s=(s_1,...,s_m)$ such that $Prob[V'_r(x,(y,s))=1] = 1$.  
(Since, for every $x\in S$ and $y$, and every $r\in\{0,1\}^m$, 
it holds that $Prob_s[V'_{r}(x,(y,s))=0] \leq (1/3m)^m \ll 2^{-m}$;
indeed, this shows that almost all choices of $s$ are good.) 
On the other hand, for every $x\not\in S$ and every $y,s$ 
it holds that $Prob[V'_r(x,(y,s))=1] \leq m/3m$
(since for every $i\in[m]$ and every $s_i$ 
it holds that $Prob_r[V_{r\xor s_i}(x,y)=1] \leq 1/3m$).

Note that the foregoing ideas are presented in the book in a more cumbersome manner, since the setting there is one of a randomized reduction to a problem in coRP.
(Indeed, the term offset may be better than the term shift, which is used in the book.)

Turning to Section 6.1.5, which refers to RL (randomized log-space), I stress the importance of restricting computation to polynomial-time (which was the case "automatically" in the cases of L and NL). Without this restriction, the "badly defined RL" contains all of NL; the proof of the latter fact contains an interesting idea: a "randomized approximate step-counter" (see Exer 6.18), which allows counting $Omega(T)$ steps at the cost of $\log\log T$ space).

READING ASSIGNMENT: The rest of Section 6.1. Specifically:

Sec 6.1.3: One-Sided Error and the proof that BPP is reducible with a one-sided error Karp reduction to coRP.
Sec 6.1.4: Zero-error PPT (ZPP) and "ZPP = RP intersection coRP".
Sec 6.1.5: RL, issues in its definition, and UCONN in RL.

SUGGESTED EXERCISES (not to be submitted):

Exer 6.4-6.6: about error reduction, easy but very important. (The point is merely defining the random variables on which the probabilistic inequalities are to be applied. You should be in a state in which you consider such an exercises trivial.)
Exer 6.7: about an alternative definition of ZPP, very important!
Exercises 6.10 and 6.11: Add'l details for the illustrative paragraph in the proof of Thm 6.9. Exec 6.10 is good to know in general!
Exer 6.14-6.15: Exer 6.14 is easy, but Exer 6.15 is tedious. They show completeness and hierarchies for a promise problem version of BPP.

SUGGESTED QUIZ

Define log-space analogues of BPP and ZPP, to be denoted BPL and ZPL respectively.
Show that ZPL equals RL intersection coRL.
Can the ideas that underly the proof that BPP is randomly reducible to RP by a one-sided error reduction be applied to BPL? (See hint below.)

(Hint: For $m=|r|$, suppose that $g_1(r),...,g_{m}(r)$ are each computable in log-space when given on-line access to $r$, does it follow that $\vee_{i\in[m]}g_i(r)$ is computable in log-space given on-line access to $r$?)

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

MA2 is contained in MA1 (see above text, which is more direct than Sec 6.1.3.2).
ZPP equals the intersection of RP and coRP (Thm 6.10).
A "randomized approximate step-counter" that allows counting $Omega(T)$ steps at the cost of $\log\log T$ space (see Exer 6.18).

NOTES FOR 12TH MEETING

Topic: Counting classes - Part 1 (exact and approximate).
Estimated difficulty: 1 in the scale of 5.

Regarding exact counting, I wish to highlight the following points

The notion of a counting problem and the counting class #P: For every $R\in\PC$, the (function) class $\#P$ contains the function $f_R$ defined by $f_R(x) = |R(x)|$.
A decision problem that is computationally equivalent to a counting problem: $\{(x,N): f(x)\geq N\}$ for $f$ in $\#P$. Note the use of LEQ/GEQ rather than EQ.
One type of #P-completeness results is obtained by use of parsimonious Karp-reductions (from PC to some R in PC).
- But this type of reduction, which is a special case of a Karp-reduction, implies PC-completeness of the corresponding search problem, and thus is unlikely to be applicable to problems in PF (assuming PF is not contained in PC).
- Still, other reductions (i.e., non-parsimonious ones) yield another type of #P-completeness results, and these may refer to problems in PF. One example is #DNF, another is #PerfectMatching, and one can also show many artificial examples (see Exer 6.22).

Regarding approximate counting, I wish to highlight the following points

On additive and relative notions of approximate counting.
- For any $R\in\PC$ with $R \subseteq \cup_{n}\{0,1\}^{n+m(n)}$, it is easy to approximate $f_R=#R$ up to an additive error of $2^{m(n)}/\poly(n)$ (i.e., this can be done in PPT). In contrast, relative factor approximation is at least as hard as deciding $S_R$ (since it allows to distinguish zero from positive).
- Warning: relative approximation of the number of valid solutions may not be computationally equivalent to relative approximation of number of invalid solutions. In other words, relative approximation of $Q\in[0,1]$ is not necessarily computationally equivalent to relative approximation of $1-Q$. (In contrast, an additive approximation of $Q$ is computationally equivalent to an additive approximation of $1-Q$.)

In contrast to the general case discussed above, in some cases relative approximation can be reduced to additive approximation. Specifically, we shall see a (polynomial-time deterministic) reduction of "relative approximation of #DNF" to "additive approximation of #DNF". A sketch of this reduction follows.

For a DNF $\phi = \vee_{i\in[m]}\phi_i$ over $n$ variables, 
note that  
$$\Prob_x[\phi(x)=1] 
     = \sum_{i=1}^m \Prob_x[\phi_i(x)=1 \wedge (\vee_{j\ls i}\phi_j(x)=0)]
     = \sum_{i=1}^m \Prob_x[\phi_i(x)=1]
                    \cdot\Prob_x[\vee_{j\ls i}\phi_j(x)=0 | \phi_i(x)=1].$$ 
Next, observe that 
(i) $q_i = \Prob_x[\phi_i(x)=1]$ is easily computable; and 
(ii) $p_i = \Prob_x[\vee_{j\ls i}\phi_j(x)=0 | \phi_i(x)=1]$
    can be efficiently approximated up to an *additive* deviation of $\eps$ 
    (w.p. $1-\exp(-t)$, by taking $O(t/\eps^2)$ random samples in $\phi_i^{-1}(1)$).
Lastly, note that $\sum_{i=1}^m (p_i\pm \eps)\cdot q_i$
us bounded by $\sum_{i=1}^m p_i q_i \pm m\eps\cdot Q$,
where $Q=\max_{i}\{q_i\} \leq \Prob_x[\phi(x)=1]$
(since $\Prob_x[\phi(x)=1] \geq \Prob_x[\phi_i(x)=1]$ for any $i$).
It follows that $\sum_{i=1}^m p_i q_i \pm m\eps\cdot Q$
is bounded by $(1\pm m\eps)\cdot \Prob_x[\phi(x)=1]$.

READING ASSIGNMENT

Pages 202-205: Exact counting (not including the proofs of Props 6.21 and 6.22 (in pages 206-211)).
Sec 6.2.2.0: The notion of Approximate counting.
Sec 6.2.2.1: Approximate counting for DNF.

PREPARING FOR THE NEXT READING ASSIGNMENT
(You do not need this for the current meeting, but you'll need it for the next meeting; so if you have time, then you may start looking at the following): Read Appendix D.2 on Hashing Functions.

When reading Apdx D.2.2, you may either take the claims on "faith" or read Sec 8.5.1.1 now.
When reading Apdx D.2.3, take a look (again) at the discussion in Apdx D.1.2.4.
Take home message from the Hashing Lemma (Lem D.4): For a family of pairwise independent hashing functions mapping $n$-bit strings to $m$-bit strings, almost all hashing functions map a sufficienttly large subset almost uniformly on their range, where sufficiently large means somewhat larger than $2^m$.

ADD'L READING (NOT MANDATORY): I do recommend reading the proofs of Props 6.21 and 6.22 (pages 206-211), see more details below. These proofs are independent of one another, and so you may read one but not the other. I'd give priority to the proof of Prop 6.22.

The proof of Prop 6.21 (which spans 4 pages and 3 exercises) is via an ingenious gadget reduction; the main lesson from it is that sometimes one just needs ingenuity. Actually, this refers to the original ("structured") proof; the alternative proof (via Fig 6.3) was found retrospectively by using an automatic search for a solution.
The (1 page) proof of Prop 6.22 is via a couple of reductions, which offer some concrete technical lessons. Specifically, note the benefit of moving from integer arithmetics to modular arithmetic (i.e., avoiding negative integers) and the move from a problem regarding weighted graphs to a related problem regarding unweighted graphs.

SUGGESTED QUIZ (for Sec 6.2.1-6.2.2.1)

What is a parsimonious reduction?
Under what notions of (exact and approximate) counting is the complexity of counting valid solutions related to the complexity of counting invalid solutions?
How is the complexity of exact and approximate counting problems related to the complexity of solving decision problems?

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

If the decision problem associated with a search problem is hard to solve, then the corresponding counting problem is definitely hard.
The counting versions of all popular NP-complete problems are #P-complete, and hence are NP-hard (see Thm 6.18).
For some search problems that are easy to solve (i.e. are in PF), it seems hard to count the number of solutions; that is $R\in\PF$ does not seem to imply that $\#R$ is easy (see Thm 6.19).
In particular, counting perfect matchings in bipartite graphs is #P-complete (Thm 6.20).
Relative approximation of $Q\in[0,1]$ is not necessarily computationally equivalent to relative approximation of $1-Q$.
The Hashing Lemma (Lem D.4): For a family of pairwise independent hashing functions mapping $n$-bit strings to $m$-bit strings, almost all hashing functions map a sufficiently large subset almost uniformly on their range, where sufficiently large means somewhat larger than $2^m$.

At this point, with the material becoming more complex, I stopped offering quizes.

NOTES FOR 13TH MEETING

Topic: Counting classes - Part 2 (approximation reduces to NP).
Estimated difficulty: 3-4 in the scale of 5. It is mildly recommended to ask me to attend the meeting.

[Class of 2018/19 recommendation: It seems that the difficulty (of 4) is overestimated, and should be reduced to 3, which means that it is possible but not really needed to ask me to attend. (Oded: Actually, I had the same feeling when going over the material for this meeting.)]

The issues that I'd like to highlight include

The Hashing Lemma (Lem D.4) refers to a family $H_n^m$ of pairwise independent hashing functions mapping $n$-bit strings to $m$-bit strings and to a subset $S$ of $\{0,1\}^n$ of size greater than $2^m$ (it holds also for smaller sets, but vacuously so...).
Note that for $|S| > 2^m$, it holds that $\mu = |S|/2^m > 1$, where $\mu$ is the expected number of $S$-elements hashed (by a random $h\in H_n^m$) to a fixed image (i.e., $z\in \{0,1\}^m$). The lemma states that, for every $z$ and any $\eps > 0$, for all but an $1/(\eps^2 \mu)$ fraction of $h\in H_N^m$, it holds that $|\{w\in S: h(w)=z\}| = (1\pm\eps)\cdot\mu$.
[Unfortunately, Apdx D.2 and Sec 6.2 use different notations. In Apdx D.2, $w$ is replaced by $x$ (and $z$ by $y$), whereas in Sec 6.2 $w$ is replaced by $y$.]
Take home message: Almost all hashing functions map a set $S$ almost uniformly on their range, provided that $S$ is sufficiently larger than their range.
THM 6.27 states that for every $R\in\PC$, a relative approximation of $\#R$ can be obtained by a PPT reduction to NP (i.e., a PPT orcale machine with oracle to a set in NP).
The proof uses "random sieving" of the set $R(x)$, when $x$ is the given input. The random sieving of $R(x)$ is performed via hashing, and we say that $y$ passes the sieve if $h(y)=0^m$, where $h$ is the hashing function used for sieving. We use several sieve densities (e.g., powers of $1/2$). When the density is too small (i.e., smaller than $0.1/|R(x)|$), no element of $R(x)$ passes (whp); but when the density is large enough (i.e., bigger than $10/|R(x)|$), some elements of $R(x)$ pass.
The question whether some element of $R(x)$ passes the sieve defined by a hashing function $h$ (i.e., $w$ passes iff $h(w)=0^i$) is an NP-type of question (captured by the set $S_{R,H}$ that contains $(x,i,h)$ if and only if there exists $y$ s.t. $(x,y)\in R$ and $h(y)=0^i$).
This gives an approximation of $\log_2|R(x)$ to within $\pm4$ (i.e., our estimate of $|R(x|$ is in the interval $[|R(x)|/16, 16|R(x)|]$). To get a better approximation ratio, we amplify via Cartesian product (i.e., $R^t(x)=\{(y_1,...,y_t):(\forall i) y_i\in R(x)\}$).
The notion of (promise) problems with unique solutions: The search and decision versions are obtained by restricting the original problems to instances that have at most one solution.
THM 6.29 states that for every $R\in\PC$, the search problem of $R$ is reducible to a unique search problem in PC. Furthermore, if $R$ is PC-complete via a parsimonious reduction, then the search problem of $R$ is reducible to the unique search problem of $R$.
The proof uses the same idea of random sieves, but here, on input $x$ we use a sieve density that is around $1/10|R(x)|$ such that, with constant probability, a single solution $w\in R(x)$ passes the sieve.

READING ASSIGNMENT

Appendix D.2 on Hashing Functions.
- When reading Apdx D.2.2, you may either take the claims on "faith" or read Sec 8.5.1.1 now.
- When reading Apdx D.2.3, you may skip Thm D.5 and its two proofs, but take a look (again) at the discussion in Apdx D.1.2.4
Sec 6.2.2.2: Approximating #P via a PPT reduction to NP.
Sec 6.2.3: Unique solution problems.

ADD'L READING (NOT MANDATORY):

Thm D.5 (aka "Leftover Hashing Lemma") is often used in other places, so it is a good opportunity to learn it now, especially since it follows quite easily from Lem D.4 (see the main/first proof). The alternative/second proof is also instructive.

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

The Hashing Lemma (Lem D.4): For a family of pairwise independent hashing functions mapping $n$-bit strings to $m$-bit strings, almost all hashing functions map a sufficiently large subset almost uniformly on their range, where sufficiently large means somewhat larger than $2^m$.
Relative approximation of any search problem in PC is PPT-reducible to NP (Thm 6.27).
Solving any search problem in PC is PPT-reducible to finding unique solutions for some PC-problem (Thm 6.29).

NOTES FOR 14TH MEETING

Topic: Counting classes - Part 3 (uniform generation).
Estimated difficulty: 3 in the scale of 5. You may want to ask me to attend the meeting.

The issues that I'd like to highlight this time include

The notion of uniform generation (of solutions wrt $R\in\PC$). Actually, there are two definitions and/or notions:
1. Exact uniform generation: This stronger notion requires that, on input $x$, the generation process outputs a solution w.p. at least 1/2, and conditioned on that event (i.e., of generating an output), all solutions should be output with equal probability.
2. Approximate uniform generation: This weaker notion requires that, for a proximity parameter $\eps$, the generated output distribution should be $\eps$-close to the uniform distribution over all valid solutions.
Note that (1) implies (2) with an exponentially vanishing $\eps$.
Approximate counting is computationally equivalent to approximate uniform generation with $\eps=1/\poly$ for a sufficiently large polynomial. The equivalence holds for each $R\in\PC$ provided that $R$ is $\PC$-complete under a "strongly parsimonious" reduction, where "strongly" means that there is an efficiently computable 1-1 mapping of solutions to the target instance to solutions to the original instance. What is actually shown (in Thm 6.31) is:
1. For every $\R\in\PC$, uniform generation for $R$ with exponentially vanishing deviation is PPT reducible to approximately (up to some specific eps=1/poly) counting the number of solutions wrt a ("related") $R'\in\PC$. That is, using a sufficient good approximate counter for $\#R'$ allows to (approximately) uniformly generate solutions for $R$.
2. For every $\R\in\PC$, approximate counting for $R$ (up to any eps=1/poly) is PPT reducible to uniform generation (with error eps'=eps/poly) for a ("related") $R'\in\PC$. That is, using a sufficient good approx-uniform generator for $R'$ allows to (approximately) count solutions for $R$.
In both directions, on input $x$, we consider the tree of prefices of all possible solutions for $x$ wrt $R$.
- In direction (1), we generate the desired solution bit-by-bit, by extending the current prefix one bit at a time. The approximate counter is used for determining the probability with which to extend the current prefix by each of the two possible bits, where a "twist" is made to get exponentially vanishing deviation. We stress that the deviation of the generator is only due to the error probability of the approximate counter, whereas the deviation of the approximate counter affects only the running time of the uniform generation.
- In direction (2), we approximate the total number of solutions by repeatedly extending the ``heaviest'' prefix of a solution, and approximating the fraction of solutions that agree with the current prefix and with the current prefix extended by one bit. Uniform generation is used to approximate the fraction of the number of solutions that extend the current prefix with each of the two possible bits, and it is important to continue with the more frequent option (since this guarantees that a good additive approximation for this frequency is also a good relative approximation).
A direct uniform generation (in the stronger (exact) sense) for any $R\in\PC$ by a PPT machine with oracle to some NP set. On input $x\in S_R$, we estimate $m_x=\log_2|R(x)|$, and set $m=\max(0,m_x-\log 40\ell)$, where $R(x)\subseteq\{0,1\}^\ell$. We then proceed as follows:
1. Select an $\ell$-wise independent hashing function $h\in H_\ell^m$, which (with very high probability) partitions $R(x)$ into $2^m$ cells of size approximately $40\ell$, where for $\alpha\in\{0,1\}^m$ the $\alpha$-cell is the set of all $y\in R(x)$ s.t. $h(y)=\alpha$. If there exists $\alpha$ s.t the $\alpha$-cell is bigger than $200\ell$, then halt with no output. Note that this is an NP-query!
2. Select a cell $\alpha$ uniformly at random. Using the NP-oracle, find all its elements, and output each of them w.p. $1/200\ell$. (This mean that no output is produced with probability $1-(s/200\ell)$ where $s$ is the number of elements in the $\alpha$-cell.)
Indeed, output is generated only with constant probability, which can be decreased by repetitions. The latter idea is called rejection sampling (see take-home messages).

READING ASSIGNMENT: Section 6.2.4 (uniform generation). Note that you'll need Lemma D.6 (at the end of Apdx D.2).

HOMEWORK ASSIGNMENT:

Exercise 6.39. Don't provide all details for part (1), just give an outline.
Either Exercise 6.38 or Exercise 6.41 at your choice.

Try to do all without looking at the guidelines.

TAKE-HOME MESSAGES FOR SEC 6.2.4:

Morally speaking, uniform generation of solutions and approximate counting of solutions are closely related; that is, each of these tasks is PPT-reducible to the other (Thm 6.31). Actually, this holds only if the residual problems that arise from the proofs are adequately reducible to the original ones.
The proofs of the two directions of Thm 6.31 use two different ways of extending a (partial) rooted binary tree. In the first way, a child is chosen with probability that is proportional to the number of leaves in the sub-tree rooted at it. In the second way, we chose the child that has more leaves in the sub-tree rooted at it.
The rejection sampling paradigm: Suppose that we have an algorithm that fails with probability at most $p$ and otherwise generates output that is distributed according to $D$. Then, by repeating this algorithm for at most $t$ times, we fail with probability at most $p^t$ and otherwise generate output that is distributed according to $D$.

NOTES FOR the 15TH MEETING

Topic: Hardness amplification - part 1 (One-Way Function)
Estimated difficulty: 1 in the scale of 5.

The material we'll cover in the rest of this course has a significant overlap with the material taught in a course on the "Foundations of Cryptography" (especially, when using my own book). Specifically, one-way functions (see current Section 7.1) general-purpose pseudorandom generators (Sec 8.2), and zero-knowledge interactive proof systems (Sec 9.1-9.2) are covered in both courses; however, the perspective is different.

We start with Chapter 7, titled "the bright side of hardness". Although this hints to the applications of computational hardness, the actual focus would be on "hardness amplification"; that is, turning moderate levels of hardness into stronger levels, which, in turn, are the ones useful (e.g., for pseudorandomness and Cryptography).

In the current reading assignment (sec 7.1.0-7.1.2), three issues are discussed.

Moving from P-vs-NP and NP-vs-BPP, which are worst-case assumptions, to "hard instances that we can put our hands on" (i.e., average-case), and to "generating hard instances with solutions" (which "we can use"), and how the latter lead to the definition of OWF (= One-Way Function).
The actual definition of OWF. Issues include probability of inverting, the need to provide the preimage-length in unary, etc.
Hardness amplification from "weak OWF" to (strong/standard) OWF. We cannot assume a "direct product" attacker/inverter, and so the proof is more complex than the Information Theoretic analogue. We use a "reducibility argument" (i.e., a reduction coupled with an average-case analysis), and (unlike the information theoretic analogue) it has a cost (see the running-time analysis)!
Specifically, the reducibility argument show that inverting the given weak OWF with probability that exceeds the corresponding bound is reducible to inverting the purported strong OWF with non-negligible probability (see proof of Thm 7.5). Hence, the (contradiction) hypothesis that the resulting function is not a strong OWF implies that the original function is not a weak OWF (as guaranteed).

READING ASSIGNMENT: Section 7.1.0--7.1.2. I am/was tempted to suggest all of Sec 7.1 for this reading, which is not too much in terms of pages (i.e., about a dozen pages). But, in 2nd thought, I feel that Sec 7.1.2 and 7.1.3 may contain too much of a "new mindset" that is good to digest slowly and discuss. Anyhow, those who are bored can try reading Sec 7.1.3 too.

SUGGESTED EXERCISES (not to be submitted):

Exer 7.1 and 7.3 should be very easy.
Exer 7.4 may sound surprising, but is actually not hard.

These exercises are good both for getting into the new mindset and for the facts that they communicated. TAKE-HOME MESSAGES FOR SEC 2.3:

The definition of a one-way function (OWF): The strong/standard version (Def 7.1) versus the weak one (Def 7.4).
Weak OWFs can be easily transformed into strong ones (Thm 7.5).
Hardness amplification is more complex than the information theoretic analogue. Typically, a reducibility argument is used to prove that if the suggested construction fails (e.g., is not a strong OWF), then the original hypothesis is false (e.g., violation of weak OWF). In other words, violating the original hypothesis (e.g., a weak OWF) is reduced to violating the claim about the presented construction (e.g., that it is a strong OWF).

NOTES FOR the 16TH MEETING

Topic: Hardness amplification - part 2 (Hard-core predicates)
Estimated difficulty: 3 in the scale of 5. You may want to ask me to attend the meeting.

The current reading assignment consists of two parts:

Hard-core predicates (Sec 7.1.3),
Starting to discuss hardness in E (Sec 7.2.0-7.2.1.1).

[NOTE: It may be better to split this material and do each part in a separate meeting. I suggest to decide which way to go after trying to read all, possibly stopping at a point in which it feels too much.]

Regarding the 1st part (hard-core predicates), following is an overview of the ideas underlying the proof of Thm 7.7, which asserts that (essentially) any one-way function has a hard-core predicate. The proof uses a reducibility argument; that is, we show that if the constructed predicate is not a hardcore of the function, then the function is not a OWF. Indeed, we reduce inverting the function to guessing the value of the predicate with success probability that is significantly higher than 1/2.

We start with the case that the predicate can be approximated with success probability 0.76 (i.e., non-negligibly above 3/4), distills the problem of "error doubling", presents a "dream" in which this problem can be solved, and "materialize the dream". Following is additional clues about the dream and materializing it.

We first observe that inverting the function $f$ on input $f(x)$
reduces to computing $b_x:\{0,1\}^n\to\{0,1\}$
such that $b_x(r)=b(x,r)=\sum_ix_ir_i$, 
when given oracle access to $B_x$ that agrees with $b_x$ 
on a $0.5+\epsilon$ fraction of the possible instances. 
(The simple case is that $B_x$ agrees with $b_x$ on a 0.76 fraction
of the instances, and in this case it is easy to compute $b_x$
by ``self-correction'' (using $b_x(z) = b_x(z+r) \xor b_x(r)$ for any $r$).) 
One may present the dream as a model in which we obtain a random sample 
of labelled examples; that is, a model in which we are given a sample $R$ 
of random points $r$'s along with the correct values of the $b_x(r)$'s. 
Furthermore, when solving the problem in this dream-model, 
we note that our solution holds also when the sample $R$ consists of 
pairwise independent points that are uniformly distributed in $\{0,1\}^n$. 
This observation will allow to implement the dream-model in the standard model.
(On the other hand, a dream that asserts totally independent $r$'s
is too good to be implementable, because it allows to find $x$
without querying $B_x$ at all (since a set of $O(n)$ totally independent 
random $n$-bit strings is likely to span all $n$-bit strings).)

(This concludes the discussion of the 1st part.)

Regarding the 2nd part (hardness amplification in E), I wish to highlight the amazing result that asserts that worst-case hardness implies mild average-case hardness. The key steps ain its proof are (i) considering a low-degree extension $f'$ of the original "worst-case hard" function $f\in E$, and (ii) using "self correction" (a randomized reduction of $f'$ to itself) to show that if $f'$ is easy to compute on "much" of its domain then it is easy to compute on each point in its domain.

QnA (by Ori Sberlo): Can one not establish average-case hardness unconditionally, by diagonalization? In fact, one can (see Goldmann, Grape, and Hastad, IPL 1994), but this will hold only with respect to uniform classes, whereas here we seek hardness wrt non-uniform circuits.

DIGEST (best read after reading the material itself): Both the aforementioned proofs relied on the self-correction paradigm, which appears in a more clear manner in the 2nd proof/part. Specifically, this paradigm refers to specific functions (e.g., low degree polynomials) in which the value of the function at any point can be easily deduced from its value at few different points. Typically, each of these different points may be uniformly distributed in the function's domain, but they points are related in a way that depends on the point that we wish to compute. For example if $F:D\to R$ is a linear function, then we have $F(z) = F(z+r)-F(r)$ for any $z,r\in D$, which means that for every point $z$ the value $F(z)$ can be deduced from the value of $F$ at $z+r$ and $r$. Hence, if we can compute $F$ correctly on 76% of its domain, then we can compute $F$ correctly at any point $z$ by selecting $r$ at random and computing the value of $F$ at $z+r$ and $r$.

READING ASSIGNMENT: Read the following, including the specified exercises.

Hard-core predicates (Sec 7.1.3). See Exer 7.5 regarding the underlying pairwise independent construction.
Preliminaries regarding hard functions in E (Sec 7.2.0).
Hardness Amplification from Worst-case to mild average-case (Sec 7.2.1.1). See Exer 7.11 ("self correction" of low degree polynomials) and Exer 7.12 ("low degree extension"). These are basic tools, often used in very different contexts of theory of computation, so I strongly recommend getting familiar with them.

As said upfront, if you feel that this material is too much for one meeting, you may split it to two meeting, devoting a full meeting to each of the two parts.

SUGGESTED EXERCISES (not to be submitted):

First priority: The three exercises mentioned above; that is, Exer 7.5 (a specific construction of pairwise independent points), Exer 7.11 (self correction of low degree polynomials) and Exer 7.12 (low degree extension).
Simple variants on Thm 7.7: Exer 7.6 and 7.7.
Exer 7.9 and 7.13 should also be easy.

TAKE-HOME MESSAGES FOR THE CURRENT MEETING

The self-correction paradigm (see Digest above).
If it is hard to recover $x$ from some information $I(x)$, then it is hard to approximate the inner-product mod 2 of $x$ with a random $r$ when given $I(x)$ and $r$ (Thm 7.8).
The counter-positive is a list decoding algorithm for the Hadamard code (i.e., encoding linear functions from $GF(2)^n$ to $GF(2)$ by their truth-tables): Given oracle access to an $2^n$-bit long string $w$, we can reconstruct in $\poly(n/\eps)$-time concise descriptions of (i.e., the information encoded in) all codewords that agree with $w$ on at least $0.5+\eps$ fraction of the locations.
Representing Boolean functions by their low-degree extension preserves the size of the function up to a polynomial blow-up. Specifically, the function $f:\{0,1\}^n\to\{0,1\}$ is viewed as $f:[n]^{n/\log_2}\to\{0,1\}$, and we consider a low-degree extension of it (having individual degree $n-1$) over a field of size $\poly(n)$. [See Const 7.11 and Exer 7.12]
Polynomials of total degree $d$ have self-correction procedures that query the function at $d+1$ points such that each point is uniformly distributed in the domain. This implies that, for small $d$, if a polynomial of degree $d$ is easy to compute on all but at most a $1/3d$ fraction of its domain, then it is easy to compute on its entire domain. Equivalently, if such a polynomial is hard to evaluate on the worst-case, then it must be hard to evaluate on a noticeable fraction of its domain. [See Sec 7.2.1.1 and Exer 7.11]

NOTES FOR the 17TH MEETING

Topic: Hardness amplification - part 3 (the XOR lemma)
Estimated difficulty: 5 in the scale of 5. Highly recommended to ask me to attend the meeting.

[WARNING: This reading may be the most difficult in the course]

[Class of 2018/19 recommendation: Remind the students that the point in this chapter is starting from an assumption that is as weak as possible and deriving useful tools, which do seem stronger from the starting assumption (although their derivation proves that they are not). The usefulness of these tools will be demonstrated in Chapter 8, and is further demonstrated in Apdx C.]

The current reading assignment refers to hardness amplification (in E); specifically, the amplification is from mild average-case hardness to very strong level of average-case hardness. In addition, there is a homework assignment and optional reading.

The plan for the meeting includes:

Discussing the XOR Lemma and the Direct Product Lemma (DPL).
Showing that the DPL implies the XOR Lemma by
- Showing that the DPL implies a "Selective XOR Lemma" (very much as Thm 7.8).
- Showing that the Selecting XOR Lemma implies the (standard) XOR Lemma (see Exer 7.17).
(Note that proving the Selecting XOR Lemma would have been "good enough" for subsequent applications, and still the XOR lemma is a bit more appealing.)
Proving the DPL by proving a generalized version (with an arbitrary number of input blocks (i.e., inputs)) and proving the latter by induction on the number of inputs (see proof of Thm 7.14), while relying on a quantitative two-input version (i.e., Lem 7.15).
Warning: an "unquantitative induction" cannot work for a non-constant number of iterations. By a quantified statement we mean one that refer to specific circuit sizes rather than to unspecified polynomial-size.

Add'l ideas to highlight include the observation that any distribution (on strings of feasible length) can be effectively emulated by small (non-uniform) circuits.

ADDITIONAL COMMENTS [MAY 2022]

You have already seen that proving an intuitive complexity theoretic statement 
such as Thm 7.5 ("amplification of one-way function")
is significantly harder than proving the information theoretic analogue.
The source of difficulty is that the information theoretic analogue
reflects a "natural" algorithm that works on a "composed computational
problem" while respecting its structure, whereas a general algorithm
may not do so; for example, a natural algorithm computing $F(x) = f(g(x))$
first computes the map $x\mapsto g(x)$ and then computes $y\mapsto f(y)$,
whereas as general algorithm may use some "shortcuts"
(e.g., consider the case $f = g^{-1}$....).
Likewise, a natural algorithm for computing $F(x,y) = (f(x),f(y))$
works separately on each block, whereas a general algorithm may not do so.

Hence, it is natural (or easy to establish for natural algorithms)
that if $f$ is hard to approximate with success above $\rho$
(which corresponds to correlation above $2\rho-1$)
then $F(x_1,....,x_t) = \xor_{i\in[t]} f(x_i)$
is hard to approximate with success above $0.5+0.5\cdot(2\rho-1)^t$
(which corresponds to correlation above $(2\rho-1)^t$).
But this corresponds to the natural algorithm that tries
to approximate each of the $f(x_i)$'s and takes the XOR of its results.
Yao's XOR Lemma asserts that *essentially* a general algorithm
cannot do *much* better.... (Note: we don't know how to prove that 
a general algorithm cannot do *slightly* better...; it may even be wrong...)

I believe it is easier to think and prove a corresponding claim about
direct product. It asserts that $P(x_1,...,x_t) = (f(x_1),...,f(x_t))$
is hard to compute with success essentially above $\rho^t$.
This is analogous to Thm  7.5, but note that here we talk about
(the hardness of) computing $f$ and P$ rather than about inverting them....
We get from $P$ to $F$ by using theorem 7.8 and additional tricks
(which close the gap between  "selective" and "standard" XOR Lemma).

The direct product lemma, which refers to general $t$
(which is at most polynomial in $|x_1|= ... = |x_t|$)
is proved by "reduction" to the case of $t=2$.
But for this to work we need a quantified version of the special case;
that is, we cannot talk in terms of "polynomial-size hardness" of
one task implies "polynomial-size hardness" of the other task,
because the proof is by contrapositive showing that a polynomial
for the second task translates to polynomial for the first task.
But unless we quantify this statement better (which we'll do!),
there is no guarantee above what polynomial we get from
the translation, which is not a problem if we apply
this reasoning a constant number of times but is
a problem if we apply it a non-constant number of times.

Suppose the statement is that polynomial $p(n)$ for the second task
yields a polynomial $2p(n)$ for the first task
(which means that if the first task is hard for size $2p(n)$
then the second is hard for size $p(n)$).
If we start with hardness for (polynomial) size $p(n)$,
then after $t$ steps we get hardness for size $p(n)/2^t$,
which is meaningless if $t$ is super-logarithmic.
Things are even worse if the translation maps
a polynomial $p(n)$ to the polynomial $n\cdot p(n)$ or $p(n)^2$;
in such a case, we can apply this only a constant number of times.

The quantified version of direct product for two arguments says that
if predicting $f_i$ with success probability $\rho_i$ requires circuits
of size greater than $s_i$, then predicting $P'(y,z)=(f_1(y),f_2(z))$
with success probability (almost) $\rho_1 \cdot \pho_2$
requires circuits of size at least $\min(s_1, s_2/\poly(n))$,
where $\poly$ is a fixed polynomial I'm too lazy to compute
(let alone that it also depends on the "almost" measure above).
Note the asymmetric form of the expression in the result:
There is no loss in one parameter,
but there is a significant/natural/expected loss in the other.
So using this result cleverly,
we can establish the general case (of general $t$);
hint, define $P^{(i)}(x_1,...,x_i) = (P^{(i-1)}(x_1,...,x_{i-1}),f(x_i))$,
apply the two-argument lemma to $P^{(i)}$ after establishing
the claim for $P^{(i-1)}$, and end with $P = P^{(t)}$.

READING ASSIGNMENT: Read Section 7.2.1.2 ("Yao's XOR Lemma").

SUGGESTED ADDITIONAL READING (not mandatory):

Hardness amplification via "list decoding" (Sec 7.2.1.3); this goes directly from worst-case to strong average-case (rather than going via the mild level as we've done).
Reading priorities: The 1st priority is the more concrete treatment on pages 266-7 (the abstraction on pages 268-9 is less important and hard to follow).
Tighter amplification (wrt exponential size circuits): Sec 7.2.2. The amplification in prior sections moved from input-length $n$ to input-length $m=\poly(n)$, while the circuit sizes only decreased (and so hardness for circuit size $s(n)$ yields hardness for circuit size $s'(m)=s(m^{1/O(1)})$.
Reading priorities: The 1st priority is the notion of "hard region" (Sec 7.2.2.1). The 2nd priority is its use for amplification (Sec 7.2.2.2). Note that Sec 7.2.2.2 is intentionally sketchy (especially Lem 7.23).

I think it is good to have a vague idea of what is going on here (i.e., in these additional readings), even if one does not follow all details (which would indeed require reading beyond this text).

HOMEWORK ASSIGNMENT:

Exercises 7.17 and 7.18: The first exercise shows that the Selecting XOR Lemma implies the (standard) XOR Lemma, whereas the second exercise shows the converse.

SUGGESTED EXERCISES (not to be submitted):

Exer 7.14 (variant on Thm 7.7: "hardcore predicate")
Exer 7.15 (reversing Exer 7.14).
Exer 7.16 (very simple -- averaging argument applied to circuits).

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

The XOR Lemma a.k.a Yao's XOR Lemma (Thm 7.13).
The Direct Product Lemma (Thm 7.14), which is actually more intuitive than the XOR Lemma.
A non-constant number of induction steps cannot be employed when the statements are not fully quantified. Specifically, while Thm 7.14 refers to circuits of arbitrary polynomial-size, its proof relies on a generalization that refers to circuits of specific sizes, and proceeds by induction that keeps track of these sizes (via using a lemma that refers to circuits of specific sizes).
In contrast, note that for a sequence of statements $S_i(n)$ that is each of the form ``poly(n)-size circuits cannot do $X_i$'' and a generic proof that asserts that $S_i(n)$ implies $S_{i+1}(n)$, we cannot use a non-constant number $m(n)$ of steps to prove that $S_1(n)$ implies $S_{m(n)}(n)$. This is because $S_{i+1}$ may be proved by a reducibility argument that takes a polynomial-size circuit that violates $S_{i+1}$ and construct a (larger) polynomial-size circuit that violates $S_i$ (e.g., the size of the latter circuit may be twice the size of the former or even a polynomial in the size of the former). (Note that when proving the generalization of Thm 7.14, the size of the circuits remain unchanged along the steps.)

NOTES FOR the 18TH MEETING

Topic: Pseudorandom generator - part 1 (the general paradigm and more)
Estimated difficulty: 1 in the scale of 5.

Your last reading assignment was, by far, the most difficulty and technical one for this course. So no need to fear anything that may be coming in the next months.

The current reading assignment starts a series of meetings on pseudorandom generators. I suggest to view "pseudorandom generators" (PRGs) as a general paradigm with many incarnations (or as a family name with many members).

I am a bit unsure about the organization and partition of the material for the next two meeting. So I'll provide what is planned for both meetings as well as my recommendation and hesitations (see discussion following the reading assignment).

In the (current/1st) meeting, I suggest to focus on three issues.

The general paradigm of PRGs, which refers to three aspects. Most importantly, it postulates (1) a deterministic stretch and (2) a notion of computational (i.e., restricted) indistinguishability, which is weaker than statistic indistinguishability. On top of it, we consider (3) the complexity of generation. Without aspect (3), we can easily get strong PRGs (see Exer 8.1), but these are not really useful.
Indeed, PRGs stretch/extend randomness, they do not "generate" randomness from scratch. The randomness that they stretch is provided explicitly in their seed.
The notion of general purpose PRGs. In this incarnation, aspect (2) refers to any PPT, whereas aspect (3) refers to a fixed poly-time computation (and thus stretch is restricted to that level). The point is that a single PRGs is good wrt all efficient applications, even those that require more resources that the generator's evaluation.
The notion of Computational Indistinguishability (CI), while highlighting the following:
- CI is a strict relaxation of statistical indistinguishability.
- Showing that the demonstration of the prior phenomenon (i.e., CI is a strict relaxation of statistical indistinguishability) by PPT-computable distribution ensembles requires PRGs (see Exer 8.9).
- Computational indistinguishability wrt multiple samples (Def 8.5) and when is it implied by CI wrt a single sample (Prop 8.6 (see also Exer 8.10)).
- The hybrid method (first used in the proof of Prop 8.6 (see also a digest that follows)).

AFTERTHOUGHT (May 2023): THE LOSS IN THE HYBRID ARGUMENT

Using a hybrid argument that uses $t+1$ hybrids incurs a loss (in the distinguishability gap) of a factor of $t$. This seems essential, as illustrated in the following example. Consider a distribution $Z$, on $t$-bit long strings, defined by the following process, where $\ell=\log_2 t$ (assume $t$ is a power of 2 for simplicity).

Select $z$ uniformly at random among the $t$-bit strings.
Select $i\in[t-\ell]$ uniformly at random.
Reset the bits in position $i,i+1,...,i+\ell$ to zero.

Now, $Z$ always has a $\ell+1$ consecutive positions that are zero, whereas the probability that this event occurs in a random string is smaller than $t \cdot 2^{-(\ell+1)} \leq 1/2$. On the other hand, for any $j\in[t-1]$, we can upper-bound the probability of correctly predicting the $j+1$th bit based on the previous $j$ bits; specifically, the probability is maximized by always predicting zero, and it then equals $0.5+(\ell/(t-1))$, where the advantage corresponds to the case that $j\in[i,...,i+\ell-1]$.

READING ASSIGNMENT (for the next two meeting):
[the default suggestion is to read the following five items for the 1st/current meeting]

Sec 8.0 -- a general introduction (a conceptual discussion)
Sec 8.1 -- the general paradigm (a conceptual discussion)
Sec 8.2.1 -- General-Purpose PRGs, their definition.
Sec 8.2.2 -- The archetypical application of General-Purpose PRGs.
Sec 8.2.3 -- Computational Indistinguishability (CI), in general.

[and read the rest for the second/next meeting]

Sec 8.2.4 -- Stretch amplification (for General-Purpose PRGs)
Sec 8.2.5 -- Construction of General-Purpose PRGs.
Sec 8.2.6 -- A non-uniform version of General-Purpose PRGs.
Sec 8.2.7.2 -- reflections (a conceptual discussion)
Sec 8.3.1 -- Derandomization via canonical derandomizers

[NOTE: Indeed, my suggestion is to read Sec 8.0-8.2.3 for the 1st meeting, and the rest for the next (i.e., 2nd) meeting; that is, proceed by the actual order of the sections, and break after Sec 8.2.3. Yet, a good alternative is to take Secs 8.2.2 and 8.2.6 out of the order and use the following order instead: (i) Read Sec 8.0-8.2.1 and 8.2.3 first, (ii) Read Sec 8.2.4, 8.2.5, and 8.2.7.2 next, and only then (iii) read Sec 8.2.6, Sec 8.2.2, and 8.3.1. The justification for this alternative is that Sec 8.2.6 demonstrates the point of Sec 8.2.2 in a non-uniform context, while using a much simpler proof.]

In any case, pay close attention to Sec 8.2.3.3 (see the "hybrid method").

SUGGESTED EXERCISES (not to be submitted):

Exer 8.1 (existence of inefficient PRGs)
Exer 8.7 (PRGs imply non-trivial cases of comput' indist.)
Exer 8.9 (on the necessity of assumptions for Exer 8.7).
Exer 8.10 (when single sample CI does not imply multiple sample CI)
Exer 8.12 (unpredictability versus indistinguishability), but this may fit better to next meeting.

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

Pseudorandomness is a general paradigm with many incarnations. So when you hear this term being used, you should always ask in what sense it is meant.
In some cases (e.g., general-purpose PRGs (Sec 8.2)) the potential distinguishers are stronger than the PRG (and so their output may not prepend the seed that they use). In other cases (e.g., the canonical derandomizers (Sec 8.3)) the potential distinguishers are weaker than the PRG (and so their output may be prepended by the seed that they use).
In some cases computational indistinguishability wrt multiple samples is implied by computational indistinguishability wrt a single sample (e.g., Prop 8.6), but in other cases this does not hold (see Exer 8.10).
The hybrid method is extremely useful. (So please spend time reflecting on the proof of Prop 8.6 and the digest that follows it.)

NOTES FOR the 19TH MEETING

Topic: Pseudorandom generator - part 2 (general-purpose PRGs)
Estimated difficulty: 1 in the scale of 5.

The current reading assignment is the 2nd part of the double assignment that I gave last time -- see below. We shall finish dealing with general purpose PRGs and turn to "canonical derandomizers".

In the meeting, I suggest to focus on four issues.

Amplifying the stretch: This relies on the distinguishers being more powerful than the PRG (and in particular being able to invoke the PRG (on relatively long seeds)). There are two alternative constructions (see Const 8.7 and Exer 8.11, respectively), which are easier to explain with pictures. Both are analyzed via a hybrid argument (and the corresponding hybrids are easy to see in pictures (see, e.g., Fig 8.2)).
Constructing a PRG based on any one-way *permutation* (Prop 8.9) and stating the equivalence "PRG iff OWF" (Thm 8.11).
Next bit unpredictability as equivalent to pseudorandomness (see proof of Prop 8.9 as well as Sec 8.2.5.2 and Exer 8.12). For "Next bit unpredictability implying pseudorandomness" we use a hybrid argument (and also analyze the assertion when applied to a PRG that stretches its seed by a single bit).
The derandomizations performed using general purpose PRGs reveal an "overkill": We don't need the PRG to work in poly-time. It suffices that it works in exp-time. Also, it is OK if the PRG works more time than the distinguisher. Hence, canonical derandomizers (as defined in Sec 8.3.1) suffice.

READING ASSIGNMENT
The second half of whatever you did not read so far in the following. [The following repeats the text of last time; recall that the default suggestion was to read the following five items for the 1st/previous meeting]

Sec 8.0 -- a general introduction (a conceptual discussion)
Sec 8.1 -- the general paradigm (a conceptual discussion)
Sec 8.2.1 -- General-Purpose PRGs, their definition.
Sec 8.2.2 -- The archetypical application of General-Purpose PRGs.
Sec 8.2.3 -- Computational Indistinguishability (CI), in general.

[and the rest for 2nd/current meeting]

Sec 8.2.4 -- Stretch amplification (for General-Purpose PRGs)
Sec 8.2.5 -- Construction of General-Purpose PRGs.
Sec 8.2.6 -- A non-uniform version of General-Purpose PRGs.
Sec 8.2.7.2 -- reflections (a conceptual discussion)
Sec 8.3.1 -- Derandomization via canonical derandomizers

[NOTE: The justification for the suggested order and the alternative were presented in the note for the previous meeting.]

Recall: In any case, pay close attention to Sec 8.2.3.3 (see the "hybrid method") and to Sec 8.2.5.2 (see "the next bit test" -- Exer 8.12).

SUGGESTED EXERCISES (repeated from previous meeting, not to be submitted):

Exer 8.1 (existence of inefficient PRGs)
Exer 8.7 (PRGs imply non-trivial cases of comput' indist.)
Exer 8.9 (on the necessity of assumptions for Exer 8.7).
Exer 8.10 (when single sample CI does not imply multiple sample CI)
Exer 8.12 (unpredictability versus indistinguishability)

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

General-purpose PRGs exists if and only if one-way functions exist (Thm 8.11).
The stretch of general-purpose PRGs can be amplified to any polynomial (sec 8.2.4).
Next bit unpredictability is equivalent to pseudorandomness (see Sec 8.2.5.2 and Exer 8.12).
The hybrid method is extremely useful.
The notion of canonical derandomizers (Def 8.14), which allows the PRG to work in exponential time and requires its output to be indistinguishable only from quadratic size circuits.
Note that the generated sequence may be of exponential length, and still the square of this length may be significantly smaller than the time used by the PRG.

WELL-KNOWN OPEN PROBLEMS:

Provide a simpler construction of a general-purpose PRGs based on any one-way function.
Provide a construction of a general-purpose PRGs based on any one-way function in which the seed length is linear (rather than polynomial) in the length of the arguments to which the one-way function is applied.

See Vadhan's survey (published here) for a survey of the state of the art in this direction.

NOTES FOR the 20TH MEETING

Topic: Pseudorandom generator - part 3 (canonical derandomizers)
Estimated difficulty: 3.7 in the scale of 5. It is recommended to ask me to attend the meeting.

The current reading assignment has two parts, the main one being the construction of canonical derandomizers, and the other is background material for the next meeting (i.e., Meeting 21).

Focusing on the first part (Sec 8.3.2), we motivate the construction (next), whereas the actual reading assignment provides a full description (see Constr 8.17) and analysis (see Thm 8.18). Interestingly, the underlying ideas can be applied also in other settings (see Sec 8.3.3.1).

The construction, known as the Nisan-Wigderson construction, consists of applying a hard-to-approximate function, which is computable in E but hard to approximate by smaller-than-exponential circuits, to many different projections of the seed. We cannot afford pairwise disjoint projections (since this violates the stretching requirement), whereas identical projections are obviously bad. So we need a family of projections (or sets) with small pairwise intersections, and we need to construct this family in time that is exponential in the seed. The analysis shows that this yields a PRG, provided that the pairwise intersections are smaller than the logarithm of the size-bound that we allow in the inapproximability hypothesis.

AFTERTHOUGHT (May 2023): THE CIRCUIT ESTIMATION PROBLEM

The canonic derandomizer may be viewed as a way to derandomize the straightforward randomized algorithm for the following generic/universal problem, hereafter called the circuit estimation problem. Actually, the problem is defined with respect a parameter, denoted $\eps$, which (for starters) we fix to 1/7. The problem is a promise problem in which, given a circuit C, you is asked to distinguish between the following two cases

the case $\prob[C(r)=1] \geq 0.5+\eps$, where $r$ is uniformly distributed $n$-bit strong.
the case that $\prob[C(r)=1] \leq 0.5-\eps$, where $r$ is uniformly distributed $n$-bit strong.

Focusing on $\eps=1/7$ losses no generality, because we can reduce the case of $\eps=1/\poly(n)$ to the (seemingly easier) case of $\eps = 0.5 - \exp(-n^{\Omega(1)})$ (one can even use $\eps = 0.5 - \exp(-\Omega(n))$, but that's more complicated - it requires ``efficient error reduction''). The idea is to define, say for $t=k/\eps^2$, a circuit $C':\bitset^{tn}\to\bitset$ such that $C'(x_1,....,x_t) = \MAJ_{j\in[t]} C(x_i)$. The analysis reduces to defining random variables $Z_i(x_i)=C(x_i)$, where $x_i$ is uniformly distributed in $\bitset^n$, and observing that if $\prob[Z_i=1] \geq 0.5+\eps$, then $\prob[sum_{i\in[t]}Z_i > 0.5t] = 1-\exp(-\Omega(\eps^2 t))$. (Analogous for $\prob[Z_i=0] \geq 0.5+\eps$.)

READING ASSIGNMENT -- these are three unrelated items:

The construction of canonical derandomizers: Sec 8.3.2. This is the main part of the current reading assignment.
[The next two items are technical background that you'll need for the next reading assignment (i.e., Meeting 21).]
Take another look at Apdx D.2.3, and focus on the generalization called mixing (see Item 1 in the homework assignment).
Introduction to Randomness Extractors: Apdx D.4.0-D.4.1.1.

[We shall use (2) and (3) in our next reading assignment (in Meeting 21, devoted to space bounded distinguishers), but it is good to read this material now because the next reading assignment will be more "dense"/"heavy".]

HOMEWORK ASSIGNMENT (to be submitted):

Prove the Mixing Lemma stated in the middle of page 530 (i.e., the paragraph "A generalization (called mixing)" which follows the proof of Lemma D.4).
Exer 8.24 (canonical derandomizers imply hardness). Try to do it without looking at the guidelines.

SUGGESTED ADDITIONAL READING (not mandatory):
[The following four items will not be used in this course, but they are very useful to know! The first two items are an offspring of the NW-construction, whereas the other two items are unrelated to it.]

Sec 8.3.3.1 (the NW-construction as a general paradigm).
This will lead into the following item (2)
Using the NW-construction paradigm for randomness extractors: Apdx D.4.2.
Having been introduced to Randomness Extractors (in Apdx D.4.0-D.4.1.1) and having seen a way of constructing them (in Apdx D.4.2), you may as well see two of their (non-direct) applications: I refer to their applications to to randomness-efficient error-reduction and to sampling.
Randomness extraction and randomness-efficient error-reduction: Apdx D.4.1.3
Randomness extraction and Sampling: Start with Apdx D.3 and then read D.4.1.2

SUGGESTED EXERCISES (not to be submitted):
[The first two items are most useful.]

Exer 8.12 and 8.20 (unpredictability vs indistinguishability).
Exer 8.19 (constructing a set system as required by Thm 8.18).
Exer 8.22 and 8.23 (generalizing Thm 8.18 and its application): Be warned that this is quite tedious, but can be useful for those who believe that they learn better "by doing".

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

Strong lower bounds on functions in E imply highly non-trivial derandominzation of BPP, where the strength of the lower bound determines the efficiency of the resulting deterministic algorithm:
1. An exponential lower bound implies that BPP is in P (Thm 8.19 Part 1).
2. Any super-polynomial lower bound implies that BPP is contained in DTIME(T) where $T(n)=\exp(n^{o(1)})$; actually, a lower bound that holds for a polynomial of degree d implies a result with $T(n)=\exp(n^{O(1/d)})$. (See Exer 8.23 that refers to Part 2 of Thm 8.19).
(Indeed, this follows the Hardness-vs-Randomness paradigm, which underlies Sec 8.2: See Section 8.2.7.2.)
The NW-construction as a general paradigm (Sec 8.3.3.1): Applying a "hard" function to "almost pairwise disjoint" projections of a random string yields a "pseudorandom" sequence, where by almost disjoint we mean having "small" intersection.

NOTES FOR the 21ST MEETING

Topic: Pseudorandom generator - part 4 (the space-bounded case)
Estimated difficulty: 4 in the scale of 5. It is recommended to ask me to attend the meeting.

The current reading assignment is devoted to the study of PRGs that fool space-bounded distinguishers. The topics include:

Definitional issues, which are inherited from the model of (randomized) space complexity (see Section 6.1.5)).
Two different (unconditional) constructions of such PRGs, which are geared to different settings.
1. A construction geared towards small space (i.e., the space complexity of the potential distinguishers is logarithmic in their time complexity); this construction uses a seed that is quadratic in the space complexity of the potential distinguisher.
2. A construction geared towards larger space (i.e., the time complexity of the potential distinguishers is polynomial in their space complexity); this construction uses a seed that is linear in the space complexity of the potential distinguisher.
A derandomization of BPL (the log-space version of BPP) that uses polylogarithmic space and polynomial time. This result uses the 1st PRG construction (i.e., the one for small space).

READING ASSIGNMENT -- basically, the entire Section 8.4.

Sec 8.4.1 deals with the definitional issues.
Sec 8.4.2.0 states the two results regarding PRGs.
Sec 8.4.2.1 sketches both constructions and their analysis.
Sec 8.4.2.2 explains how the 1st construction leads to a derandomization of BPL (in polylogarithmic space and polynomial time).

ADDITIONAL COMMENTS [MARCH 2024]

As hinted in my book, my own presentation of the paraameters is not the usual one. Usually, one starts with the space bound (which is viewed as logarithmic in the main input length), and states the time bound and seed length in terms of the sapce. In contrast, I insisted to start from the seed length, denoted $k$, and stated the space and time bounds in terms of $k$. Here is the traditional presentation followed by my own

complexity measure	Thm 8.21 (Nisan)	Thm 8.22 (NZ)
space	s = O(log\|x\|)	s = O(log\|x\|)
time	exp(Omega(s))	poly(s)
seed length	O(s^2)	O(s)

complexity measure	Thm 8.21 (Nisan)	Thm 8.22 (NZ)
seed length	k	k
space	Omega(k^{1/2})	Omega(k)
time	exp(Omega(k^{1/2}))	poly(k)

In the proof of both theorems (i.e., Thm 8.21 and 8.22), (on-line) randomized space is modeled by (non-uniform) devices, which are presented as layered graphs. The layers correspond to time (and bits in the tested input), and the vertices in each layer represent possible instantaneous configutations. Hence, the width of a layer is expentnential in the space complexity, denoted $s$. We consider blocks of consequetive layers with $n=Theta(s)$ layers per block, where $n \gg s$ (resp., $n \ll s$) in the proof of Thm 8.21 (resp., Thm 8.22). In both cases, the generator uses more space than the potential distinguisher.

The following outline of the proof of the first result (Thm 8.21) differs from the presentation in the book. Specifically, the construction is the same (and one may start by describing it using Fig 8.3), but the analysis is slightly different; specifically, rather than being analyzed by looking at contracted versions of the distinguisher (see page 321), we consider a sequence of distributions that this distinguisher may examine.

The first sequence we consider is the uniform sequence
$U_{\ell'n} \equiv U_n^{(1)}U_n^{(2)} \cdots U_n^{(\ell'-1)}U_n^{(\ell')}$
and the second sequence is 
$U_n^{(1)} h(U_n^{(1)}) \cdots U_n^{(\ell'/2)} h(U_n^{(\ell'/2)})$,
where $h$ is a random hash function. 
We show that these two sequences are indistinguishable
by a $s(k)$-space bounded automaton exactly as done on page 320.
The point is that the same argument applies to the sequences
$G_i(U_n^{(1)}) G_i(U_n^{(2)}) 
  \cdots G_i(U_n^{({\ell'/2^i}-1)}) G_i(U_n^{(\ell'/2^i)})$
and 
$G_i(U_n^{(1)}) G_i(h(U_n^{(1)})) 
  \cdots G_i(U_n^{(\ell'/2^{i+1})}) G_i(h(U_n^{(\ell'/2^{i+1})}))$,
where $G_i$ is an arbitrary mapping of $n$-bit strings 
to $2^{i}\cdot n$-bit strings. 
(Note that each of these sequences has length 
$(\ell'/2^i) \cdot 2^{i}\cdot n = \ell$.) 
To see that these sequences are indistinguishable by the said automaton, 
consider sets of the form $L_{u,v}^{(j)} \subseteq \{0,1\}^n$
such that $s \in L_{u,v}^{(j)}$
if the automaton moves from vertex $u$ in layer $j\cdot 2^i$
to vertex $v$ in layer $(j+1)\cdot 2^i$ upon reading the sequence $G_i(s)$. 
Now, note that for almost all $h$'s it holds that
$\Prob[U_n \in L_{u,v}^{j)} \wedge h(U_n) \in L_{v,w}^{j+1)}]
 \approx \Prob[U_n \in L_{u,v}^{j)}] \cdot \Prob[U_n \in L_{v,w}^{j+1)}]$
and continue as in the text.

COMMENT: The foregoing may be viewed as a generalized hybrid argument,
where the generalization is that we do not consider a single distribution per 
each level (i.e., each $i$) but rather a collection of such distributions.
Specifically, consider the case of $G_i(z) = G_{i-1}(z)G_{i-1}(h(z))$,
where $h$ is a random hash function.

Note that the initial transformation of $D_k$ to $D'_k$ (presented in top of page 320) can be avoided; that is, one can argue directly about the original distinguisher $D_k$. (See more detailed description, which is meant to replace pages 320-321.) Also, given this analysis, it may be good to start by describing the construction (using Fig 8.3), and identify it with the last hybrid of the analysis.

SUGGESTED ADDITIONAL READING (not mandatory):
[Very recommended. All these notions and constructions are extremely useful to know!]

Sec 8.5 -- three special purpose PRGs.
- Sec 8.5.1 Pairwise-Independence Generators;
- Sec 8.5.2 Small-Bias Generators;
- Sec 8.5.3 Random Walks on Expanders.

These PRGs are used a lot. In fact, you already encountered the 1st (i.e., limited independence PRG) and maybe also the 3rd (i.e., expander walks). The 2nd (i.e., small-bias) will be used in Sec 9.3.2.2 (see bottom of page 392).

SUGGESTED EXERCISES (not to be submitted):

Exer 8.25 on the limitations of PRGs fooling space-bounded distinguishers.
Exer 8.27 multiple samples in the context of space-bounded distinguishers.

Both are intended to promote the understanding of the model; the second may be better for that purpose.

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

BPL is in SC (polylogarithmic space and polynomial time; Thm 8.23).
A space-fooling PRG with stretch exponential in the space bound and seed length that is quadratic in the space bound (Thm 8.21).
A space-fooling PRG with stretch exponential in the space bound and seed length that is linear in the space bound (Thm 8.22).

WELL-KNOWN OPEN PROBLEMS:

Is BPL=L?
Can Thm 8.21 be improved such that the seed length is sub-quadratic in the space bound (equiv., $s(k)=omega(sqrt(k))$)?
Can Thm 8.21 be improved such that the error bound be exponential in $k$?
Note that the currently stated error bound is not the best possible; it is possible to obtain an error bound of $\exp(-s^c)$ for any $c < 1$.

NOTES FOR the 22ND MEETING

Topic: Probabilistic proof systems - part 1 (interactive proofs)
Estimated difficulty: 1 in the scale of 5.

We now step into a new chapter, one on probabilistic proof systems. The common theme is introducing randomization and even (bounded) error probability into the verification process, with the aim of achieving some features that are impossible (or very unlikely) to be achieved by deterministic systems. Hence, randomness does help wrt proof verification!

A common convention: all complexity measures are stated in terms of the length of the main input (i.e., the "assertion being verified").

We start with interactive proof systems ( ips, and the class IP). The definition highlights the verification process (personalized by a verifier), and views proofs as a dynamic process (rather than a static object). In particular, the prover becomes explicit in the formulation. Soundness (and sometimes also completeness) is given a probabilistic interpretation. Still, verification remains efficient; that is, it is performed in (probabilistic) polynomial-time.

IP extends NP by both randomization and interaction, but interaction is "useless" without randomness. Actually, the soundness error is essential.

Prop: If $S$ has an ips with zero error, then $S\in\NP$.

The proof consists of two steps:
(1) zero-error implies (wlog) deterministic verification (and prover). 
(2) deterministic interaction implies fixed transcript (which can be sent).

In Section 9.1.3, the power of IP is demonstrated in three steps: By showing interactive proof systems for GNI (Sec 9.1.3.1), coNP (pages 359-362), and PSPACE (page 362).

READING ASSIGNMENT

Sec 9.0: a general introduction to PPS.
Sec 9.1.0-9.1.3: The basic material for IP.
Sec 9.1.4: an overview of variants that are good to know (i.e., AM/public-coin, two-sided error, and a constant speed-up of the number of rounds).

SUGGESTED ADDITIONAL READING (not mandatory):

Sec 9.1.5: Discussion regarding the complexity of proving.
Subsequent developments regarding the 3rd direction in Sec 9.1.5.1 are surveyed HERE.

SUGGESTED EXERCISES (not to be submitted):

Exer 9.2: reductions to IP (understanding the definition).
Exer 9.4: a round reduction for "coNP in IP".
Exer 9.5: adapting "coNP in IP" to "#P in IP".
Exer 9.6: on the QBF format that fits our needs (for PSPACE in IP)
Exer 9.7: reducing modulo a random prime -- a trick that is good to know!

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

IP= PSPACE (Thm 9.4).
The arithmetization of Boolean (CNF) formulae presented in pages 359-360.
An optimal prover strategy can be implemented in polynomial space (Prop 9.4).
General (private-coin) versus public-coin interactive proof systems.
Note: The former can be transformed to the latter, but the known transformation (as well as any other black-box transformation) does not preserve the prover's complexity.

NOTES FOR the 23RD MEETING

Topic: Probabilistic proof systems - part 2 (zero-knowledge proofs)
Estimated difficulty: 1 in the scale of 5.

The material from this point on (if not for a couple of weeks now) is presented in the form of high-level overviews. You may view them as introductions to some research directions/sub-areas. The presentation clarifies the definitions and issues, but typically only sketches the constructions and/or proofs.

The meeting will be devoted to zero-knowledge proofs. The focus is on the discussion of the definition and the presentation of some simple constructions.

READING ASSIGNMENT:
On zero-knowledge (ZK) proofs (Sec 9.2.0--9.2.2, pages 368-377).

Sec 9.2.0: high-level introduction to the topic
Sec 9.2.1: definitional treatment
Sec 9.2.2: outline of two constructions.

SUGGESTED ADDITIONAL READING (not mandatory):

Sec 9.2.3 (p. 378-9): Introduction to "proofs of knowledge".
This is really unrelated to the rest of the course; read it only if you are truly interested in this puzzling notion.

SUGGESTED EXERCISES (not to be submitted):

Exer 9.13 and 9.14: necessary conditions for the non-triviality of ZK. Good for getting used to the definitions.

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

The simulation paradigm (Sec 9.2.1.1), which offers a way of formulating the notion of gaining nothing (beyond something).
Needless to say, this is pivotal to the foundations of cryptography.
The actual definition of zero-knowledge (Sec 9.2.1.2).
The zero-knowledge proof system for Graph Isomorphism (Sec 9.2.2.1).
The zero-knowledge proof system for NP and IP (Sec 9.2.2.2). Recall that these rely on the existence of one-way functions (Thm 9.11 and 9.12).

NOTES FOR the 24TH MEETING

Topic: Probabilistic proof systems - part 3 (PCP)
Estimated difficulty: 2 in the scale of 5. This presumes that this meeting only covers items (0)+(1). For items (2)+(3) the estimate is 4 in the scale of 5, and it is recommended to invite me.

[NOTE: It may be much better to split this material into two meetings. Do items (0)+(1) in one meeting and items (2)+(3) in another. Indeed, this is what was done in the last incarnations of this group (see also notes for the 25th meeting).]

The overall plan is to present (0) the definition of PCP and the PCP Theorem, and then try to give a taste of its proof (see the next three items). At best, in one meeting, one may hope to cover two of the following three items:

Show that NP is in PCP[poly,O(1)], which is already amazing.
Describe the "proof/PCP composition" paradigm.
Outline Irit Dinur's gap amplification approach.

[NOTE: As stated above, the current recommendation is to split the material to two meetings: Do items (0) and (1) in the current meeting, and leave items (2) and (3) to the next meeting.]

Here is an overview of item (1), which is slightly differently from the presentation in the book. After reducing NP to QE (satisfiability of Quadratic Equations over GF(2)), we consider a proof-oracle that records the sequence of the evaluations of all ($2^{n^2}$) quadratic expressions on some satisfying assignment $x$ to the $n$ variables. The verification will be performed as follows. First, rather than checking $m$ equations, we shall check a random linear combination of these equations. Second, we shall not just fetch the corresponding value from the adequate point in the proof, but rather use self-correction, and this will be done in addition to testing that the proof-oracle is (close to) a valid encoding (which is a Hadamard encoding of the outer-product of $x$ with itself). Hence, testing and self-correcting the Hadamard code will be our first task, and from there we shall proceed to testing and self-correcting the sub-code corresponding of a Hadamard code of an $n^2$-bit long string obtained by an outer-product of an $n$-bit message with itself.

READING ASSIGNMENT:

The PCP model and the PCP theorem (statement only). This includes Sec 9.3.0-9.3.1 and 9.3.2.0 (i.e., pages 380-4).
This corresponds to Part (0) in the foregoing plan. The next item corresponds to Parts (1)-(3) in that plan. Actually, it is better to do only Part (1) now, and leave Parts (2)-(3) to next meeting (the 25th).
The PCP Theorem -- proof outlines: Sections 9.3.2.
1. Sec 9.3.2.1: NP is in PCP[poly,O(1)].
2. Sec 9.3.2.2: Overview of the first proof of the PCP Theorem (with focus on the "proof/PCP composition" paradigm).
3. Sec 9.3.2.3: Overview of the second proof of the PCP Theorem (i.e., Irit's gap amplification approach).
Note that these are overviews. Only Section 9.3.2.1 provides enough hints towards a full reconstruction of the proof. The other two subsections (of Sec 9.3.2) only give a taste. Also, you may skip pages 392-3 if you find it too hard to follow. (Still, I recommend you try them...)

On the other hand, reading Sec 9.3.3 before Sec 9.3.2.3 may be a good idea, mainly for sake of wider perspective on "gap problems". If you don't read Sec 9.3.3 (re "PCP, gaps, and approximation") now, then you'll read it in the next assignment (i.e., for the 25th meeting). It is a relatively easy reading anyhow (and it has only 2 pages).

SUGGESTED ADDITIONAL READING (not mandatory):
Sec 9.3.4 ("more on PCP itself").

HOMEWORK ASSIGNMENT (to be submitted):

Exer 9.17 (e.g., PCP[log,poly] is contained in NP).
Exer 9.24 (note a correction on the book's webpage) and Exer 9.26; they deal with the two directions of the equivalence between PCP and gapCSP.

SUGGESTED EXERCISES (not to be submitted):

Exer 9.15 and 9.16: On the relation between randomness complexity and proof length in the context of PCPs.
Exer 9.20 provides a partial and easier analysis of the linearity test.
Exer 9.21 provides an alternative (and full) analysis of the linearity test.

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

The PCP Theorem (i.e., NP in PCP[log,O(1)]): Any NP-set has a PCP system with logarithmic randomness and a constant number of queries (Thm 9.16). Furthermore, NP-witness can be transformed to proof-oracles that are accepted by this system with probability 1 (Thm 9.23).
Recall that PCP[log,poly] is in NP (Prop 9.15).
The fact that any NP-set has a PCP system with polynomial randomness and a constant number of queries (Sec 9.3.2.1) is amazingly per se. The proof (of "NP in PCP[poly,O(1)]"), which is based on encoding NP-witnesses by the Hadamard code, is much easier to understand than the PCP Theorem (Thm 9.16). Note that arithmetization is implicit in this proof, since we started by reducing NP to satisfiability of systems of quadratic equations over GF(2).
The Hadamard code can be tested and self-corrected, based on a constant number of queries (Sec 9.3.2.1).
Recall that we saw a list decoding procedure for the Hadamard code in Sec 7.1.3 (Thm 7.8, which revisits Thm 7.7).

NOTES FOR the 25TH MEETING

Topic: Probabilistic proof systems - part 4 (PCP, cont)
Estimated difficulty: 4 in the scale of 5. It is recommended to ask me to attend the meeting.

Indeed, the previous reading assignment was "cut shorter" to include only (i) the PCP model and the PCP theorem (statement only) and (ii) the proof that NP is contained in PCP[poly,O(1)]; that is, Sec 9.3.0-9.3.2.1.

The current reading assignment will consist of the overviews of the two different proofs of the PCP Theorem (i.e., NP in PCP[log,O(1)]), which are briefly reviewed next. (Recall that PCP[log,poly] is in NP.)

Recall that we have seen (in Sec 9.3.2.1) that NP is in PCP[poly,O(1)], which is already amazing. This result will be an ingredient in both alternative proofs of the PCP Theorem.

The 1st proof of the PCP Theorem (Sec 9.3.2.2) is based on two additional ingredients.

The proof composition paradigm, which composes an outer PCP[r',q'] system with an inner PCP[r'',q''] system to obtain a PCP[r,q] system such that $r(n) = r'(n) + r''(q(n))$ and $q(n) = q''(q'(n))$. The composition relies on additional features of the PCPs such as (i) non-adaptiveness of the outer, (ii) small decision circuits of the outer, (iii) the inner being a "proof of proximity", and (iv) "robustness" of the outer.
[Of course, each of these notions needs to be defined and obtained...]
We stress that composition is non-trivial because the query complexity of the resulting scheme is lower than that of the 1st one, and indeed this is the aim of the proof composition paradigm. So we wish to verify a claim about a (residual) input that we cannot even read in full. So we need the inner PCP to be a proof of proximity (which guarantees that this partially read input is close to a valid one) and the outer PCP should be robust (in the sense that typicaly when it rejects it is the case that the sequence of answers is far from satisfying its residual decision predicate).
Showing that NP is in PCP[log,polylog].
See digest on arithmetization in probabilistic proof systems.

Now, composing (2) with itself we get NP in PCP[r,q], where $r(n) = O(\log n) + O(\log \polylog n) = O(\log n)$ and $q(n) = \polylog(\polylog n) = \poly(\log\log n)$. Next, we compose this system with the PCP[poly,O(1)] system and conclude that NP is in PCP[log,O(1)], since we obtain randomness $O(\log n) + \poly(\polyloglog n) = O(\log n)$ and query complexity $O(1)$.

Note that the above proof is obtained by composing two highly non-trivial PCP systems (i.e., PCP[poly,O(1)] and PCP[log,polylog]) using a sophisticated composition theorem (see the add'l features...). In contrast, the 2nd proof of the PCP Theorem (Sec 9.3.2.3) evolves via (logarithmically many) repeated applications of an intuitive "gap amplification" step, which is sketched next.

Let $\PCP_{1,s}[r,q]$ denote a PCP system with perfect completeness
and soundness error $s(n)$, where so far we focused on $s(n)=1/2$.
Clearly, NP is in $\PCP_{1,1-(1/n)}[log,3]$ (a trivial PCP) 
and the basic (gap) amplification step (which amplifies $1-s$) 
is captured by $\PCP_{1,1-\e(n)}[r,q]$ in $\PCP_{1,1-2\e(n)}[r+O(1),q]$,
provided that $\e(n)$ is smaller than some (small) positive constant $c$.
(N.B.: The query complexity $q=O(1)$ is preserved,
whereas the randomness complexity only increased by an additive constant.)
Hence, starting from the trivial PCP and applying logarithmically many steps, 
we obtain NP in $\PCP_{1,1-c}[log,O(1)]$ (which yields the PCP Theorem). 
The point, of course, is proving the amplification step.

In the text, for clarity, we move to a gapCSP problem 
(for two variables over $[O(1)]$),
and will also use "NP in PCP[poly,O(1)]".  
Needless to say, the core of the proof is establishing
the gap amplification step (captured by Lem 9.19).

READING ASSIGNMENT:

The PCP Theorem -- two proof outlines:
1. The original proof (Sec 9.3.2.2) - focus on "proof composition";
2. Irit's proof (Sec 9.3.2.3) - focus on "gradual gap amplification".
These overviews are very high level and only give a taste of what is going on. Also, you may skip pages 392-3 (showing that NP in PCP[log,polylog]) if you find it too hard to follow; still, I recommend you try reading.
Sec 9.3.3 -- PCP, gap problems, and Inapproximability.
(This is a relatively easy reading, and is only 2 pages).

Actually, reading Sec 9.3.3 before Sec 9.3.2.3 may be a good idea, mainly for sake of wider perspective on "gaps problem".

SUGGESTED ADDITIONAL READING (not mandatory):
Sec 9.3.4 ("more on PCP itself").

HOMEWORK ASSIGNMENT (to be submitted, repeated/reminder):

Exer 9.17 (e.g., PCP[log,poly] is contained in NP).
Exer 9.24 and 9.26 (PCP vs gapCSP, two directions)
(Note a correction for Exer 9.24 on the book's webpage.)

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

Any NP-set has a PCP system with logarithmic randomness and a constant number of queries (Thm 9.16). Furthermore, NP-witness can be transformed to proof-oracles that are accepted by this suystem with probability 1 (Thm 9.23).
The proof composition paradigm composes an outer PCP[r',q'] system with an inner PCP[r'',q''] system to obtain a PCP[r,q] system such that $r(n) = r'(n) + r''(q(n))$ and $q(n) = q''(q'(n))$. See overview on page 388 (Step 3), and more details on pages 389-391.
The proof that NP is in PCP[log,polylog] is based on encoding NP-witnesses by low-degree polynomials over a finite field (equiv., using a Reed-Muller code). Note that the arithmetization of 3SAT that is the first step in this proof is different from the (simpler) arithmetization used in the proof of Thm 9.4 (IP=PSPACE).

NOTES FOR the 26TH MEETING

Topic: Notions of approximation
Estimated difficulty: 1 in the scale of 5.

If you did not read Sec 9.3.3 (PCP and gap problems) yet, please do it now. Note that the terminology of gap problems was used in Irit Dinur's proof of the PCP Theorem (i.e., Sec 9.3.2.3). Here we use it to capture the notion of hardness of approximation; that is, saying that a gap problem is NP-Hard means that approximating the corresponding parameter up-to the corresponding factor is NP-Hard.

READING ASSIGNMENT:

Sec 9.3.3 -- PCP, gap problems, and Inapproximability
Sec 10.1.1 -- Approximate Search (aka approximate optimization)
Sec 10.1.2 -- Approximate Decision (aka Property Testing)

Section 10.1 as a whole is technically light, since most results are only stated but not proved. I suggest to compensate for this by doing the suggested exercises.

SUGGESTED ADDITIONAL READING (not mandatory):
Sec 9.3.4 ("more on PCP itself"). If you did not read it already.

SUGGESTED EXERCISES (not to be submitted):

Exer 9.18: PCP, gapClique, and inapproximability (the FGLSS reduction).
Exer 10.3: A stronger result than in Exer 9.18, alas weaker than best one known...
Exer 10.4: More of the same taste, this one for vertex cover.
Exer 10.6: Something completely different -- inapproximability without PCP.
Exer 10.7: Important technique (expanders and approx-preserving reductions)
Exer 10.8-10.9 (easy): MAJ viewed from the perspective of PT (Sec 10.1.2).
Exer 10.10 (easy): clarifying the definitions of testing graph properties in the adjacency matrix model.

TAKE-HOME MESSAGES FOR THE CURRENT MEETING:

PCP is closely related to approximation: A PCP system yields an approximation problem (i.e., approximating the acceptance probability of a fixed verifier, wrt the best possible proof-oracle, when given the verifier's explicit input).
Approximate optimization problems are captured by gap problems in which inputs are associated with costs, and one is required to distinguish inputs of high cost from inputs of low cost.
Property testing is concerned with sublinear-complexity algorithms that solve approximate decision problems, where approximate decision can be captured by promise problems (of distinguishing inputs in a predetermined set from inputs that are far from the set).

NOTES FOR the 27TH MEETING

Topic: Average-case complexity
Estimated difficulty: 1 in the scale of 5.

Various notions of approximation constitute one type of relaxation of computational problems (intended to reduce the complexity of computational tasks). Another type is average-case analysis of algorithms, where, indeed, the relaxation is in the analysis (not in the problem). As with notions of approximation, the issue of meaningfulness arises, and here it is more acute (see definitional issues below).

I wish to highlight several definitional issues:

Why is "average-case" (wrt uniform distribution (as naively understood)) inadequate (from a conceptual perspective), and what does "typical case" mean.
What are natural distributional problems (i.e., distributional version of NP)?
- Avoiding the extremes (of all distributions or a single distribution).
- Simple (P-computable) distributions vs P-sampleable ones.
A notion of reduction that preserves typical ease; the role of domination (a notion to be defined).

If we want to see "reduction that preserves typical ease", then we better look for new ones...
Indeed, we shall present such reductions when proving the following results:

THM 1 (easy): completeness for (NP,P-sampleable).
THM 2 (main material -- Sec 10.2.1.2): completeness for (NP,simple).
THM 3 (advanced material -- Sec 10.2.2.2): reducing (NP,P-sample) to (NP,simple).

Combining THM 2 and 3, we have a problem in a restricted class ("simple") that is complete for the larger class ("sampleable").

READING ASSIGNMENT (last one):

Sec 10.2.0 -- motivation, please don't skip it!
Sec 10.2.1 -- the basic theory. Pay attention to the first two paragraphs of Sec 10.2.1.2 (bot p. 435) and the two "Reflection" paragraphs on pages 439-440.
Page 451 (reflections and relation to OWF).

SUGGESTED ADDITIONAL READING (not mandatory):

Sec 10.2.2.1 -- search vs decision
Sec 10.2.2.2 -- P-sampleable vs simple, skipping the proof of Thm 10.26.
Advanced (be warned!): The proof of Thm 10.26 (pages 447-450).

SUGGESTED EXERCISES (not to be submitted):

Exer 10.14--10.17: relations between some definitions (re Sec 10.2.1).
Exer 10.25--10.28: relations between some definitions (re Sec 10.2.2).

TAKE-HOME MESSAGE FOR THE CURRENT MEETING:

A theory of average-case complexity should not be confined to the uniform distribution. On the other hand, it should identify a natural class of distributions and provide results about NP-problems coupled with such distributions.

Last revised: August 2021.

Back to the Computation Complexity (book).